HK1260281B - Detecting and classifying copy number variation - Google Patents
- Publication number: HK1260281B (HK19120238.1A)
- Authority: HK (Hong Kong)
- Prior art keywords: chromosome, sequence, interest, segment, dose
Description
This application is a divisional application of the invention patent application filed on November 7, 2012, with application number 201210441134.8, entitled “Detection and Classification of Copy Number Variation”.
Background Art
One of the key endeavors in human medical research is the discovery of genetic abnormalities that are central to adverse health outcomes. In many cases, specific genes and/or critical diagnostic markers that are present at abnormal copy numbers have been identified in portions of the genome. For example, in prenatal diagnosis, extra or missing copies of whole chromosomes are frequently occurring genetic lesions. In cancer, deletion or multiplication of copies of whole chromosomes or chromosome segments, as well as higher-level amplification of specific regions of the genome, are common.
Most information about copy number variation has come from cytogenetic analysis, whose resolution permits the recognition of structural abnormalities. Conventional procedures for genetic screening and biological dosimetry have relied on invasive procedures (e.g., amniocentesis) to obtain cells for karyotyping. In recognition of the need for more rapid testing methods that do not require cell culture, fluorescence in situ hybridization (FISH), quantitative fluorescence PCR (QF-PCR), and array-comparative genomic hybridization (array-CGH) have been developed as molecular cytogenetic methods for analyzing copy number variation.
The advent of technologies that allow entire genomes to be sequenced in a relatively short time, together with the discovery of circulating cell-free DNA (cfDNA), has provided the opportunity to compare genetic material originating from one chromosome with that of another chromosome without the risks associated with invasive sampling procedures. However, the limitations of existing methods, which include insufficient sensitivity stemming from the limited levels of cfDNA and sequencing bias stemming from the inherent nature of genomic information, underlie a continuing need for noninvasive methods that provide any or all of specificity, sensitivity, and applicability, so that copy number changes can be determined reliably in a variety of clinical settings.
The embodiments disclosed herein fulfill some of the above needs and, in particular, offer an advantage in providing a reliable method that is applicable at least to the practice of noninvasive prenatal diagnostics and to the diagnosis and monitoring of metastatic progression in cancer patients.
Overview
The maternal DNA background in a maternal sample imposes a practical limit on the sensitivity of any test that attempts to distinguish fetal chromosomes from the maternal chromosome set in the sample. Therefore, for diagnostics and routine tests that rely on quantitative and/or substantial differences between the fetal and maternal chromosome sets, fetal fraction is an important parameter to consider. The present invention provides a method for determining the fetal fraction in a maternal sample. The method obtains the fetal fraction as a function of normalized chromosome values or normalized chromosome segment values. The method for determining fetal fraction can be combined with other methods, such as a method that obtains the fetal fraction as a function of allele imbalance information at polymorphisms, to classify copy number variations of fetal chromosomes or chromosome segments in a maternal sample. The present invention also provides devices and kits for implementing the method.
Methods are provided for determining copy number variation (CNV) of a sequence of interest in a test sample comprising a mixture of nucleic acids that are known or suspected to differ in the amount of one or more sequences of interest. The method includes a statistical approach that accounts for the accumulated variability stemming from process-related, interchromosomal, and inter-sequence variability. The method is applicable to determining the CNV of any fetal aneuploidy, as well as CNVs known or suspected to be associated with a variety of medical conditions. CNVs that can be determined according to the method include trisomies and monosomies of any one or more of chromosomes 1-22, X, and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of these chromosomes, all of which can be detected by sequencing the nucleic acids of a test sample only once. Any aneuploidy can be determined from the sequencing information obtained from that single sequencing of the nucleic acids of the test sample.
In one embodiment, a method is provided for determining the presence or absence of any four or more different, complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. 
The steps of the method include: (a) obtaining sequence information of fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a certain number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and identifying a certain number of sequence tags for a normalizing chromosome sequence for each of any four or more chromosomes of interest; (c) using the number of sequence tags identified for each of any four or more chromosomes of interest and the number of sequence tags identified for each of the normalizing chromosome sequences to calculate a single chromosome dose for each of any four or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of any four or more chromosomes of interest with a threshold for each of any four or more chromosomes of interest, and thereby determining the presence or absence of any four or more complete, different fetal chromosomal aneuploidies in the maternal test sample. Step (a) can comprise sequencing at least a portion of the nucleic acids of a test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for each chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for each chromosome of interest. 
In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of each chromosome of interest; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for each normalizing chromosome sequence in step (b) to the length of each normalizing chromosome sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each chromosome of interest to the sequence tag density ratio for the normalizing chromosome sequence for each chromosome of interest.
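The two variants of step (c) described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function names are hypothetical, and "relating" a tag count to a sequence length is assumed here to mean dividing the count by the length in base pairs.

```python
def chromosome_dose_from_counts(tags_interest: int, tags_normalizing: int) -> float:
    """Dose as the ratio of the tag count for the chromosome of interest
    to the tag count for its normalizing chromosome sequence."""
    if tags_normalizing <= 0:
        raise ValueError("normalizing sequence must have at least one tag")
    return tags_interest / tags_normalizing


def sequence_tag_density(tags: int, length_bp: int) -> float:
    """Tag density, ASSUMED here to be tags divided by sequence length."""
    if length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return tags / length_bp


def chromosome_dose_from_densities(
    tags_interest: int,
    length_interest_bp: int,
    tags_normalizing: int,
    length_normalizing_bp: int,
) -> float:
    """Dose as the ratio of the two tag densities (steps (i) to (iii))."""
    density_interest = sequence_tag_density(tags_interest, length_interest_bp)
    density_normalizing = sequence_tag_density(tags_normalizing, length_normalizing_bp)
    if density_normalizing == 0:
        raise ValueError("normalizing sequence must have at least one tag")
    return density_interest / density_normalizing
```

Under this assumption the density-based dose equals the count-based dose scaled by the ratio of the two sequence lengths, so for a fixed pair of sequences the two variants differ only by a constant factor.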
In another embodiment, a method is provided for determining the presence or absence of any four or more different, complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The method comprises the steps of: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the any four or more chromosomes of interest; 
(c) calculating a single chromosome dose for each of any four or more chromosomes of interest using the number of sequence tags identified for each of the four or more chromosomes of interest and the number of sequence tags identified for each of the normalizing chromosome sequences; and (d) comparing each of the single chromosome doses for each of any four or more chromosomes of interest to a threshold value for each of the four or more chromosomes of interest, and thereby determining the presence or absence of any four or more complete, different fetal chromosomal aneuploidies in the maternal test sample, wherein the any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y include at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and wherein the presence or absence of at least twenty different, complete fetal chromosomal aneuploidies is determined. Step (a) can comprise sequencing at least a portion of the nucleic acids of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for each chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for each chromosome of interest. 
In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of each chromosome of interest; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for each normalizing chromosome sequence in step (b) to the length of each normalizing chromosome sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each chromosome of interest to the sequence tag density ratio for the normalizing chromosome sequence for each chromosome of interest.
In another embodiment, a method is provided for determining the presence or absence of any four or more different, complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. 
The steps of the method include: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; (b) using the sequence information to identify a number of sequence tags for each of any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the any four or more chromosomes of interest; (c) using the number of sequence tags identified for each of the any four or more chromosomes of interest and the number of sequence tags identified for each of the normalizing chromosome sequences to calculate a single chromosome dose for each of the any four or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of the any four or more chromosomes of interest to a threshold for each of the any four or more chromosomes of interest, and thereby determining the presence or absence of any four or more complete, different fetal chromosomal aneuploidies in the sample, wherein the any four or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all of chromosomes 1-22, X, and Y, and wherein the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X, and Y is determined. Step (a) may include sequencing at least a portion of the nucleic acids of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) includes calculating a single chromosome dose for each of the chromosomes of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest to the number of sequence tags identified for the normalizing chromosome sequence for each of the chromosomes of interest. 
In some other embodiments, step (c) includes: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of each chromosome of interest; (ii) calculating a sequence tag density ratio for each normalizing chromosome sequence by relating the number of sequence tags identified for each normalizing chromosome sequence in step (b) to the length of each normalizing chromosome sequence; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each chromosome of interest to the sequence tag density ratio for the normalizing chromosome sequence for each chromosome of interest.
In any of the above embodiments, the normalizing chromosome sequence can be a single chromosome selected from chromosomes 1-22, X, and Y. Alternatively, the normalizing chromosome sequence can be a group of chromosomes selected from chromosomes 1-22, X, and Y.
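As a rough illustration of the difference between a single normalizing chromosome and a group of normalizing chromosomes, the sketch below computes a count-based dose for either case. The names `normalizing_tag_count` and `dose` are hypothetical, and it is an assumption of this sketch (the text does not spell it out here) that the tags of a group are simply summed to form the denominator.

```python
from typing import Mapping, Sequence


def normalizing_tag_count(tag_counts: Mapping[str, int],
                          normalizing: Sequence[str]) -> int:
    """Total tags over the normalizing sequence.

    `normalizing` is a list or tuple holding one chromosome name (single
    normalizing chromosome) or several (group of chromosomes).
    ASSUMPTION: tags of a group are summed.
    """
    return sum(tag_counts[chrom] for chrom in normalizing)


def dose(tag_counts: Mapping[str, int],
         chrom_of_interest: str,
         normalizing: Sequence[str]) -> float:
    """Count-based dose of one chromosome against its normalizing sequence."""
    denominator = normalizing_tag_count(tag_counts, normalizing)
    if denominator <= 0:
        raise ValueError("normalizing sequence must have at least one tag")
    return tag_counts[chrom_of_interest] / denominator
```

For example, with made-up counts `{"chr21": 100, "chr9": 400, "chr11": 600}`, calling `dose(counts, "chr21", ["chr9"])` uses a single normalizing chromosome, while `dose(counts, "chr21", ["chr9", "chr11"])` uses a group.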
In another embodiment, a method is provided for determining the presence or absence of any one or more different, complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method include: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the any one or more chromosomes of interest; (c) using the number of sequence tags identified for each of the any one or more chromosomes of interest and the number of sequence tags identified for each of the normalizing segment sequences to calculate a single chromosome dose for each of the any one or more chromosomes of interest; and (d) comparing each of the single chromosome doses for the any one or more chromosomes of interest to a threshold for each of the one or more chromosomes of interest, and thereby determining the presence or absence of any one or more complete, different fetal chromosomal aneuploidies in the sample. Step (a) may comprise sequencing at least a portion of the nucleic acids of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for each chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for each chromosome of interest. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of each chromosome of interest; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for each normalizing segment sequence in step (b) to the length of each normalizing chromosome; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single chromosome dose for each chromosome of interest, wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each chromosome of interest to the sequence tag density ratio for the normalizing segment sequence for each chromosome of interest.
In another embodiment, a method is provided for determining the presence or absence of any one or more different, complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. 
The steps of the method include: (a) obtaining sequence information for fetal and maternal nucleic acids in a sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and identifying a number of sequence tags for a normalizing chromosome sequence for each of the one or more chromosomes of interest; (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of the one or more chromosomes of interest to a threshold value for each of the one or more chromosomes of interest, and thereby determining the presence or absence of one or more complete, different fetal chromosomal aneuploidies in the sample, wherein the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y include at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and wherein the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. Step (a) can comprise sequencing at least a portion of the nucleic acids of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for each chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for each chromosome of interest. 
In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of each chromosome of interest; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for each normalizing segment sequence in step (b) to the length of each normalizing chromosome; and (iii) calculating a single chromosome dose for each chromosome of interest using the sequence tag density ratios calculated in steps (i) and (ii), wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each chromosome of interest to the sequence tag density ratio for the normalizing segment sequence of each chromosome of interest.
In another embodiment, a method is provided for determining the presence or absence of any one or more different, complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. 
The steps of the method include: (a) obtaining sequence information for fetal and maternal nucleic acids in a sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and identifying a number of sequence tags for a normalizing segment sequence for each of the one or more chromosomes of interest; (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each of the single chromosome doses for each of the one or more chromosomes of interest to a threshold value for each of the one or more chromosomes of interest, and thereby determining the presence or absence of one or more complete, different fetal chromosomal aneuploidies in the sample, wherein the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all chromosomes 1-22, X, and Y, and wherein the presence or absence of complete fetal chromosomal aneuploidies of all chromosomes 1-22, X, and Y is determined. Step (a) can comprise sequencing at least a portion of the nucleic acids of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for each chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for each chromosome of interest. 
In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each chromosome of interest by relating the number of sequence tags identified for each chromosome of interest in step (b) to the length of each chromosome of interest; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for each normalizing segment sequence in step (b) to the length of each normalizing chromosome; and (iii) calculating a single chromosome dose for each chromosome of interest using the sequence tag density ratios calculated in steps (i) and (ii), wherein the chromosome dose is calculated as the ratio of the sequence tag density ratio for each chromosome of interest to the sequence tag density ratio for the normalizing segment sequence of each chromosome of interest.
In any of the above embodiments, the different complete chromosomal aneuploidies are selected from complete chromosomal trisomies, complete chromosomal monosomies, and complete chromosomal polysomies. The different complete chromosomal aneuploidies are selected from complete aneuploidies of any one of chromosomes 1-22, X, and Y. For example, the different complete fetal chromosomal aneuploidies are selected from trisomy 2, trisomy 8, trisomy 9, trisomy 20, trisomy 21, trisomy 13, trisomy 16, trisomy 18, trisomy 22, 47,XXX, 47,XYY, and monosomy X.
In any of the above embodiments, steps (a)-(d) are repeated for test samples from different maternal subjects, and the method includes determining the presence or absence of any four or more different, complete fetal chromosomal aneuploidies in each test sample.
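Step (d), repeated over samples from different subjects, amounts to comparing each per-chromosome dose with a per-chromosome threshold for every sample. The sketch below is a loose illustration only: the function names are hypothetical, and it assumes a single upper threshold per chromosome (a dose above it is flagged), which the text does not specify; a real classifier might use two-sided thresholds or a no-call zone.

```python
from typing import Dict, Mapping


def call_sample(doses: Mapping[str, float],
                thresholds: Mapping[str, float]) -> Dict[str, bool]:
    """Compare each chromosome dose with its threshold.

    ASSUMPTION: a dose strictly above the threshold is flagged (True).
    """
    return {chrom: doses[chrom] > thresholds[chrom] for chrom in thresholds}


def call_batch(samples: Mapping[str, Mapping[str, float]],
               thresholds: Mapping[str, float]) -> Dict[str, Dict[str, bool]]:
    """Repeat the comparison for every test sample, keyed by sample id."""
    return {sample_id: call_sample(doses, thresholds)
            for sample_id, doses in samples.items()}
```

Because the thresholds are keyed by chromosome, the same function covers four chromosomes of interest, twenty, or all of chromosomes 1-22, X, and Y, depending on which keys are supplied.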
In any of the above embodiments, the method can further comprise calculating a normalized chromosome value (NCV), wherein the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:

$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
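The NCV is a standard score of the test dose against the qualified-sample doses, which can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the standard deviation is estimated with the sample (n − 1) formula, which the text does not specify.

```python
from statistics import mean, stdev
from typing import Sequence


def normalized_chromosome_value(test_dose: float,
                                qualified_doses: Sequence[float]) -> float:
    """NCV: (observed dose - estimated mean) / estimated standard deviation.

    `qualified_doses` holds the doses of the same chromosome in a set of
    qualified samples. ASSUMPTION: sample standard deviation (n - 1).
    """
    if len(qualified_doses) < 2:
        raise ValueError("need at least two qualified samples")
    mu_hat = mean(qualified_doses)
    sigma_hat = stdev(qualified_doses)
    if sigma_hat == 0:
        raise ValueError("qualified doses have zero variance")
    return (test_dose - mu_hat) / sigma_hat
```

A test dose equal to the qualified-sample mean gives an NCV of 0, and the NCV grows by 1 for each estimated standard deviation the test dose lies above that mean.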
In another embodiment, a method is provided for determining the presence or absence of different, partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids. The steps of the method include: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of the any one or more segments of any one or more chromosomes of interest; (c) using the number of sequence tags identified for each of the any one or more segments of any one or more chromosomes of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single segment dose for each of the any one or more segments of any one or more chromosomes of interest; and (d) comparing each of the single segment doses for each of the any one or more segments of any one or more chromosomes of interest to a threshold for each of the any one or more segments of any one or more chromosomes of interest, and thereby determining the presence or absence of one or more different, partial fetal chromosomal aneuploidies in the sample. 
Step (a) may comprise sequencing at least a portion of the nucleic acids of the test sample to obtain said sequence information for the fetal and maternal nucleic acid molecules of the test sample.
In some embodiments, step (c) comprises calculating the single segment dose for each segment of interest as the ratio of the number of sequence tags identified for that segment to the number of sequence tags identified for its normalizing segment sequence. In some other embodiments, step (c) comprises: (i) calculating a sequence tag density ratio for each segment of interest by relating the number of sequence tags identified for that segment in step (b) to the length of the segment; (ii) calculating a sequence tag density ratio for each normalizing segment sequence by relating the number of sequence tags identified for that normalizing segment sequence in step (b) to its length; and (iii) using the sequence tag density ratios calculated in steps (i) and (ii) to calculate a single segment dose for each segment of interest, wherein the segment dose is calculated as the ratio of the sequence tag density ratio for the segment of interest to the sequence tag density ratio for its normalizing segment sequence. The method can further include calculating a normalized segment value (NSV), wherein the NSV relates the segment dose to the mean of the corresponding segment doses in a set of qualified samples as:
$$\mathrm{NSV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th segment dose in a set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
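The tag-density variant of step (c) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `tag_density` and `segment_dose` and the counts and lengths are hypothetical, and "relating" a tag count to a length is taken here to mean simple division.

```python
def tag_density(tag_count, length_bp):
    """Sequence tags per base pair for one segment."""
    if length_bp <= 0:
        raise ValueError("segment length must be positive")
    return tag_count / length_bp


def segment_dose(seg_tags, seg_len, norm_tags, norm_len):
    """Ratio of the segment's tag density to its normalizing segment's density."""
    return tag_density(seg_tags, seg_len) / tag_density(norm_tags, norm_len)


# Hypothetical example: a 1 Mb segment of interest and a 2 Mb normalizing segment.
dose = segment_dose(seg_tags=5_000, seg_len=1_000_000,
                    norm_tags=20_000, norm_len=2_000_000)
print(round(dose, 3))
```

Dividing by length keeps the dose comparable when the segment of interest and its normalizing segment differ in size; the plain count ratio of the first variant does not make that adjustment.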
In various embodiments of the described methods in which a normalizing segment sequence is used to determine a chromosome dose or segment dose, the normalizing segment sequence can be a single segment of any one or more of chromosomes 1-22, X, and Y. Alternatively, the normalizing segment sequence can be a group of segments of any one or more of chromosomes 1-22, X, and Y.
Steps (a)-(d) of the method for determining the presence or absence of a partial fetal chromosomal aneuploidy are repeated for test samples from different maternal subjects, and the method comprises determining the presence or absence of different partial fetal chromosomal aneuploidies in each of the samples. Partial fetal chromosomal aneuploidies that can be determined according to the method include partial aneuploidies of any segment of any chromosome. The partial aneuploidies can be selected from partial duplications, partial multiplications, partial insertions, and partial deletions. Examples of partial aneuploidies that can be determined according to the method include partial monosomy of chromosome 1, partial monosomy of chromosome 4, partial monosomy of chromosome 5, partial monosomy of chromosome 7, partial monosomy of chromosome 11, partial monosomy of chromosome 15, partial monosomy of chromosome 17, partial monosomy of chromosome 18, and partial monosomy of chromosome 22.
In any one of the above embodiments, the test sample can be a maternal sample selected from blood, plasma, serum, urine, and saliva samples. In any one of these embodiments, the test sample can be a plasma sample. The nucleic acid molecules of the maternal sample are fetal and maternal cell-free DNA molecules. The nucleic acids can be sequenced using next-generation sequencing (NGS). In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing-by-ligation. In still other embodiments, the sequencing is single-molecule sequencing. Optionally, an amplification step is performed before sequencing.
In another embodiment, a method is provided for determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in a maternal plasma test sample comprising a mixture of fetal and maternal cell-free DNA molecules. The steps of the method include: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the fetal and maternal cell-free DNA molecules in the sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome for each of the twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of the twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the twenty or more chromosomes of interest; and (d) comparing each single chromosome dose to a threshold value for the corresponding chromosome of interest, and thereby determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in the sample.
In another embodiment, the invention provides a method for identifying copy number variation (CNV) of a sequence of interest (e.g., a clinically relevant sequence) in a test sample, the method comprising the steps of: (a) obtaining a test sample and a plurality of qualified samples, the test sample comprising test nucleic acid molecules and the plurality of qualified samples comprising qualified nucleic acid molecules; (b) obtaining sequence information for the fetal and maternal nucleic acids in the sample; (c) calculating, based on the sequencing of the qualified nucleic acid molecules, a qualified sequence dose for the qualified sequence of interest in each of the plurality of qualified samples, wherein calculating the qualified sequence dose comprises determining a parameter for the qualified sequence of interest and at least one qualified normalizing sequence; (d) identifying, based on the qualified sequence doses, at least one qualified normalizing sequence, wherein the at least one qualified normalizing sequence has the smallest variability and/or the greatest differentiability in the plurality of qualified samples; (e) calculating, based on the sequencing of the nucleic acid molecules in the test sample, a test sequence dose for the test sequence of interest, wherein calculating the test sequence dose comprises determining a parameter for the test sequence of interest and at least one normalizing test sequence, the at least one normalizing test sequence corresponding to the at least one qualified normalizing sequence; (f) comparing the test sequence dose to at least one threshold value; and (g) assessing the copy number variation of the sequence of interest in the test sample based on the outcome of step (f). In one embodiment, the parameter for the qualified sequence of interest and at least one qualified normalizing sequence relates the number of sequence tags mapped to the qualified sequence of interest to the number of tags mapped to the qualified normalizing sequence, and the parameter for the test sequence of interest and at least one normalizing test sequence relates the number of sequence tags mapped to the test sequence of interest to the number of tags mapped to the normalizing test sequence. In some embodiments, step (b) comprises sequencing at least a portion of the qualified and test nucleic acid molecules, wherein the sequencing comprises providing a plurality of mapped sequence tags for a test sequence of interest and a qualified sequence of interest, and for at least one test normalizing sequence and at least one qualified normalizing sequence; and sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample. In some embodiments, the sequencing step is performed using a next-generation sequencing method. In some embodiments, the sequencing method can be a massively parallel sequencing method that uses sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing method is sequencing-by-ligation. In some embodiments, the sequencing comprises an amplification. In other embodiments, the sequencing is single-molecule sequencing. The CNV of the sequence of interest is an aneuploidy, which can be a chromosomal aneuploidy or a partial aneuploidy. In some embodiments, the chromosomal aneuploidy is selected from trisomy 2, trisomy 8, trisomy 9, trisomy 20, trisomy 16, trisomy 21, trisomy 13, trisomy 18, trisomy 22, Klinefelter's syndrome, 47,XXX, 47,XYY, and monosomy X. In other embodiments, the partial aneuploidy is a partial chromosomal deletion or a partial chromosomal insertion. In some embodiments, the CNV identified by the method is a chromosomal or partial aneuploidy associated with cancer. In some embodiments, the test and qualified samples are biological fluid samples, e.g., plasma samples obtained from a pregnant subject such as a pregnant human subject. In other embodiments, the test and qualified biological fluid samples (e.g., plasma samples) are obtained from a subject known or suspected to have cancer.
Certain methods for determining the presence or absence of a fetal chromosomal aneuploidy in a maternal test sample may include the following operations: (a) providing sequence reads from fetal and maternal nucleic acids in the maternal test sample, wherein the sequence reads are provided in an electronic format; (b) aligning the sequence reads to one or more chromosome reference sequences using a computing device, and thereby providing sequence tags corresponding to the sequence reads; (c) computationally identifying the number of sequence tags from one or more chromosomes of interest or chromosome segments of interest, and computationally identifying the number of sequence tags from at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest; (d) computationally calculating, using the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest and the number of sequence tags identified for each normalizing chromosome sequence or normalizing chromosome segment sequence, a single chromosome or segment dose for each of the one or more chromosomes of interest or chromosome segments of interest; and (e) comparing, using the computing device, each single chromosome or segment dose to a corresponding threshold value for each of the one or more chromosomes of interest or chromosome segments of interest, and thereby determining the presence or absence of at least one fetal aneuploidy in the test sample. In certain implementations, the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest is at least about 10,000 or at least about 100,000. The disclosed embodiments also provide a computer program product comprising a non-transitory computer-readable medium on which program instructions are provided for performing the recited operations and the other computational operations described herein.
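Operations (c) through (e) above can be sketched as below. This is a simplified illustration, not the claimed implementation: the names `chromosome_doses` and `call_aneuploidy`, the chromosome labels, the tag counts, and the threshold are hypothetical, each chromosome of interest is paired with a single normalizing chromosome, and a dose above the threshold is treated as a call, whereas a real comparison may use two-sided or multiple thresholds.

```python
from collections import Counter


def chromosome_doses(tag_chromosomes, normalizers):
    """Compute one dose per chromosome of interest.

    tag_chromosomes -- iterable with the chromosome name of each aligned tag
    normalizers     -- dict: chromosome of interest -> normalizing chromosome
    """
    counts = Counter(tag_chromosomes)
    doses = {}
    for chrom, norm in normalizers.items():
        if counts[norm] == 0:
            raise ValueError(f"no tags mapped to normalizing chromosome {norm}")
        doses[chrom] = counts[chrom] / counts[norm]
    return doses


def call_aneuploidy(doses, thresholds):
    """Flag each chromosome whose dose exceeds its own threshold."""
    return {chrom: dose > thresholds[chrom] for chrom, dose in doses.items()}


# Hypothetical tags: 12 on chr21 and 100 on its normalizing chromosome chr9.
tags = ["chr21"] * 12 + ["chr9"] * 100
doses = chromosome_doses(tags, {"chr21": "chr9"})
calls = call_aneuploidy(doses, {"chr21": 0.11})
print(doses, calls)
```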
In certain embodiments, a chromosome reference sequence has a plurality of excluded regions that occur naturally in the chromosome but do not contribute to the number of sequence tags for any chromosome or chromosome segment. In certain embodiments, a method further comprises: (i) determining whether a read under consideration aligns to a site on a chromosome reference sequence to which another read from the test sample was previously aligned; and (ii) determining whether to include the read under consideration in the number of sequence tags for a chromosome of interest or a chromosome segment of interest. The chromosome reference sequence may be stored on a computer-readable medium.
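One possible reading of the two checks above is sketched here. The text only says that the method decides whether to include a read that aligns to an already occupied site; counting each site at most once is an assumed policy chosen for illustration. The function name `count_tags`, the half-open interval convention, and the coordinates are likewise hypothetical.

```python
def count_tags(alignments, excluded, chrom_of_interest):
    """Count tags on one chromosome, skipping excluded regions and repeat sites.

    alignments -- iterable of (chromosome, position) pairs, one per aligned read
    excluded   -- dict: chromosome -> list of (start, end) half-open intervals
    """
    seen_sites = set()
    total = 0
    for chrom, pos in alignments:
        # Reads falling in an excluded region never contribute to a count.
        if any(start <= pos < end for start, end in excluded.get(chrom, [])):
            continue
        # Assumed policy: a site already hit by an earlier read is counted once.
        if (chrom, pos) in seen_sites:
            continue
        seen_sites.add((chrom, pos))
        if chrom == chrom_of_interest:
            total += 1
    return total


reads = [("chr21", 100), ("chr21", 100), ("chr21", 250),
         ("chr21", 5000), ("chr1", 100)]
masked = {"chr21": [(4000, 6000)]}
n_tags = count_tags(reads, masked, "chr21")
print(n_tags)
```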
In certain embodiments, a method further comprises sequencing at least a portion of the nucleic acid molecules of the maternal test sample to obtain the sequence information for the fetal and maternal nucleic acid molecules of the test sample. The sequencing can comprise massively parallel sequencing of maternal and fetal nucleic acids from the maternal test sample to produce the sequence reads.
In certain embodiments, a method further comprises automatically recording, using a processor, the presence or absence of a fetal chromosomal aneuploidy as determined in (d) in a patient medical record for the human subject who provided the maternal test sample. The recording may comprise recording the chromosome doses and/or a diagnosis based on the chromosome doses in a computer-readable medium. In some cases, the patient medical record is maintained by a laboratory, physician's office, hospital, health maintenance organization, insurance company, or personal medical record website. A method may further comprise prescribing, initiating, and/or altering treatment of the human subject from whom the maternal test sample was taken. Additionally or alternatively, the method may comprise ordering and/or performing one or more additional tests.
Some methods disclosed herein identify a normalizing chromosome sequence or a normalizing chromosome segment sequence for a chromosome or chromosome segment of interest. Some such methods include the following operations: (a) providing a plurality of qualified samples for the chromosome or chromosome segment of interest; (b) repeatedly calculating chromosome doses for the chromosome or chromosome segment of interest using multiple potential normalizing chromosome sequences or normalizing chromosome segment sequences, wherein the repeated calculation is performed with a computing device; and (c) selecting a normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in a combination, that gives the smallest variability and/or a large differentiability in the doses calculated for the chromosome or chromosome segment of interest.
The selected normalizing chromosome sequence or normalizing chromosome segment sequence can be part of a combination of normalizing chromosome sequences or normalizing chromosome segment sequences, or it can be provided alone, without being combined with other normalizing chromosome sequences or normalizing chromosome segment sequences.
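The selection of a normalizing sequence can be sketched as a search over candidates. This illustration covers only single candidate chromosomes and only the smallest-variability criterion, measured here by the coefficient of variation of the dose across qualified samples; combinations of candidates and the differentiability criterion are left out. The names `select_normalizer` and `coefficient_of_variation` and all counts are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def select_normalizer(qualified_counts, chrom_of_interest, candidates):
    """Return (candidate, cv) with the least variable dose across samples.

    qualified_counts -- list of dicts, one per qualified sample,
                        mapping chromosome name -> tag count
    """
    best, best_cv = None, float("inf")
    for cand in candidates:
        doses = [s[chrom_of_interest] / s[cand] for s in qualified_counts]
        cv = coefficient_of_variation(doses)
        if cv < best_cv:
            best, best_cv = cand, cv
    return best, best_cv


# Hypothetical tag counts for three qualified samples.
samples = [
    {"chr21": 100, "chr9": 1000, "chr1": 2000},
    {"chr21": 110, "chr9": 1100, "chr1": 2000},
    {"chr21": 90, "chr9": 900, "chr1": 2100},
]
best, best_cv = select_normalizer(samples, "chr21", ["chr1", "chr9"])
print(best, best_cv)
```

In this made-up data chr9 tracks chr21 from sample to sample, so the chr21/chr9 dose barely varies, while the chr21/chr1 dose does.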
The disclosed embodiments provide a method for classifying a copy number variation in a fetal genome. The operations of the method include: (a) receiving sequence reads from fetal and maternal nucleic acids in a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) aligning the sequence reads to one or more chromosome reference sequences using a computing device, and thereby providing sequence tags corresponding to the sequence reads; (c) computationally identifying, using the computing device, the number of sequence tags from one or more chromosomes of interest, and determining that a first chromosome of interest in the fetus carries a copy number variation; (d) calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest; (e) calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome; and (f) comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome. In certain embodiments, the method further comprises sequencing cell-free DNA from the maternal test sample to provide the sequence reads. In certain embodiments, the method further comprises obtaining the maternal test sample from a pregnant organism. In certain embodiments, operation (b) comprises aligning at least about one million reads using a computing device. In certain embodiments, operation (f) may comprise determining whether the two fetal fraction values are approximately equal.
In certain embodiments, operation (f) may further comprise determining that the two fetal fraction values are approximately equal, and thereby determining that a ploidy assumption implicit in the second method is true. In certain embodiments, the ploidy assumption implicit in the second method is that the first chromosome of interest has a complete chromosomal aneuploidy. In certain of these embodiments, the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
In certain embodiments, operation (f) may comprise determining whether the two fetal fraction values are not approximately equal, and may further comprise analyzing the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest carries a partial aneuploidy or (ii) the fetus is a mosaic.
In certain embodiments, this operation can further comprise binning the sequence of the first chromosome of interest into a plurality of portions; determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and, if any of the portions contains significantly more or significantly less nucleic acid than one or more other portions, determining that the first chromosome of interest carries a partial aneuploidy. In one embodiment, the operation can further comprise determining that the portion of the first chromosome of interest that contains significantly more or significantly less nucleic acid than one or more other portions carries the partial aneuploidy.
In one embodiment, operation (f) can also comprise binning the sequence of the first chromosome of interest into a plurality of portions; determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and, if none of the portions contains significantly more or significantly less nucleic acid than one or more other portions, determining that the fetus is a mosaic.
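The decision flow of operation (f) can be sketched as below. This is an illustration under loud assumptions: the text does not define "approximately equal" or "significantly more or less", so the tolerances `rel_tol` and `rel_dev`, the comparison against the median bin, and the function name `classify_cnv` are all hypothetical, and the bin counts are assumed to be already normalized for bin size and sequence composition.

```python
from statistics import median


def classify_cnv(ff_first, ff_second, bin_counts, rel_tol=0.2, rel_dev=0.1):
    """Classify a copy number variation on the first chromosome of interest.

    ff_first   -- fetal fraction from a method that ignores this chromosome
    ff_second  -- fetal fraction computed from this chromosome's tags
    bin_counts -- normalized tag counts for bins along the chromosome
    Returns (label, indices of deviating bins).
    """
    if not bin_counts:
        raise ValueError("bin_counts must not be empty")
    # "Approximately equal": difference is small relative to the larger value.
    if abs(ff_first - ff_second) <= rel_tol * max(ff_first, ff_second):
        return "complete chromosomal aneuploidy", []
    # Otherwise look for bins that stand out from the rest of the chromosome.
    typical = median(bin_counts)
    outliers = [i for i, count in enumerate(bin_counts)
                if abs(count - typical) > rel_dev * typical]
    if outliers:
        return "partial aneuploidy", outliers
    return "mosaic", []


partial = classify_cnv(0.10, 0.04, [100, 101, 99, 100, 130, 128, 100, 102])
mosaic = classify_cnv(0.10, 0.04, [100, 101, 99, 100, 102, 98])
complete = classify_cnv(0.10, 0.105, [100, 101, 99, 100, 102, 98])
print(partial, mosaic, complete)
```

The returned bin indices point at the portions that carry the deviation when a partial aneuploidy is reported.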
Operation (e) may comprise: (a) calculating the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) calculating the fetal fraction value from the chromosome dose using the second method. In certain embodiments, this operation further comprises calculating a normalized chromosome value (NCV), wherein the second method uses the normalized chromosome value, and wherein the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$\mathrm{NCV}_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$

where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the chromosome of interest. In another embodiment, operation (d) further comprises the first method calculating the first fetal fraction value using information from one or more polymorphisms that exhibit an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample.
In various embodiments, if the first fetal fraction value is not approximately equal to the second fetal fraction value, the method further comprises (i) determining whether the copy number variation results from a partial aneuploidy or a mosaic; and (ii) if the copy number variation results from a partial aneuploidy, determining the locus of the partial aneuploidy on the first chromosome of interest. In certain embodiments, determining the locus of the partial aneuploidy on the first chromosome of interest comprises categorizing the sequence tags for the first chromosome of interest into bins or blocks of nucleic acids in the first chromosome of interest, and counting the mapped tags in each bin.
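The binning and counting step described above can be sketched as follows. Fixed-width bins, 0-based tag start coordinates, and the names `bin_tag_counts` and `bin_interval` are assumptions made for illustration; the text does not specify how the bins or blocks are defined.

```python
from collections import Counter


def bin_tag_counts(positions, chrom_length, bin_size):
    """Count mapped tags in consecutive fixed-width bins along a chromosome.

    positions -- 0-based start coordinates of the tags on the chromosome
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in positions)
    return [counts.get(i, 0) for i in range(n_bins)]


def bin_interval(index, bin_size, chrom_length):
    """Half-open coordinate range covered by one bin."""
    start = index * bin_size
    return start, min(start + bin_size, chrom_length)


# Toy coordinates: six tags on a 40 bp "chromosome" split into 10 bp bins.
counts = bin_tag_counts([5, 15, 17, 25, 29, 31], chrom_length=40, bin_size=10)
print(counts, bin_interval(2, 10, 40))
```

Once the per-bin counts are available, the index of a deviating bin maps back to a coordinate range through `bin_interval`, which gives a candidate locus for the partial aneuploidy.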
Operation (e) may further comprise calculating the fetal fraction value by evaluating the following expression:
$$ff = 2 \times \left| \mathrm{NCV}_{iA} \, \mathrm{CV}_{iU} \right|$$
where $ff$ is the second fetal fraction value, $\mathrm{NCV}_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample, and $\mathrm{CV}_{iU}$ is the coefficient of variation of the doses of the chromosome of interest determined in the qualified samples.
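The expression above can be evaluated as in the following sketch. The function name `fetal_fraction_from_dose` and the example doses are hypothetical, and the sample (n - 1) standard deviation is an assumed estimator for both the NCV and the coefficient of variation.

```python
from statistics import mean, stdev


def fetal_fraction_from_dose(test_dose, qualified_doses):
    """Evaluate ff = 2 * |NCV * CV| for one chromosome of interest."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # assumption: sample standard deviation
    if sigma == 0 or mu == 0:
        raise ValueError("qualified doses must have nonzero mean and variance")
    ncv = (test_dose - mu) / sigma  # normalized chromosome value
    cv = sigma / mu                 # coefficient of variation in qualified samples
    return 2.0 * abs(ncv * cv)


# Hypothetical doses: qualified mean 1.00, affected sample 1.05.
ff = fetal_fraction_from_dose(1.05, [0.98, 1.00, 1.02])
print(round(ff, 3))
```

Because the standard deviation cancels in the product of NCV and CV, the value reduces to twice the relative shift of the test dose from the qualified mean. That matches the intuition that, in a trisomy, only the fetal share of the DNA contributes the extra copy, so the dose would rise by roughly half the fetal fraction.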
In any of the above embodiments, the first chromosome of interest is selected from the group consisting of chromosomes 1 to 22, X, and Y. In any of the above embodiments, operation (f) can classify the copy number variation into a category selected from the group consisting of complete chromosomal insertion, complete chromosomal deletion, partial chromosomal duplication, partial chromosomal deletion, and mosaic.
The disclosed embodiments also provide a computer program product comprising a non-transitory computer-readable medium on which program instructions are provided for classifying a copy number variation in a fetal genome. The computer program product may include: (a) code for receiving sequence reads from fetal and maternal nucleic acids in a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) code for aligning, using a computing device, the sequence reads to one or more chromosome reference sequences and thereby providing sequence tags corresponding to the sequence reads; (c) code for computationally identifying, using the computing device, the number of sequence tags from one or more chromosomes of interest and determining that a first chromosome of interest in the fetus carries a copy number variation; (d) code for calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest; (e) code for calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome; and (f) code for comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome. In certain embodiments, the computer program product includes code for the various operations and methods of any of the above embodiments of the disclosed methods.
The disclosed embodiments also provide a system for classifying a copy number variation in a fetal genome. The system includes: (a) an interface for receiving at least about 10,000 sequence reads from fetal and maternal nucleic acids in a maternal test sample, wherein the sequence reads are provided in an electronic format; (b) memory for storing, at least temporarily, a plurality of the sequence reads; and (c) a processor designed or configured with program instructions for: (i) aligning the sequence reads to one or more chromosome reference sequences and thereby providing sequence tags corresponding to the sequence reads; (ii) identifying a number of the sequence tags from one or more chromosomes of interest and determining that a first chromosome of interest in the fetus carries a copy number variation; (iii) calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest; (iv) calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome; and (v) comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome. According to various embodiments, the first chromosome of interest is selected from the group consisting of chromosomes 1 to 22, X, and Y. In certain embodiments, the program instructions for (c)(v) include program instructions for classifying the copy number variation into a category selected from the group consisting of complete chromosomal insertion, complete chromosomal deletion, partial chromosomal duplication, partial chromosomal deletion, and mosaic. According to various embodiments, the system may include program instructions for sequencing cell-free DNA from the maternal test sample to provide the sequence reads. According to certain embodiments, the program instructions for operation (c)(i) include program instructions for aligning at least about one million reads using the computing device.
In certain embodiments, the system further comprises a sequencer configured to sequence the fetal and maternal nucleic acids in a maternal test sample and to provide the sequence reads in an electronic format. In various embodiments, the sequencer and the processor are located in separate facilities and are connected via a network.
In various embodiments, the system further comprises an apparatus for taking the maternal test sample from a pregnant mother. According to certain embodiments, the apparatus for taking the maternal test sample and the processor are located in separate facilities. In various embodiments, the system also comprises an apparatus for extracting cell-free DNA from the maternal test sample. In certain embodiments, the apparatus for extracting cell-free DNA is located in the same facility as the sequencer, and the apparatus for taking the maternal test sample is located in a remote facility.
According to certain embodiments, the program instructions for comparing the first fetal fraction value to the second fetal fraction value further include program instructions for determining whether the two fetal fraction values are approximately equal.
In certain embodiments, the system further comprises program instructions for determining, when the two fetal fraction values are approximately equal, that a ploidy assumption implicit in the second method is true. In certain embodiments, the ploidy assumption implicit in the second method is that the first chromosome of interest has a complete chromosomal aneuploidy. In certain embodiments, the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
In certain embodiments, the system further comprises program instructions for analyzing the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic, wherein the program instructions for analyzing are configured to execute when the program instructions for comparing the first fetal fraction value to the second fetal fraction value indicate that the two fetal fraction values are not approximately equal. In certain embodiments, the program instructions for analyzing the tag information for the first chromosome of interest comprise: program instructions for binning the sequence of the first chromosome of interest into a plurality of portions; program instructions for determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and program instructions for determining that the first chromosome of interest harbors a partial aneuploidy if any of the portions contains significantly more or significantly less nucleic acid than one or more other portions. In certain embodiments, the system further comprises program instructions for determining that the portion of the first chromosome of interest that contains significantly more or significantly less nucleic acid than one or more other portions harbors the partial aneuploidy.
In certain embodiments, the program instructions for analyzing the tag information for the first chromosome of interest comprise: program instructions for binning the sequence of the first chromosome of interest into a plurality of portions; program instructions for determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and program instructions for determining that the fetus is a mosaic if none of the portions contains significantly more or significantly less nucleic acid than one or more other portions.
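The decision logic described above (compare the two fetal fraction values, then inspect binned tag counts only when they disagree) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the relative tolerance used for "approximately equal", and the leave-one-out z-score used for "significantly more or less" are all assumptions introduced here.

```python
from statistics import mean, stdev


def approximately_equal(ff_first, ff_second, rel_tol=0.25):
    """Hypothetical closeness test; rel_tol is an assumed value."""
    return abs(ff_first - ff_second) <= rel_tol * max(ff_first, ff_second)


def has_outlier_bin(bin_counts, z_cutoff=3.0):
    """Return True if any bin differs strongly from the remaining bins.

    Assumed criterion: leave-one-out z-score above z_cutoff.
    Requires at least three bins.
    """
    for i, count in enumerate(bin_counts):
        others = bin_counts[:i] + bin_counts[i + 1:]
        deviation = abs(count - mean(others))
        spread = stdev(others)
        if spread == 0:
            if deviation > 0:
                return True
            continue
        if deviation / spread > z_cutoff:
            return True
    return False


def classify_cnv(ff_first, ff_second, bin_counts):
    """Classify a CNV already detected on the first chromosome of interest."""
    if approximately_equal(ff_first, ff_second):
        # The ploidy assumption implicit in the second method holds.
        return "complete chromosomal aneuploidy"
    if has_outlier_bin(bin_counts):
        return "partial aneuploidy"
    return "mosaic"
```

In practice the tolerance and the significance criterion would need to be calibrated on qualified samples; the values here only make the sketch executable.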
According to various embodiments, the system may include program instructions for the second method of calculating the fetal fraction value, comprising: (a) program instructions for counting the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) program instructions for calculating the fetal fraction value from the chromosome dose using the second method.
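As a rough illustration of step (a), a chromosome dose can be computed as the ratio of the tag count for the chromosome of interest to the summed tag count of its normalizing chromosome sequence(s), which matches the ratio form this description uses elsewhere for segment doses. The function name and the dictionary layout are assumptions made for this sketch.

```python
def chromosome_dose(tag_counts, chromosome_of_interest, normalizing_chromosomes):
    """Ratio of tags on the chromosome of interest to tags on the normalizers.

    tag_counts maps a chromosome name to its number of mapped sequence tags.
    """
    numerator = tag_counts[chromosome_of_interest]
    denominator = sum(tag_counts[name] for name in normalizing_chromosomes)
    if denominator == 0:
        raise ValueError("normalizing chromosomes have no mapped tags")
    return numerator / denominator
```

For example, with 15,000 tags on the chromosome of interest and 100,000 tags in total on two normalizing chromosomes, the dose is 0.15.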
In certain embodiments, the system further comprises program instructions for calculating a normalized chromosome value (NCV), wherein the program instructions for the second method comprise program instructions for using the normalized chromosome value, and wherein the program instructions for the NCV relate the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the chromosome of interest. In various embodiments, the program instructions for the first method include program instructions for calculating the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample.
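The NCV described above standardizes a test sample's chromosome dose against the mean and standard deviation of the same dose in qualified samples. The sketch below assumes the conventional z-score form, (dose − mean) / standard deviation, and uses the sample standard deviation; the function name is hypothetical.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """Z-score of a test dose relative to doses from qualified samples."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample standard deviation (assumption)
    if sigma == 0:
        raise ValueError("qualified doses show no variation")
    return (test_dose - mu) / sigma
```

With qualified doses of 0.148, 0.150 and 0.152 (mean 0.150, standard deviation 0.002), a test dose of 0.156 lies three standard deviations above the mean.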
According to various embodiments, the program instructions for the second method of calculating the fetal fraction value include program instructions for evaluating the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where ff is the second fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample, and $CV_{iU}$ is the coefficient of variation of the dose of the chromosome of interest determined in the qualified samples.
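The expression for the second fetal fraction value can be evaluated directly once the NCV and the coefficient of variation are known. The sketch below assumes the coefficient of variation is the sample standard deviation divided by the mean of the qualified doses; both function names are illustrative.

```python
from statistics import mean, stdev


def coefficient_of_variation(qualified_doses):
    """Standard deviation of the qualified doses divided by their mean."""
    return stdev(qualified_doses) / mean(qualified_doses)


def fetal_fraction_from_ncv(ncv, cv):
    """Second fetal fraction value: ff = 2 * |NCV * CV|."""
    return 2 * abs(ncv * cv)
```

For qualified doses of 0.148, 0.150 and 0.152 the coefficient of variation is about 0.0133, so an NCV of 3.0 gives a fetal fraction of about 0.08. This agrees with the underlying assumption of a complete trisomy, where the dose rises by ff/2 relative to the unaffected mean (0.156 versus 0.150).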
According to various embodiments, the system further comprises: (i) program instructions for determining whether the copy number variation results from a partial aneuploidy or from a mosaic; and (ii) program instructions for determining, if the copy number variation results from a partial aneuploidy, the locus of the partial aneuploidy on the first chromosome of interest, wherein the program instructions in (i) and (ii) are configured to execute when the program instructions for comparing the first fetal fraction value to the second fetal fraction value determine that the first fetal fraction value and the second fetal fraction value are not approximately equal.
In certain embodiments, the program instructions for determining the locus of the partial aneuploidy on the first chromosome of interest comprise program instructions for categorizing the sequence tags for the first chromosome of interest into bins or blocks of nucleic acids within the first chromosome of interest; and program instructions for counting the mapped tags in each bin.
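Counting mapped tags per bin, as described above, can be sketched with fixed-width bins along the chromosome. The fixed bin width, the 0-based position convention, and the function name are assumptions for illustration; the description does not specify how bins are defined.

```python
from collections import Counter


def count_tags_per_bin(tag_positions, chromosome_length, bin_size):
    """Count mapped tags in consecutive fixed-width bins.

    tag_positions holds 0-based alignment start positions on one chromosome.
    """
    n_bins = -(-chromosome_length // bin_size)  # ceiling division
    counts = Counter()
    for position in tag_positions:
        if not 0 <= position < chromosome_length:
            raise ValueError(f"position {position} is outside the chromosome")
        counts[position // bin_size] += 1
    return [counts.get(index, 0) for index in range(n_bins)]
```

The resulting list can then be scanned for bins with significantly more or fewer tags than the others, and the index of such a bin gives the approximate locus of the partial aneuploidy.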
In certain embodiments, methods for identifying the presence of cancer and/or an increased risk of cancer in a mammal (e.g., a human) are provided, wherein the methods comprise: (a) providing sequence reads of nucleic acids in a test sample from the mammal, wherein the test sample may comprise both genomic nucleic acids from cancerous or precancerous cells and genomic nucleic acids from constitutional (germline) cells, and wherein the sequence reads are provided in an electronic format; (b) aligning the sequence reads to one or more chromosome reference sequences using a computing device, thereby providing a plurality of sequence tags corresponding to the sequence reads; (c) computationally identifying a number of sequence tags from the fetal and maternal nucleic acids for one or more chromosomes of interest whose amplification or deletion is known to be associated with cancer, or for chromosome segments of interest whose amplification or deletion is known to be associated with cancer, wherein the chromosomes or chromosome segments are selected from chromosomes 1 to 22, X and Y, and segments thereof, and computationally identifying a number of sequence tags for at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest, wherein the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest is at least about 2,000, or at least about 5,000, or at least about 10,000; (d) computationally calculating, using the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest and the number of sequence tags identified for each normalizing chromosome sequence or normalizing chromosome segment sequence, a single chromosome or segment dose for each of the one or more chromosomes of interest or chromosome segments of interest; and (e) comparing, using the computing device, each of the single chromosome doses for each of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold value for each of the one or more chromosomes of interest or chromosome segments of interest, and thereby determining the presence or absence of an aneuploidy in the sample, wherein the presence of the aneuploidy and/or an increase in the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest indicates the presence of cancer and/or an increased risk of cancer. In certain embodiments, the increased risk is relative to the same subject at a different time (e.g., earlier), relative to a reference population (e.g., optionally adjusted for sex and/or ethnicity and/or age), relative to a similar subject without a certain risk factor, and the like. In certain embodiments, the chromosomes of interest or chromosome segments of interest comprise whole chromosomes whose amplification and/or deletion is known to be associated with cancer (e.g., as described herein). In certain embodiments, the chromosomes of interest or chromosome segments of interest comprise chromosome segments whose amplification or deletion is known to be associated with one or more cancers. In certain embodiments, the chromosome segments comprise substantially whole chromosome arms (e.g., as described herein). In certain embodiments, the chromosome segments comprise whole chromosome aneuploidies. In certain embodiments, the whole chromosome aneuploidy comprises a loss, while in certain other embodiments the whole chromosome aneuploidy comprises a gain (e.g., a gain or a loss as shown in Table 1). In certain embodiments, the chromosome segments of interest are substantially arm-level segments comprising the p arm or the q arm of any one or more of chromosomes 1 to 22, X and Y.
In certain embodiments, the aneuploidy comprises an amplification of a substantially arm-level segment of a chromosome or a deletion of a substantially arm-level segment of a chromosome. In certain embodiments, the chromosome segments of interest substantially comprise one or more arms selected from the group consisting of 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q, and/or 22q. In certain embodiments, the aneuploidy comprises an amplification of one or more arms selected from the group consisting of 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q, and 22q. In certain embodiments, the aneuploidy comprises a deletion of one or more arms selected from the group consisting of 1p, 3p, 4p, 4q, 5q, 6q, 8p, 8q, 9p, 9q, 10p, 10q, 11p, 11q, 13q, 14q, 15q, 16q, 17p, 17q, 18p, 18q, 19p, 19q, and 22q. In certain embodiments, the chromosome segments of interest are segments that comprise a region and/or a gene shown in Table 3 and/or Table 5 and/or Table 4 and/or Table 6. In certain embodiments, the aneuploidy comprises an amplification of a region and/or a gene shown in Table 3 and/or Table 5. In certain embodiments, the aneuploidy comprises a deletion of a region and/or a gene shown in Table 4 and/or Table 6. In certain embodiments, the chromosome segments of interest are segments known to contain one or more oncogenes and/or one or more tumor suppressor genes. In certain embodiments, the aneuploidy comprises an amplification of one or more regions selected from the group consisting of 20q13, 19q12, 1q21-1q23, 8p11-p12, and ErbB2. In certain embodiments, the aneuploidy comprises an amplification of one or more regions comprising a gene selected from the group consisting of MYC, ERBB2 (EFGR), CCND1 (Cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2, CDK4, and the like.
In certain embodiments, the cancer is a cancer selected from the group consisting of leukemia, ALL, brain cancer, breast cancer, colorectal cancer, dedifferentiated liposarcoma, esophageal adenocarcinoma, esophageal squamous cell carcinoma, GIST, glioma, HCC, hepatocellular carcinoma, lung cancer, lung NSC, lung SC, medulloblastoma, melanoma, MPD, myeloproliferative disorder, cervical cancer, ovarian cancer, prostate cancer, and renal cancer. In certain embodiments, the biological sample comprises a sample selected from the group consisting of whole blood, a blood fraction, sputum/saliva, urine, a tissue biopsy, pleural fluid, pericardial fluid, cerebrospinal fluid, and peritoneal fluid. In certain embodiments, the chromosome reference sequences have a plurality of excluded regions that are present naturally in the chromosomes but that do not contribute to the number of sequence tags for any chromosome or chromosome segment. In certain embodiments, the method further comprises determining whether a read under consideration aligns to a site on a chromosome reference sequence to which another read was previously aligned; and determining whether to include the read under consideration in the number of sequence tags for a chromosome of interest or a chromosome segment of interest, wherein both determining operations are performed with the computing device. In various embodiments, the method further comprises storing, at least temporarily, the sequence information for the nucleic acids in the sample in a computer-readable medium (e.g., a non-transitory medium). In certain embodiments, step (d) comprises computationally calculating a segment dose for a selected one of the segments of interest as the ratio of the number of sequence tags identified for the selected segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected segment of interest.
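One of the embodiments above checks whether a read aligns to a site where another read was previously aligned before deciding whether to count it. The sketch below shows one possible policy, under the assumption that a read at an already-occupied site is excluded from the tag count; the description itself only says that inclusion is determined, not how. The tuple layout and function name are hypothetical.

```python
def count_unique_site_tags(alignments, region):
    """Count tags on `region`, skipping reads at previously seen sites.

    alignments is an iterable of (chromosome, position, strand) tuples.
    Assumed policy: only the first read at each site is counted.
    """
    seen_sites = set()
    count = 0
    for chromosome, position, strand in alignments:
        site = (chromosome, position, strand)
        if site in seen_sites:
            continue  # another read was previously aligned at this site
        seen_sites.add(site)
        if chromosome == region:
            count += 1
    return count
```

The resulting count can serve as the numerator or denominator of the segment dose ratio described in step (d).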
In certain embodiments, the one or more chromosome segments of interest comprise at least 5, or at least 10, or at least 15, or at least 20, or at least 50, or at least 100 different segments of interest. In certain embodiments, at least 5, or at least 10, or at least 15, or at least 20, or at least 50, or at least 100 different aneuploidies are detected. In certain embodiments, the at least one normalizing chromosome sequence comprises one or more chromosomes selected from the group consisting of chromosomes 1 to 22, X, and Y. In certain embodiments, for each segment, the at least one normalizing chromosome sequence comprises the chromosome corresponding to the chromosome on which the segment is located. In certain embodiments, for each segment, the at least one normalizing chromosome sequence comprises a chromosome segment corresponding to the chromosome segment being normalized. In certain embodiments, the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by: (i) identifying a plurality of qualified samples for the segment of interest; (ii) repeatedly calculating chromosome doses for the selected chromosome using multiple potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome segment sequence, alone or in combination, that gives the smallest variability and/or the greatest differentiability in the calculated chromosome doses. In certain embodiments, the method further comprises calculating a normalized segment value (NSV), wherein the NSV relates the segment dose to the mean of the corresponding segment dose in a set of qualified samples, as described herein. In certain embodiments, the normalizing segment sequence is a single segment of any one or more of chromosomes 1 to 22, X, and Y.
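The selection procedure in steps (i) to (iii) above can be sketched as an exhaustive search over candidate normalizing sequences, keeping the combination whose doses vary least across qualified samples. This sketch covers only the "smallest variability" criterion, measured here by the coefficient of variation, which is an assumed choice of metric; the function names and the cap on combination size are also assumptions.

```python
from itertools import combinations
from statistics import mean, stdev


def dose(sample, target, normalizers):
    """Tag count of the target divided by summed tag counts of the normalizers."""
    return sample[target] / sum(sample[name] for name in normalizers)


def select_normalizing_sequence(qualified_samples, target, candidates, max_size=2):
    """Return (combination, cv) with the lowest dose variability.

    qualified_samples is a list of dicts mapping sequence name to tag count.
    """
    best = None
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            doses = [dose(sample, target, combo) for sample in qualified_samples]
            cv = stdev(doses) / mean(doses)
            if best is None or cv < best[1]:
                best = (combo, cv)
    return best
```

A fuller implementation would also weigh differentiability between qualified and affected samples, which this sketch does not attempt.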
In certain embodiments, the normalizing segment sequence is a group of segments of any one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, the normalizing segment sequence comprises substantially one arm of any one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, the method further comprises sequencing at least a portion of the nucleic acid molecules of the test sample to obtain the sequence information. In certain embodiments, the sequencing comprises sequencing cell-free DNA from the test sample to provide the sequence information. In certain embodiments, the sequencing comprises sequencing cellular DNA from the test sample to provide the sequence information. In certain embodiments, the sequencing comprises massively parallel sequencing. In certain embodiments, the method(s) further comprise automatically recording the presence or absence of an aneuploidy, as determined in (d), in a patient medical record for the human subject providing the test sample, wherein the recording is performed using a processor. In certain embodiments, the recording comprises recording the chromosome doses and/or a diagnosis based on the chromosome doses in a computer-readable medium. In various embodiments, the patient medical record is maintained by a laboratory, a physician's office, a hospital, a health maintenance organization, an insurance company, or a personal medical record website. In certain embodiments, the determination of the presence or absence and/or number of the aneuploidies is one factor in a differential diagnosis for cancer. In certain embodiments, the detection of an aneuploidy indicates a positive result, and the method further comprises prescribing, initiating, and/or altering treatment of the human subject from whom the test sample was taken.
In certain embodiments, prescribing, initiating, and/or altering treatment of the human subject from whom the test sample was taken comprises prescribing and/or performing further diagnostics to determine the presence and/or severity of a cancer. In certain embodiments, the further diagnostics comprise screening a sample from the subject for a cancer biomarker and/or imaging the subject for a cancer. In certain embodiments, when the method indicates the presence of neoplastic cells in the mammal, the mammal is treated, or caused to be treated, to remove the neoplastic cells and/or to inhibit the growth or proliferation of the neoplastic cells. In certain embodiments, treating the mammal comprises surgically removing the neoplastic (e.g., tumor) cells. In certain embodiments, treating the mammal comprises administering radiotherapy to the mammal, or causing radiotherapy to be administered to the mammal, to kill the neoplastic cells. In certain embodiments, treating the mammal comprises administering, or causing to be administered, to the mammal an anticancer drug (e.g., matuzumab, erbitux, vectibix, nimotuzumab, matuzumab, panitumumab, fluorouracil, capecitabine, 5-trifluoromethyl-2'-deoxyuridine, methotrexate, raltitrexed, pemetrexed, cytosine arabinoside, 6-mercaptopurine, azathioprine, 6-thioguanine, pentostatin, fludarabine, cladribine, floxuridine, cyclophosphamide, neosar, ifosfamide, thiotepa, 1,3-bis(2-chloroethyl)-1-nitrosourea, 1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea, hexamethylmelamine, busulfan, procarbazine, dacarbazine, chlorambucil, melphalan, cisplatin, carboplatin, oxaliplatin, bendamustine, carmustine, chloromethine, dacarbazine, fotemustine, lomustine, mannosulfan, nedaplatin, nimustine, prednimustine, ranimustine, satraplatin, semustine, streptozocin, temozolomide, treosulfan, triaziquone, triethylene melamine, thiotepa, triplatin tetranitrate, trofosfamide, uramustine, doxorubicin, daunorubicin, mitoxantrone, etoposide, topotecan, teniposide, irinotecan, camptosar, camptothecin, belotecan, rubitecan, vincristine, vinblastine, vinorelbine, vindesine, paclitaxel, docetaxel, abraxane, ixabepilone, larotaxel, ortataxel, tesetaxel, vinflunine, imatinib mesylate, sunitinib malate, sorafenib tosylate, nilotinib hydrochloride monohydrate, tasigna, semaxanib, vandetanib, vatalanib, retinoic acid, retinoic acid derivatives, and the like).
In another embodiment, a computer program product for determining the presence of cancer and/or an increased risk of cancer in a mammal is provided. The computer program product typically comprises: (a) code for providing sequence reads of nucleic acids in a test sample from the mammal, wherein the test sample may comprise both genomic nucleic acids from cancerous or precancerous cells and genomic nucleic acids from constitutional (germline) cells, and wherein the sequence reads are provided in an electronic format; (b) code for aligning, using a computing device, the sequence reads to one or more chromosome reference sequences and thereby providing a plurality of sequence tags corresponding to the sequence reads; (c) code for computationally identifying a number of sequence tags from the fetal and maternal nucleic acids for one or more chromosomes of interest whose amplification or deletion is known to be associated with cancer, or for chromosome segments of interest whose amplification or deletion is known to be associated with cancer, wherein the chromosomes or chromosome segments are selected from chromosomes 1 to 22, X and Y, and segments thereof, and for computationally identifying a number of sequence tags for at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the one or more chromosomes of interest or chromosome segments of interest, wherein the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest is at least about 10,000; (d) code for computationally calculating, using the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest and the number of sequence tags identified for each normalizing chromosome sequence or normalizing chromosome segment sequence, a single chromosome or segment dose for each of the one or more chromosomes of interest or chromosome segments of interest; and (e) code for comparing, using the computing device, each of the single chromosome doses for each of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold value for each of the one or more chromosomes of interest or chromosome segments of interest, and thereby determining the presence or absence of an aneuploidy in the sample, wherein the presence of the aneuploidy and/or an increase in the number of sequence tags identified for each of the one or more chromosomes of interest or chromosome segments of interest indicates the presence of cancer and/or an increased risk of cancer. In various embodiments, the code provides instructions for performing the diagnostic methods described above (and below).
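Step (e) compares each chromosome or segment dose with a corresponding threshold. The description does not spell out the comparison rule, so the sketch below makes a labeled assumption: each sequence of interest has a lower and an upper threshold, a dose above the upper one is called a gain, and a dose below the lower one is called a loss. The function name and data layout are hypothetical.

```python
def call_aneuploidies(doses, thresholds):
    """Compare each dose with its (lower, upper) threshold pair.

    doses maps a chromosome or segment name to its calculated dose.
    thresholds maps the same names to (lower, upper) tuples (assumed form).
    """
    calls = {}
    for name, value in doses.items():
        lower, upper = thresholds[name]
        if value > upper:
            calls[name] = "gain"
        elif value < lower:
            calls[name] = "loss"
        else:
            calls[name] = "unaffected"
    return calls
```

Thresholds of this kind would typically be derived from the dose distribution in qualified samples, for example from the mean and standard deviation used for the normalized values described earlier.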
Also provided are methods of treating a subject for cancer. In certain embodiments, the methods comprise performing a method as described herein for identifying the presence of cancer and/or an increased risk of cancer in a mammal using a sample from the subject, or receiving the results of such a method performed on the sample; and, when the method, alone or in combination with one or more other indicators from a differential diagnosis for cancer, indicates the presence of neoplastic cells in the subject, treating the subject, or causing the subject to be treated, to remove the neoplastic cells and/or to inhibit their growth or proliferation. In certain embodiments, treating the subject comprises surgically removing the cells. In certain embodiments, treating the subject comprises performing radiotherapy, or causing radiotherapy to be performed, on the subject to kill the neoplastic cells. In certain embodiments, treating the subject comprises administering, or causing to be administered, to the subject an anti-cancer drug (e.g., matuzumab, Erbitux, Vectibix, nimotuzumab, matuzumab, panitumumab, fluorouracil, capecitabine, 5-trifluoromethyl-2'-deoxyuridine, methotrexate, raltitrexed, pemetrexed, cytarabine, 6-mercaptopurine, azathioprine, 6-thioguanine, pentostatin, fludarabine, cladribine, floxuridine, cyclophosphamide, Neosar, ifosfamide, thiotepa, 1,3-bis(2-chloroethyl)-1-nitrosourea, 1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea, hexamethylmelamine, busulfan, procarbazine, dacarbazine, chlorambucil, melphalan, cisplatin, carboplatin, oxaliplatin, bendamustine, carmustine, nitrogen mustard, dacarbazine, fotemustine, lomustine, mannosulfan, nedaplatin, nimustine, prednimustine, ranimustine, satraplatin, semustine, streptozocin, temozolomide, treosulfan, triaziquone, triethylene melamine, thiotepa, triplatin tetranitrate, trofosfamide, uracil mustard, doxorubicin, daunorubicin, mitoxantrone, etoposide, topotecan, teniposide, irinotecan, Camptosar, camptothecin, belotecan, rubitecan, vincristine, vinblastine, vinorelbine, vindesine, paclitaxel, docetaxel, Abraxane, ixabepilone, larotaxel, ortataxel, tesetaxel, vinflunine, imatinib mesylate, sunitinib malate, sorafenib tosylate, nilotinib hydrochloride monohydrate, Tasigna, semaxanib, vandetanib, vatalanib, retinoic acid, a retinoic acid derivative, and the like).
Also provided are methods of monitoring the treatment of a subject for cancer. In various embodiments, the methods comprise performing a method as described herein for identifying the presence of cancer and/or an increased risk of cancer in a mammal on a sample from the subject before or during treatment, or receiving the results of such a method performed on the sample; and performing the method again on a second sample from the subject at a later time during treatment or after treatment, or receiving the results of such a method performed on the second sample; wherein a decrease in the number or severity of aneuploidies in the second measurement (e.g., as compared with the first measurement), such as a reduced frequency of aneuploidy and/or a reduction in or absence of certain aneuploidies, indicates a positive course of treatment, and the same or an increased number or severity of aneuploidies in the second measurement (e.g., as compared with the first measurement) indicates a negative course of treatment; and, when the indication is negative, adjusting the treatment regimen to a more aggressive treatment regimen and/or a palliative treatment regimen.
Also provided are methods for determining the fraction of fetal nucleic acids in a maternal sample comprising a mixture of fetal and maternal nucleic acids. In one embodiment, the method for determining the fetal fraction in a maternal sample comprises: (a) receiving sequence reads of the fetal and maternal nucleic acids in the maternal test sample; (b) aligning the sequence reads to one or more chromosome reference sequences and thereby providing a plurality of sequence tags corresponding to the sequence reads; (c) identifying a number of those sequence tags that are from one or more chromosomes of interest or chromosome segments of interest selected from chromosomes 1 to 22, X and Y, and segments thereof, and identifying, for each of the one or more chromosomes of interest or chromosome segments of interest, a number of those sequence tags that are from at least one normalizing chromosome sequence or normalizing chromosome segment sequence, to determine a chromosome dose or chromosome segment dose, wherein the one or more chromosomes of interest or chromosome segments of interest have a copy number variation; and (d) determining the fetal fraction using the chromosome dose or chromosome segment dose corresponding to the copy number variation identified in step (c). In some embodiments, the copy number variation is determined by comparing the dose of each of the one or more chromosomes of interest or chromosome segments of interest with a corresponding threshold for each of the one or more chromosomes of interest or chromosome segments of interest. The copy number variation can be selected from the group consisting of complete chromosomal duplication, complete chromosomal deletion, partial duplication, partial multiplication, partial insertion, and partial deletion.
In certain embodiments, the chromosome or segment dose in step (c) is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome or segment of interest. In some embodiments, the chromosome or segment dose in step (c) is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for each selected chromosome or segment of interest.
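The two dose definitions above can be contrasted in a short sketch. It assumes, purely for illustration, that a sequence tag density is a tag count divided by the length of the sequence in base pairs; the function names and parameters are hypothetical.

```python
def dose_from_counts(tags_interest, tags_normalizing):
    """Dose as a ratio of raw tag counts."""
    return tags_interest / tags_normalizing


def tag_density(tags, length_bp):
    """Tags per base pair (an assumed definition of density for this sketch)."""
    return tags / length_bp


def dose_from_densities(tags_interest, len_interest, tags_normalizing, len_normalizing):
    """Dose as a ratio of length-normalized tag densities."""
    return tag_density(tags_interest, len_interest) / tag_density(
        tags_normalizing, len_normalizing
    )
```

The count-based form depends on the relative lengths of the two sequences, whereas the density-based form removes that dependence, which may make doses easier to compare across segments of different size.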
In certain embodiments, the method further comprises calculating a normalized chromosome value (NCV), wherein calculating the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, wherein the i-th chromosome is the chromosome of interest. The fetal fraction is then determined according to the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where ff is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, wherein the i-th chromosome is the chromosome of interest.
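As a worked illustration of the NCV and fetal fraction expressions, the following sketch computes both from a test dose and a set of qualified-sample doses. It assumes the sample standard deviation (n - 1 denominator) and a coefficient of variation expressed as a fraction (standard deviation divided by mean); neither choice is specified here, and the function names are hypothetical.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """NCV = (test dose - qualified mean) / qualified standard deviation."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample standard deviation (assumption)
    return (test_dose - mu) / sigma


def fetal_fraction_from_ncv(ncv, qualified_doses):
    """ff = 2 * |NCV * CV|, with CV taken as sigma / mean (a fraction)."""
    cv = stdev(qualified_doses) / mean(qualified_doses)
    return 2 * abs(ncv * cv)
```

With qualified doses of 0.98, 1.00, and 1.02 (mean 1.00, standard deviation 0.02) and a test dose of 1.05, the NCV is 2.5 and the resulting fetal fraction is 0.10.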
In certain embodiments, the fetal fraction is determined using a normalized segment value (NSV), wherein the NSV relates the chromosome segment dose to the mean of the corresponding chromosome segment dose in a set of qualified samples as:
$$NSV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome segment dose in the set of qualified samples, and $R_{iA}$ is the chromosome segment dose calculated for the i-th chromosome segment in the test sample, wherein the i-th chromosome segment is the chromosome segment of interest. The fetal fraction is then determined according to the following expression:
$$ff = 2 \times \left| NSV_{iA} \, CV_{iU} \right|$$
where ff is the fetal fraction value, $NSV_{iA}$ is the normalized chromosome segment value for the i-th chromosome segment in an affected sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome segment determined in the qualified samples, wherein the i-th chromosome segment is the chromosome segment of interest.
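One way to see where the factor of 2 comes from is offered below as an explanatory sketch, not as part of the disclosed method. It assumes that the NSV is the standardized difference between the test dose and the qualified-sample mean, that the coefficient of variation is expressed as a fraction, $CV_{iU} = \sigma_{iU}/\bar{R}_{iU}$, that the dose scales linearly with copy number, and that the normalizing sequence is unaffected. The symbol $\bar{R}_{iU}$ for the qualified-sample mean is a notation chosen for this illustration.

```latex
\begin{aligned}
NSV_{iA}\,CV_{iU}
  &= \frac{R_{iA}-\bar{R}_{iU}}{\sigma_{iU}}\cdot\frac{\sigma_{iU}}{\bar{R}_{iU}}
   = \frac{R_{iA}-\bar{R}_{iU}}{\bar{R}_{iU}},\\
R_{iA} &\approx \bar{R}_{iU}\left(1 \pm \tfrac{ff}{2}\right)
  \quad\text{(one extra or one missing fetal copy against two maternal copies)},\\
\Rightarrow\quad ff &= 2\times\left|NSV_{iA}\,CV_{iU}\right|.
\end{aligned}
```

The same argument applies to whole chromosomes with NCV in place of NSV.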
In certain embodiments, the chromosome of interest is any one of chromosomes 1-22 or the X chromosome of a male fetus, and the chromosome segment of interest is a segment selected from chromosomes 1-22 or the X chromosome of a male fetus.
In certain embodiments of the method for determining fetal fraction, the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by: (i) identifying a plurality of qualified samples for the chromosome or segment of interest; (ii) repeatedly calculating chromosome doses or chromosome segment doses for the selected chromosome or segment using a plurality of potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in a combination, that gives the smallest variability or the greatest differentiability in the calculated chromosome doses or chromosome segment doses. The normalizing chromosome sequence can be a single chromosome selected from chromosomes 1 to 22, X, and Y. Alternatively, the normalizing chromosome sequence can be a group of chromosomes selected from chromosomes 1 to 22, X, and Y. Similarly, the normalizing segment sequence can be a single segment of any one of chromosomes 1 to 22, X, and Y. Alternatively, the normalizing segment sequence can be a group of segments of any one or more of chromosomes 1 to 22, X, and Y.
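Steps (i) through (iii) can be sketched as a search over candidate normalizing chromosomes. The sketch below is illustrative only: it considers single candidates rather than groups, it uses the coefficient of variation of the dose across qualified samples as the measure of variability, and the function name and data layout (one dictionary of tag counts per qualified sample) are assumptions.

```python
from statistics import mean, stdev


def select_normalizer(qualified_counts, interest, candidates):
    """Pick the candidate whose dose varies least across qualified samples.

    qualified_counts: list of dicts mapping chromosome name -> tag count,
    one dict per qualified sample (hypothetical layout).
    Returns (best_candidate, coefficient_of_variation).
    """
    best, best_cv = None, float("inf")
    for cand in candidates:
        # Step (ii): recompute the dose with this candidate for every sample.
        doses = [s[interest] / s[cand] for s in qualified_counts]
        cv = stdev(doses) / mean(doses)
        # Step (iii): keep the candidate with the smallest variability.
        if cv < best_cv:
            best, best_cv = cand, cv
    return best, best_cv
```

Extending the search to groups of chromosomes would amount to summing the counts of each candidate group before taking the ratio.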
In certain embodiments, the method for determining fetal fraction can further comprise comparing the fetal fraction obtained as described with a fetal fraction determined using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample. Methods for determining allelic imbalance are described elsewhere in this application and include determining the fetal fraction using polymorphic differences between the fetal and maternal genomes, including but not limited to differences detected in SNP or STR sequences.
In certain embodiments, the method further comprises at least temporarily storing the sequence reads.
An additional method for classifying a copy number variation in a fetal genome is provided. The additional method comprises: (a) obtaining sequence reads of fetal and maternal nucleic acids in a maternal test sample; (b) aligning the sequence reads to one or more chromosome reference sequences and thereby providing a plurality of sequence tags corresponding to the sequence reads; (c) identifying the number of those sequence tags that are from one or more chromosomes of interest and determining that a first chromosome of interest in the fetus carries a copy number variation; (d) calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest; (e) calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome of interest; and (f) comparing the first fetal fraction value with the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome of interest.
In certain embodiments, the first method of calculating a fetal fraction value in step (d) of the additional method comprises calculating the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample; and the second method of calculating a fetal fraction value in step (e) of the additional method comprises: (a) counting the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) calculating the fetal fraction value from the chromosome dose by the second method.
In certain embodiments, the information used by the first method comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises the one or more polymorphic sites. In certain embodiments, the information used by the first method is obtained by a method other than sequencing, for example qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, the first method comprises calculating the first fetal fraction value using tags from a chromosome or chromosome segment that does not have the copy number variation. For example, when the first chromosome of interest is chromosome 21, the fetal fraction determined using sequence tags from chromosome 21 can be compared with the fetal fraction determined from sequence tags from chromosome X in a male fetus. Any chromosome or chromosome segment that is known to occur in an aneuploid state, or that is determined by any of the methods described herein (e.g., by calculating its NCV or NSV) not to be aneuploid, can be used to determine the first fetal fraction.
In certain embodiments, the chromosome or segment dose determined by the second method in step (e) is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome or segment of interest. In certain embodiments, the chromosome dose or segment dose determined in step (e) is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for each selected chromosome or segment of interest.
Certain embodiments of the additional method further comprise calculating a normalized chromosome value (NCV), wherein the second method uses the normalized chromosome value, and wherein calculating the NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, wherein the i-th chromosome is the chromosome of interest.
In certain embodiments, the second method of calculating the fetal fraction value comprises evaluating the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where ff is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample or test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, wherein the i-th chromosome is the chromosome of interest.
In certain embodiments, the first method of calculating the fetal fraction comprises: (a) counting the number of sequence tags from a chromosome other than the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose for the chromosome other than the first chromosome of interest; and (b) calculating the first fetal fraction value from that chromosome dose by the first method; and the second method comprises: (a) counting the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) calculating the second fetal fraction value from that chromosome dose by the second method.
Preferably, the chromosome or segment dose is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome or segment of interest; alternatively, the chromosome dose or segment dose is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for each selected chromosome or segment of interest.
Preferably, the additional method for classifying a copy number variation further comprises calculating corresponding normalized chromosome values (NCVs), and the first method and the second method use the corresponding NCVs. Calculating an NCV relates the determined chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the dose calculated for the i-th chromosome in the test sample. The first method and the second method can each use the NCV to calculate a fetal fraction by evaluating the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where ff is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome in the qualified samples. In the above expressions, for the first method the i-th chromosome is not the first chromosome of interest, and for the second method the i-th chromosome is the first chromosome of interest.
The first chromosome of interest is selected from the group consisting of chromosomes 1 to 22, X, and Y. The chromosome other than the first chromosome of interest can be any one of chromosomes 1 to 22, or the X chromosome when the fetus is male.
In certain embodiments, step (f) comprises determining whether the two fetal fraction values are approximately equal. In certain embodiments, step (f) further comprises determining, when the two fetal fraction values are approximately equal, that a ploidy assumption implicit in the second method is true. The ploidy assumption implicit in the second method can be that the first chromosome of interest has a complete chromosomal aneuploidy. For example, the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
In certain embodiments, the additional method for classifying a copy number variation further comprises a step (g), performed when the two fetal fraction values are not approximately equal, of analyzing the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest carries a partial aneuploidy, or (ii) the fetus is a mosaic.
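The decision made in step (f), and the hand-off to step (g), might be sketched as below. The relative tolerance used to decide that two values are "approximately equal" is a hypothetical parameter chosen for the example, as are the function name and the returned labels.

```python
import math


def compare_fetal_fractions(ff_first, ff_second, rel_tol=0.2):
    """Classify a copy number variation from two fetal fraction estimates.

    ff_first: estimate that does not use tags from the chromosome of interest.
    ff_second: estimate that uses tags from the chromosome of interest.
    rel_tol: assumed tolerance for "approximately equal" (illustrative only).
    """
    if math.isclose(ff_first, ff_second, rel_tol=rel_tol):
        # The ploidy assumption implicit in the second method holds.
        return "complete chromosomal aneuploidy"
    # Otherwise step (g) must distinguish the two remaining possibilities.
    return "partial aneuploidy or mosaic"
```

In practice the tolerance would presumably reflect the measurement uncertainty of both estimates rather than a fixed constant.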
In certain embodiments, the first method comprises calculating the first fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being present on a chromosome other than the first chromosome of interest; and the second method comprises calculating the second fetal fraction value using information from one or more polymorphisms exhibiting an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being present on the first chromosome of interest. The comparing step (f) can comprise: determining that the first chromosome of interest is diploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1; determining that the first chromosome of interest is triploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1.5; and determining that the first chromosome of interest is haploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 0.5. The additional method for classifying a copy number variation can further comprise a step (g), performed when the ratio of the second fetal fraction value to the first fetal fraction value is not approximately 1, 1.5, or 0.5, of analyzing the tag information for the first chromosome of interest to determine whether (i) the first chromosome of interest carries a partial aneuploidy, or (ii) the fetus is a mosaic.
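The ratio-based comparison can be illustrated with a small lookup against the three reference ratios. The absolute tolerance `tol` that defines "approximately" is an assumption introduced for the example, as is the function name; the labels follow the wording of the paragraph above.

```python
def classify_by_ratio(ff_first, ff_second, tol=0.1):
    """Map the ratio ff_second / ff_first to a ploidy call for the chromosome.

    tol is a hypothetical tolerance around each reference ratio.
    """
    ratio = ff_second / ff_first
    for target, label in ((1.0, "diploid"), (1.5, "triploid"), (0.5, "haploid")):
        if abs(ratio - target) <= tol:
            return label
    # No reference ratio matches: proceed to the tag analysis of step (g).
    return "partial aneuploidy or mosaic"
```

For instance, a first estimate of 0.10 and a second estimate of 0.15 give a ratio of about 1.5, whereas a second estimate of 0.125 gives 1.25, which falls outside every window under the assumed tolerance.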
In certain embodiments, the information used by the polymorphism-based first and second methods comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises the one or more polymorphic sites. Alternatively, the information used by the polymorphism-based first and second methods is obtained by a method other than sequencing, for example qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, step (g) of analyzing the tag information for the first chromosome of interest comprises: (a) binning the sequence of the first chromosome of interest into a plurality of portions; (b) determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and (c) determining that the first chromosome of interest carries a partial aneuploidy if any of the portions contains significantly more or significantly less nucleic acid than one or more other portions, or determining that the fetus is a mosaic if none of the portions contains significantly more or significantly less nucleic acid than one or more other portions. The additional method can therefore further comprise determining that a portion of the first chromosome of interest that contains significantly more or significantly less nucleic acid than one or more other portions carries the partial aneuploidy.
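Sub-steps (b) and (c) can be sketched as an outlier check over per-bin tag counts. The text does not define "significantly", so the sketch substitutes a simple relative deviation from the median bin count; that criterion, the `rel_dev` value, and the function name are assumptions, and a statistical test could be used instead.

```python
from statistics import median


def analyze_bins(bin_counts, rel_dev=0.2):
    """Distinguish partial aneuploidy from mosaicism using per-bin tag counts.

    bin_counts: tag counts for equally sized bins along the chromosome.
    rel_dev: assumed relative deviation from the median that counts as
             "significantly more or less" (illustrative stand-in only).
    Returns (call, indices_of_flagged_bins).
    """
    med = median(bin_counts)
    flagged = [i for i, c in enumerate(bin_counts) if abs(c - med) > rel_dev * med]
    if flagged:
        return "partial aneuploidy", flagged
    return "mosaic", []
```

The flagged indices indicate which portions of the chromosome would carry the partial aneuploidy under this criterion.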
Step (f) of the method for classifying a copy number variation comprises classifying the copy number variation into a category selected from the group consisting of complete chromosomal duplication or multiplication, complete chromosomal deletion, partial chromosomal duplication, partial chromosomal deletion, and mosaicism.
In embodiments in which step (f) of comparing the first fetal fraction value with the second fetal fraction value determines that the two values are not approximately equal, the method further comprises:
(i) determining whether the copy number variation results from a partial aneuploidy or from mosaicism; and
(ii) when the copy number variation results from a partial aneuploidy, determining the locus of the partial aneuploidy on the first chromosome of interest.
In certain embodiments, determining the locus of the partial aneuploidy on the first chromosome of interest comprises dividing the sequence tags for the first chromosome of interest among bins or blocks of nucleic acid in the first chromosome of interest, and counting the mapped tags in each bin.
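Assigning mapped tags to bins and counting them can be sketched as follows, assuming each tag is represented by a zero-based start coordinate on the chromosome and that bins have a fixed width. Both the representation and the function name are hypothetical.

```python
from collections import Counter


def count_tags_per_bin(tag_positions, bin_size):
    """Count mapped tags in consecutive fixed-width bins.

    tag_positions: zero-based start coordinates of tags on one chromosome.
    bin_size: bin width in base pairs (an illustrative parameter).
    Returns a list whose k-th entry is the tag count of bin k.
    """
    counts = Counter(pos // bin_size for pos in tag_positions)
    n_bins = max(counts) + 1 if counts else 0
    return [counts.get(b, 0) for b in range(n_bins)]
```

A run of adjacent bins with elevated or depressed counts would then point to the approximate locus of a partial aneuploidy.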
在某些实施方案中,(b)中比对的步骤包括比对至少约1百万个读数。In certain embodiments, the step of aligning in (b) comprises aligning at least about 1 million reads.
在此描述的任何方法都可以进一步包括对母体测试样品中的胎儿和母体核酸(例如无细胞DNA)进行测序以获得序列读数。对来自母体测试样品的母体和胎儿核酸进行测序以产生序列读数包括大规模平行测序。在某些实施方案中,大规模平行测序是合成法测序。合成法测序可以使用可逆染料终止子实现。在其他实施方案中,大规模平行测序是连接法测序。在另外的其他实施方案中,大规模平行测序是单分子测序。Any method described herein can further include sequencing the fetus and maternal nucleic acid (e.g., cell-free DNA) in the maternal test sample to obtain sequence readings. Sequencing the maternal and fetal nucleic acid from the maternal test sample to produce sequence readings includes massive parallel sequencing. In certain embodiments, massive parallel sequencing is sequencing by synthesis. Sequencing by synthesis can be achieved using reversible dye terminators. In other embodiments, massive parallel sequencing is sequencing by ligation. In other embodiments, massive parallel sequencing is single molecule sequencing.
可以根据在此描述的方法用于确定胎儿分数的母体样品包括血液、血浆、血清或尿样品。在某些实施方案中,母体样品是血浆样品。在其他实施方案中,母体样品是全血样品。Maternal samples that can be used to determine fetal fraction according to the methods described herein include blood, plasma, serum, or urine samples. In certain embodiments, the maternal sample is a plasma sample. In other embodiments, the maternal sample is a whole blood sample.
还提供了多个不同的设备,包括用于对样品进行医学分析(例如母体样品)的设备,并且这些设备用以执行上述方法的多个步骤,例如单独地用于确定拷贝数变异,用于确定胎儿分数,或用于将拷贝数变异进行分类。Also provided are a plurality of different apparatuses, including apparatuses for performing medical analysis on samples (e.g., maternal samples), and for performing the various steps of the above methods, e.g., solely for determining copy number variation, for determining fetal fraction, or for classifying copy number variation.
还提供了试剂盒,这些试剂盒包括可以单独地或在与用于确定两个基因组中的一个对来源于该两个基因组的核酸的混合物的影响(例如母体样品中的胎儿分数)的方法组合中用于确定拷贝数变异的试剂。这些试剂盒可以与在此描述的设备结合使用。Also provided are kits comprising reagents that can be used alone or in combination with methods for determining the effect of one of two genomes on a mixture of nucleic acids derived from the two genomes (e.g., fetal fraction in a maternal sample) for determining copy number variation. These kits can be used in conjunction with the devices described herein.
Although the examples herein concern humans and the language is primarily directed to human concerns, the concepts described herein are applicable to genomes from any plant or animal.
The present invention provides the following technical solutions:
1. A medical analysis apparatus for determining fetal fraction in a maternal test sample comprising a mixture of fetal and maternal nucleic acids, the apparatus comprising:
(a) a means for receiving a plurality of sequence reads of the fetal and maternal nucleic acids from the maternal test sample;
(b) a means for aligning the plurality of sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads;
(c) a means for identifying a number of those sequence tags that are from one or more chromosomes of interest or chromosome segments of interest selected from chromosomes 1-22, X, and Y and segments thereof, and for identifying, for each of the one or more chromosomes of interest or chromosome segments of interest, a number of those sequence tags that are from at least one normalizing chromosome sequence or normalizing chromosome segment sequence, to determine a chromosome dose or chromosome segment dose,
wherein the chromosome of interest or chromosome segment of interest has a copy number variation, wherein the copy number variation is determined by comparing the chromosome dose of each of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold for each of the one or more chromosomes of interest or chromosome segments of interest;
(d) a means for determining the fetal fraction using the dose of the chromosome of interest or the dose of the chromosome segment of interest; and
(e) a means for calculating a normalized chromosome value or a normalized segment value, wherein the normalized chromosome value relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
wherein $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, wherein the i-th chromosome is the chromosome of interest;
wherein the normalized segment value relates the chromosome segment dose to the mean of the corresponding chromosome segment dose in a set of qualified samples as:
$$NSV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
wherein $NSV_{iA}$ is the normalized chromosome segment value for the i-th chromosome segment in the test sample, $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the i-th chromosome segment dose in the set of qualified samples, and $R_{iA}$ is the chromosome segment dose calculated for the i-th chromosome segment in the test sample, wherein the i-th chromosome segment is the chromosome segment of interest;
wherein the means (d) determines the fetal fraction according to the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
wherein $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, wherein the i-th chromosome is the chromosome of interest; or
wherein the means (d) determines the fetal fraction according to the following expression:
$$ff = 2 \times \left| NSV_{iA} \, CV_{iU} \right|$$
wherein $ff$ is the fetal fraction value, $NSV_{iA}$ is the normalized chromosome segment value for the i-th chromosome segment in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome segment determined in the qualified samples, wherein the i-th chromosome segment is the chromosome segment of interest.
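The two expressions recited in embodiment 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the apparatus itself: the function names, the use of the sample standard deviation, and the dose values are all hypothetical.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """NCV_iA = (R_iA - mean_iU) / sigma_iU over a set of qualified samples."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # assumption: sample standard deviation
    return (test_dose - mu) / sigma


def fetal_fraction(test_dose, qualified_doses):
    """ff = 2 * |NCV_iA * CV_iU|, taking CV_iU = sigma_iU / mean_iU."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)
    ncv = (test_dose - mu) / sigma
    cv = sigma / mu
    return 2 * abs(ncv * cv)


# Hypothetical doses for one chromosome of interest (illustrative values only).
qualified = [0.98, 1.00, 1.02, 1.00]
test = 1.05
print(round(normalized_chromosome_value(test, qualified), 3))
print(round(fetal_fraction(test, qualified), 3))
```

The same functions apply unchanged to segment doses, since the NSV expression has the same form as the NCV expression.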
2. The apparatus of embodiment 1, wherein the chromosome dose or segment dose determined by means (c) is calculated as the ratio of the number of sequence tags identified for the chromosome or segment of interest to the number of sequence tags identified for the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence; or wherein the chromosome dose or segment dose determined by means (c) is calculated as the ratio of the sequence tag density ratio of the chromosome or segment of interest to the sequence tag density ratio of the normalizing chromosome sequence or normalizing chromosome segment sequence.
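As a rough illustration of the first alternative in embodiment 2, the sketch below computes a chromosome dose as a ratio of tag counts. The function name, the dictionary layout, and the counts are hypothetical, and summing the tags over a group of normalizing chromosomes is an assumption about how a group would be combined.

```python
def chromosome_dose(tag_counts, chromosome_of_interest, normalizing_chromosomes):
    """Ratio of tags on the chromosome of interest to tags on the normalizing sequence(s)."""
    numerator = tag_counts[chromosome_of_interest]
    # Assumption: for a group of normalizing chromosomes, the tag counts are summed.
    denominator = sum(tag_counts[c] for c in normalizing_chromosomes)
    if denominator == 0:
        raise ValueError("normalizing sequence has no mapped tags")
    return numerator / denominator


# Hypothetical tag counts per chromosome (illustrative values only).
counts = {"chr21": 15_000, "chr9": 50_000, "chr1": 100_000}
print(chromosome_dose(counts, "chr21", ["chr9"]))          # single normalizing chromosome
print(chromosome_dose(counts, "chr21", ["chr9", "chr1"]))  # group of normalizing chromosomes
```

The second alternative would divide each count by the length of the corresponding sequence before taking the ratio; that step is omitted here.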
3. The apparatus of embodiment 1, wherein the chromosome of interest is an autosome or the X chromosome of a male fetus, and the chromosome segment of interest is selected from an autosome or the X chromosome of a male fetus.
4. The apparatus of embodiment 1, wherein the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by: (i) identifying a plurality of qualified samples for the chromosome or segment of interest; (ii) repeatedly calculating chromosome doses or chromosome segment doses for the selected chromosome or segment using a plurality of potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in combination, that gives the smallest variability and/or the greatest differentiability in the calculated chromosome doses or chromosome segment doses.
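The selection procedure in embodiment 4 can be sketched as a search over candidate normalizing chromosomes. This sketch covers only the "smallest variability" criterion, measured here by the coefficient of variation of the doses across qualified samples; that metric, the function names, and the counts are assumptions made for illustration, and the differentiability criterion is not modeled.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def select_normalizing_chromosome(qualified_counts, chromosome_of_interest, candidates):
    """Return the candidate whose doses vary least across the qualified samples."""
    best, best_cv = None, float("inf")
    for candidate in candidates:
        # Step (ii): recompute the dose in every qualified sample for this candidate.
        doses = [s[chromosome_of_interest] / s[candidate] for s in qualified_counts]
        cv = coefficient_of_variation(doses)
        # Step (iii): keep the candidate with the smallest variability.
        if cv < best_cv:
            best, best_cv = candidate, cv
    return best, best_cv


# Hypothetical tag counts for three qualified samples (illustrative values only).
qualified = [
    {"chr21": 100, "chr9": 1000, "chr14": 500},
    {"chr21": 110, "chr9": 1100, "chr14": 500},
    {"chr21": 90, "chr9": 900, "chr14": 520},
]
print(select_normalizing_chromosome(qualified, "chr21", ["chr9", "chr14"]))
```

Evaluating combinations of chromosomes would follow the same loop, with each candidate being a tuple of chromosomes whose counts are combined before the division.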
5. The apparatus of embodiment 1, wherein the normalizing chromosome sequence is a single chromosome or a group of chromosomes selected from any one or more of chromosomes 1-22, X, and Y.
6. The apparatus of embodiment 1, wherein the normalizing segment sequence is a single segment or a group of segments from any one or more of chromosomes 1-22, X, and Y.
7. The apparatus of embodiment 1, wherein the copy number variation is selected from the group consisting of complete chromosomal duplications, complete chromosomal deletions, partial duplications, partial multiplications, partial insertions, and partial deletions.
8. The apparatus of embodiment 1, further comprising a means for comparing the fetal fraction determined using the chromosome dose or chromosome segment dose with a fetal fraction determined using information from one or more polymorphisms that exhibit an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on a chromosome other than the chromosome of interest.
9. The apparatus of embodiment 1, wherein the aligning in means (b) comprises aligning at least one million reads.
10. The apparatus of embodiment 1, further comprising a sequencer configured to sequence the fetal and maternal nucleic acids in the maternal test sample to obtain the sequence reads.
11. The apparatus of embodiment 10, wherein the sequencing comprises sequencing cell-free DNA from the maternal test sample to provide the sequence reads.
12. The apparatus of embodiment 10, wherein the sequencing comprises massively parallel sequencing of the maternal and fetal nucleic acids from the maternal test sample to produce the sequence reads.
13. The apparatus of embodiment 12, wherein the massively parallel sequencing is sequencing by synthesis.
14. The apparatus of embodiment 13, wherein the sequencing by synthesis uses reversible dye terminators.
15. The apparatus of embodiment 12, wherein the massively parallel sequencing is sequencing by ligation.
16. The apparatus of embodiment 12, wherein the massively parallel sequencing is single molecule sequencing.
17. The apparatus of embodiment 1, further comprising a means for obtaining the maternal test sample from a pregnant organism.
18. The apparatus of embodiment 1, wherein the maternal sample is a blood, plasma, serum, or urine sample.
19. A medical analysis apparatus for classifying a copy number variation in a fetal genome, the apparatus comprising:
(1) a means for receiving a plurality of sequence reads from fetal and maternal nucleic acids in a maternal test sample;
(2) a means for aligning the sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads;
(3) a means for identifying a number of those sequence tags that are from one or more chromosomes of interest and determining that a first chromosome of interest in the fetus harbors a copy number variation;
(4) a means for calculating a first fetal fraction value by a first method that does not use information from the tags from the first chromosome of interest;
(5) a means for calculating a second fetal fraction value by a second method that uses information from the tags from the first chromosome of interest, wherein the means comprises a component for calculating a normalized chromosome value and a component for using the normalized chromosome value, wherein the normalized chromosome value relates the chromosome dose of the first chromosome of interest to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
wherein $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, wherein the i-th chromosome is the first chromosome of interest; and
(6) a means for comparing the first fetal fraction value to the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome of interest;
wherein the component of means (4) of the first method that calculates the first fetal fraction value evaluates the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
wherein $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, wherein the i-th chromosome is the chromosome of interest; or
wherein the component of means (4) of the first method that calculates the first fetal fraction value evaluates the following expression:
$$ff = 2 \times \left| NSV_{iA} \, CV_{iU} \right|$$
wherein $ff$ is the fetal fraction value, $NSV_{iA}$ is the normalized chromosome segment value for the i-th chromosome segment in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome segment determined in the qualified samples, wherein the i-th chromosome segment is the chromosome segment of interest;
wherein the component of means (5) of the second method that calculates the second fetal fraction value evaluates the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
wherein $ff$ is the second fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, wherein the i-th chromosome is the first chromosome of interest.
20. The apparatus of embodiment 19, wherein the means (4) of the first method comprises a component for calculating the first fetal fraction value using information from one or more polymorphisms that exhibit an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on a chromosome other than the first chromosome of interest; and
wherein the means (5) of the second method comprises:
(a) a component for counting the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and
(b) a component for calculating the second fetal fraction value from the chromosome dose by the second method.
21. The apparatus of embodiment 20, wherein the information used by the means (4) of the first method comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises the one or more polymorphic sites.
22. The apparatus of embodiment 21, wherein the information used by the means (4) of the first method is obtained by a method other than sequencing.
23. The apparatus of embodiment 22, wherein the method is qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
24. The apparatus of embodiment 19, wherein the means (4) of the first method comprises:
(a) a component for counting the number of sequence tags from a chromosome other than the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and
(b) a component for calculating the first fetal fraction value from the chromosome dose by the first method; and
wherein the means (5) of the second method comprises:
(a) a component for counting the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and
(b) a component for calculating the second fetal fraction value from the chromosome dose by the second method.
25. The apparatus of embodiment 24, wherein the means (4) of the first method and the means (5) of the second method each further comprise a component for calculating a normalized chromosome value and a component for using the normalized chromosome value, wherein the normalized chromosome value relates the calculated chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
wherein $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, $\bar{R}_{iU}$ and $\sigma_{iU}$ are the estimated mean and standard deviation, respectively, of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the dose calculated for the i-th chromosome in the test sample,
wherein
for the means (4) of the first method, the i-th chromosome is the chromosome other than the first chromosome of interest; and
for the means (5) of the second method, the i-th chromosome is the first chromosome of interest.
26. The apparatus of embodiment 25, wherein the components of the means (4) of the first method and of the means (5) of the second method that calculate the fetal fraction evaluate the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
wherein $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in the test sample, and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome in the qualified samples;
wherein
for the means (4) of the first method, the i-th chromosome is the chromosome other than the first chromosome of interest; and
for the means (5) of the second method, the i-th chromosome is the first chromosome of interest.
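It may help to see how the two recited quantities combine. The short derivation below assumes the usual definition of the coefficient of variation, $CV_{iU} = \sigma_{iU} / \bar{R}_{iU}$, with $\bar{R}_{iU}$ denoting the mean dose in the qualified samples, and a positive mean dose; the numbers in the worked line are illustrative only.

```latex
\begin{align*}
ff &= 2 \times \left| NCV_{iA}\, CV_{iU} \right|
    = 2 \times \left| \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}} \cdot \frac{\sigma_{iU}}{\bar{R}_{iU}} \right|
    = 2 \times \frac{\left| R_{iA} - \bar{R}_{iU} \right|}{\bar{R}_{iU}} \\[4pt]
% Worked example with illustrative values: mean dose 1.00, test dose 1.05
ff &= 2 \times \frac{\left| 1.05 - 1.00 \right|}{1.00} = 0.10
\end{align*}
```

Under these assumptions the standard deviation cancels, so the expression reduces to twice the relative deviation of the test dose from the qualified mean. This is consistent with a complete trisomy or monosomy of the fetal chromosome shifting the dose by a factor of about $1 \pm ff/2$.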
27. The apparatus of embodiment 26, wherein, when the fetus is male, the chromosome other than the first chromosome of interest is the X chromosome.
28. The apparatus of embodiment 20 or 24, wherein the means (6) for comparing the first fetal fraction value to the second fetal fraction value comprises a component for determining whether the two fetal fraction values are approximately equal.
29. The apparatus of embodiment 28, wherein the means (6) further comprises a component for determining, when the two fetal fraction values are approximately equal, that a ploidy assumption implicit in the second method is true.
30. The apparatus of embodiment 29, wherein the ploidy assumption implicit in the second method is that the first chromosome of interest has a complete chromosomal aneuploidy.
31. The apparatus of embodiment 30, wherein the complete chromosomal aneuploidy of the first chromosome of interest is a monosomy or a trisomy.
32. The apparatus of embodiment 31, further comprising a means for analyzing the tag information of the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic, wherein the means for analyzing the tag information of the first chromosome of interest is configured to execute when the means for comparing the first fetal fraction value to the second fetal fraction value indicates that the two fetal fraction values are not approximately equal.
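The comparison logic of embodiments 28 to 32 can be sketched as follows. The text does not say how close "approximately equal" must be, so the relative tolerance below is an assumption, as are the function name and the returned labels.

```python
import math


def classify_by_fetal_fraction(ff_first, ff_second, rel_tol=0.2):
    """Compare two fetal fraction estimates and decide which analysis applies.

    ff_first:  estimate that does not use tags from the first chromosome of interest.
    ff_second: estimate that uses tags from the first chromosome of interest.
    rel_tol:   hypothetical tolerance for "approximately equal".
    """
    if math.isclose(ff_first, ff_second, rel_tol=rel_tol):
        # The ploidy assumption implicit in the second method is taken as true.
        return "complete chromosomal aneuploidy (monosomy or trisomy)"
    # Otherwise the tag information of the chromosome is analyzed further.
    return "analyze further: partial aneuploidy or mosaic"


print(classify_by_fetal_fraction(0.10, 0.11))
print(classify_by_fetal_fraction(0.10, 0.04))
```

In practice the tolerance would presumably be tied to the measurement uncertainty of the two estimates rather than fixed.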
33. The apparatus of embodiment 19, wherein the means (4) of the first method comprises a component for calculating the first fetal fraction value using information from one or more polymorphisms that exhibit an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on a chromosome other than the first chromosome of interest; and
the means (5) of the second method comprises a component for calculating the second fetal fraction value using information from one or more polymorphisms that exhibit an allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being located on the first chromosome of interest.
34. The apparatus of embodiment 33, wherein the means (6) for comparing comprises:
a component for determining that the first chromosome of interest is diploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1;
a component for determining that the first chromosome of interest is triploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1.5; and
a component for determining that the first chromosome of interest is haploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 0.5.
35. The apparatus of embodiment 34, further comprising a means for analyzing the tag information of the first chromosome of interest to determine whether (i) the first chromosome of interest harbors a partial aneuploidy, or (ii) the fetus is a mosaic, wherein the means for analyzing the tag information of the first chromosome of interest is configured to execute when the means (6) for comparing the first fetal fraction value to the second fetal fraction value indicates that the ratio of the second fetal fraction value to the first fetal fraction value is not approximately 1, 1.5, or 0.5.
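The ratio test of embodiments 34 and 35 can be sketched as a small lookup. The absolute tolerance around each target ratio is an assumption, since the text only says "approximately", and the function name and labels are hypothetical.

```python
def classify_ploidy(ff_first, ff_second, tol=0.1):
    """Classify the first chromosome of interest from the ratio ff_second / ff_first."""
    if ff_first <= 0:
        raise ValueError("first fetal fraction value must be positive")
    ratio = ff_second / ff_first
    # Target ratios recited in embodiment 34.
    for target, label in ((1.0, "diploid"), (1.5, "triploid"), (0.5, "haploid")):
        if abs(ratio - target) <= tol:
            return label
    # Embodiment 35: none of the ratios match, so tag information is analyzed.
    return "analyze further: partial aneuploidy or mosaic"


for second in (0.10, 0.15, 0.05, 0.08):
    print(second, classify_ploidy(0.10, second))
```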
36. The apparatus of embodiment 32 or 35, wherein the means for analyzing the tag information of the first chromosome of interest comprises:
(a) a component for binning the sequence of the first chromosome of interest into a plurality of portions;
(b) a component for determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and
(c) a component for determining that the first chromosome of interest harbors a partial aneuploidy if any of the portions contains significantly more or significantly less nucleic acid than one or more other portions, or determining that the fetus is a mosaic if none of the portions contains significantly more or significantly less nucleic acid than one or more other portions.
37. The apparatus of embodiment 36, wherein the component (c) further determines that a portion of the first chromosome of interest that contains significantly more or significantly less nucleic acid than one or more other portions harbors the partial aneuploidy.
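A minimal sketch of components (b) and (c) is given below. The text does not define "significantly more or significantly less", so the rule used here (deviation from the median bin count by more than a fixed relative threshold) is purely an assumption, as are the function names and the counts.

```python
from statistics import median


def find_outlier_bins(bin_counts, rel_threshold=0.25):
    """Indices of bins whose count deviates strongly from the median bin count."""
    reference = median(bin_counts)
    return [
        i for i, count in enumerate(bin_counts)
        if abs(count - reference) > rel_threshold * reference
    ]


def partial_or_mosaic(bin_counts, rel_threshold=0.25):
    """Partial aneuploidy if some bin stands out; otherwise a mosaic is inferred."""
    outliers = find_outlier_bins(bin_counts, rel_threshold)
    if outliers:
        # Embodiment 37: the outlying portion(s) harbor the partial aneuploidy.
        return "partial aneuploidy", outliers
    return "mosaic", []


# Hypothetical tag counts per bin along the chromosome of interest.
print(partial_or_mosaic([100, 102, 98, 150, 101]))
print(partial_or_mosaic([100, 102, 98, 101, 99]))
```

A statistical test on the bin counts would be a more defensible choice than a fixed threshold; the threshold keeps the sketch short.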
38. The apparatus of embodiment 19, wherein the first chromosome of interest is selected from the group consisting of chromosomes 1-22, X, and Y.
39. The apparatus of embodiment 19, wherein the means (6) comprises a component for classifying the copy number variation into a category selected from the group consisting of complete chromosomal insertions, complete chromosomal deletions, partial chromosomal duplications, partial chromosomal deletions, and mosaics.
40. The apparatus of embodiment 19, further comprising:
(i) a means for determining whether the copy number variation results from a partial aneuploidy or from a mosaic; and
(ii) a means for determining the locus of the partial aneuploidy on the first chromosome of interest if the copy number variation results from a partial aneuploidy,
wherein the means in (i) and (ii) are configured to execute when the means for comparing the first fetal fraction value to the second fetal fraction value determines that the first fetal fraction value is not approximately equal to the second fetal fraction value.
41. The apparatus of embodiment 40, wherein the means for determining the locus of the partial aneuploidy on the first chromosome of interest comprises a component for sorting the sequence tags of the first chromosome of interest into bins or blocks of nucleic acids in the first chromosome of interest; and a component for counting the mapped tags in each bin.
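The two components of embodiment 41 amount to assigning each mapped tag to a bin by position and counting per bin. The sketch below assumes fixed-width bins and tags represented by their start coordinates; both choices, the function name, and the positions are hypothetical.

```python
from collections import Counter


def count_tags_per_bin(tag_positions, bin_size):
    """Sort tags into fixed-width bins by mapped position and count tags per bin."""
    counts = Counter(position // bin_size for position in tag_positions)
    n_bins = max(counts) + 1 if counts else 0
    # Return a dense list so that empty bins appear as zero counts.
    return [counts.get(b, 0) for b in range(n_bins)]


# Hypothetical mapped start positions on the first chromosome of interest.
positions = [5, 150, 160, 999_999, 1_000_001, 2_500_000]
print(count_tags_per_bin(positions, bin_size=1_000_000))
```

The resulting per-bin counts are the kind of input on which a locus of partial aneuploidy could then be sought, for example by looking for runs of bins with unusually high or low counts.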
42. The apparatus of embodiment 1 or 19, wherein the maternal test sample is a blood, plasma, serum, or urine sample.
43. The apparatus of embodiment 1 or 19, wherein the fetal and maternal nucleic acids are cell-free DNA (cfDNA).
44. The apparatus of embodiment 1 or 19, further comprising a sequencer configured to sequence the fetal and maternal nucleic acids in a maternal test sample and obtain the sequence reads.
45. The apparatus of embodiment 44, wherein the sequencer is configured to perform sequencing by synthesis.
46. The apparatus of embodiment 45, wherein the sequencer is configured to perform sequencing by synthesis using reversible dye terminators.
47. The apparatus of embodiment 44, wherein the sequencer is configured to perform sequencing by ligation.
48. The apparatus of embodiment 44, wherein the sequencer is configured to perform single molecule sequencing.
49. The apparatus of embodiment 44, wherein the sequencer and means (a)-(d) of the apparatus of embodiment 1, or means (1)-(6) of the apparatus of embodiment 19, are located in separate locations and are connected by a network.
50. The apparatus of embodiment 44, further comprising a means for obtaining the maternal test sample from a pregnant mother.
51. The apparatus of embodiment 50, wherein the means for obtaining the maternal test sample and means (a)-(d) of the apparatus of embodiment 1, or means (1)-(6) of the apparatus of embodiment 19, are located in separate locations.
52. The apparatus of embodiment 50, further comprising a means for extracting cell-free DNA from the maternal test sample.
53. The apparatus of embodiment 52, wherein the means for extracting cell-free DNA is located in the same location as the sequencer, and wherein the means for obtaining the maternal test sample is located in a remote location.
54. The apparatus of embodiment 44, wherein the fetal and maternal nucleic acids in the maternal test sample are cell-free DNA.
55. The apparatus of embodiment 1 or 19, wherein the means (2) for aligning aligns at least 1 million reads.
Brief Description of the Drawings
FIG. 1 is a flowchart of a method 100 for determining the presence or absence of a copy number variation in a test sample comprising a mixture of nucleic acids.
FIG. 2 depicts workflows for preparing a sequencing library according to the Illumina unabridged protocol, the abbreviated protocol (ABB), the two-step method, and the one-step method, as described herein. "P" denotes a purification step, and "X" indicates that a purification step and/or DNA repair is not included.
FIG. 3 depicts a workflow of embodiments of the method for preparing a sequencing library on a solid surface.
FIG. 4 shows a flowchart of an embodiment 400 of a method for verifying the integrity of a sample that is subjected to a multistep singleplex sequencing bioassay.
FIG. 5 shows a flowchart of an embodiment 500 of a method for verifying the integrity of a plurality of samples that are subjected to a multistep multiplex sequencing bioassay.
FIG. 6 is a flowchart of a method 600 for simultaneously determining the presence or absence of aneuploidy and the fetal fraction in a maternal test sample comprising a mixture of fetal and maternal nucleic acids.
FIG. 7 is a flowchart of a method 700 for determining the fetal fraction in a maternal test sample comprising a mixture of fetal and maternal nucleic acids, using massively parallel sequencing or size separation of polymorphic nucleic acid sequences.
FIG. 8 is a flowchart of a method 800 for simultaneously determining the presence or absence of fetal aneuploidy and the fetal fraction in a maternal plasma test sample enriched for polymorphic nucleic acids.
FIG. 9 is a flowchart of a method 900 for simultaneously determining the presence or absence of fetal aneuploidy and the fetal fraction in a maternal purified cfDNA test sample enriched for polymorphic nucleic acids.
FIG. 10 is a flowchart of a method 1000 for simultaneously determining the presence or absence of fetal aneuploidy and the fetal fraction in a sequencing library constructed from fetal and maternal nucleic acids derived from a maternal test sample and enriched for polymorphic nucleic acids.
FIG. 11 is a flowchart outlining alternative embodiments of the method for determining fetal fraction by massively parallel sequencing shown in FIG. 7.
FIG. 12 is a bar graph showing the identification of fetal and maternal polymorphic sequences (SNPs) used to determine the fetal fraction in a test sample. The total number of sequence reads (Y axis) mapped to the SNP sequences identified by rs numbers (X axis), and the relative level of fetal nucleic acids (*), are shown.
图13是描绘既定基因组位置的胎儿和母体配型状态的分类的框图。Figure 13 is a block diagram depicting the classification of fetal and maternal zygosity states for a given genomic position.
图14展示使用混合物模型以及已知胎儿分数和估算胎儿分数所产生的结果的比较。FIG14 shows a comparison of results produced using a mixture model with known and estimated fetal fractions.
图15示出通过使用具有缺省参数的Eland与人类基因组HG18进行比对的伊路纳GA2数据的30个通路上的测序碱基位置作出的误差估计。Figure 15 shows error estimates by sequenced base position over 30 lanes of Illumina GA2 data aligned to the human genome HG18 using Eland with default parameters.
图16展示使用机器误差率作为已知参数可使上偏减少一个点。Figure 16 shows that using the machine error rate as a known parameter can reduce the upward bias by one point.
图17展示使用机器误差率作为已知参数,强化情况1和2误差模型的模拟数据使低于0.2的胎儿分数的上偏大大减少到不足一个点。Figure 17 shows that using the machine error rate as a known parameter, the simulated data for the enhanced error models for Cases 1 and 2 significantly reduced the upward bias for fetal fractions below 0.2 to less than one point.
图18是描绘通过比较用两种不同技术计算的胎儿分数值将CNV分类的方法的流程图。Figure 18 is a flow chart depicting a method for classifying a CNV by comparing fetal fraction values calculated using two different techniques.
图19是用于加工测试样品并且最终作出诊断的离散系统的框图。FIG. 19 is a block diagram of a discrete system used to process a test sample and ultimately make a diagnosis.
图20示意性展示在加工测试样品时多少不同的操作可以通过系统的不同元件成群处理。Figure 20 schematically illustrates how the various operations in processing a test sample may be grouped so that they are handled by different elements of a system.
图21A和21B展示根据实例2a中描述的简略方案(图21A)和实例2b 中描述的方案(图21B)制备的cfDNA测序文库的电泳图。Figures 21A and 21B show electropherograms of cfDNA sequencing libraries prepared according to the abbreviated protocol described in Example 2a (Figure 21A) and the protocol described in Example 2b (Figure 21B).
图22A到22C提供展示当根据简略方案(ABB;◇)制备测序文库时和当根据无修复两步法(INSOL;□)制备测序文库时映射到每一个人染色体的序列标签的总数百分比的平均值(n=16)(%ChrN;图22A)和序列标签百分比作为染色体尺寸的函数(图22B)的图。图22C展示使用两步法制备文库时映射的标签与使用简略(ABB)法制造文库时获得的标签的比率百分比作为染色体的GC含量的函数。Figures 22A to 22C provide graphs showing the average (n = 16) of the total percentage of sequence tags mapped to each human chromosome when sequencing libraries were prepared according to the abbreviated protocol (ABB; ◇) and when sequencing libraries were prepared according to the two-step method without repair (INSOL; □) (% ChrN; Figure 22A) and the percentage of sequence tags as a function of chromosome size (Figure 22B). Figure 22C shows the ratio of the percentage of tags mapped when the two-step method was used to prepare the library and the tags obtained when the abbreviated (ABB) method was used to make the library as a function of the GC content of the chromosome.
图23A和23B展示提供标签百分比的均值和标准差的柱形图,这些标签映射到从对从10个孕妇的血浆纯化的cfDNA的10个样品进行测序所获得的染色体X(图23A;%ChrX)和Y(图23B;%ChrY)。图23A展示当使用无修复方法(两步)时映射到X染色体的标签数目比使用简略法(ABB)获得的标签数目大。图23B展示使用无修复两步法时映射到Y染色体的标签百分比与使用简略法(ABB)时的标签百分比没有不同。Figures 23A and 23B show bar graphs providing the mean and standard deviation of the percentage of tags mapped to chromosomes X (Figure 23A; %ChrX) and Y (Figure 23B; %ChrY) obtained from sequencing 10 samples of cfDNA purified from the plasma of 10 pregnant women. Figure 23A shows that the number of tags mapped to chromosome X when using the no-repair method (two-step) is greater than the number of tags obtained using the abbreviated method (ABB). Figure 23B shows that the percentage of tags mapped to chromosome Y when using the no-repair two-step method is no different from the percentage of tags when using the abbreviated method (ABB).
图24展示参考基因组(hg18)上非排除位点(NE位点)的数目与映射到5个样品每一者的非排除位点的标签的总数的比率,cfDNA从这些样品中制备并且根据实例2中描述的简略方案(ABB)(实心柱)、溶液中无修复方案(两步;空心柱)以及固体表面无修复方案(一步;灰色柱)用以构造测序文库。Figure 24 shows the ratio of the number of non-excluded sites (NE sites) on the reference genome (hg18) to the total number of tags mapped to the non-excluded sites for each of the five samples from which cfDNA was prepared and used to construct sequencing libraries according to the abbreviated protocol (ABB) (solid bars), the in-solution no-repair protocol (two-step; open bars), and the solid surface no-repair protocol (one-step; gray bars) described in Example 2.
图25A和25B是展示当根据简略方案(ABB;◇)在固体表面上制备测序文库时、当根据无修复两步法(□)制备测序文库时和当根据无修复一步法(Δ) 制备文库时映射到每一个人染色体的序列标签的总数百分比的平均值(n=5) (%ChrN;图25A)和序列标签百分比作为染色体尺寸的函数(图25B)的图。从根据简略方案(ABB;◇)和固体表面无修复方案(两步;□)制备的测序文库获得的映射标签的回归系数。图25C展示从根据无修复两步方案制备的测序文库获得的每一个染色体的映射的序列标签与从根据简略方案(ABB)制备的测序文库获得的每一个染色体的标签的比率百分比作为每一个染色体的GC含量百分比的函数(◇),和从根据无修复一步方案制备的测序文库获得的每一个染色体的映射序列标签与从根据简略方案(ABB)制备的测序文库获得的每一个染色体的标签的比率百分比作为每一个染色体的GC含量百分比的函数(□)。Figures 25A and 25B are graphs showing the mean (n=5) of the total percentage of sequence tags mapped to each human chromosome when sequencing libraries were prepared according to the abbreviated protocol (ABB; ◇), when sequencing libraries were prepared according to the two-step method without repair (□), and when libraries were prepared according to the one-step method without repair (Δ) (% ChrN; Figure 25A) and the percentage of sequence tags as a function of chromosome size (Figure 25B). Regression coefficients for mapped tags obtained from sequencing libraries prepared according to the abbreviated protocol (ABB; ◇) and the solid surface no repair protocol (two-step; □). Figure 25C shows the percentage ratio of the mapped sequence tags for each chromosome obtained from the sequencing libraries prepared according to the two-step protocol without repair to the tags for each chromosome obtained from the sequencing libraries prepared according to the abbreviated protocol (ABB) as a function of the percentage GC content of each chromosome (◇), and the percentage ratio of the mapped sequence tags for each chromosome obtained from the sequencing libraries prepared according to the one-step protocol without repair to the tags for each chromosome obtained from the sequencing libraries prepared according to the abbreviated protocol (ABB) as a function of the percentage GC content of each chromosome (□).
图26A和26B展示标签百分比的均值和标准差的比较,这些标签映射到根据ABB法、两步法以及一步法从对从5个孕妇的血浆纯化的cfDNA的5个样品进行测序所获得的染色体X(图26A)和Y(图26B)。图26A展示当使用无修复方法(两步和一步)时映射到X染色体的标签数目比使用简略法(ABB) 获得的标签数目大。图26B展示使用无修复两步法和一步法时映射到Y染色体的标签百分比与使用简略法时的标签百分比没有不同。Figures 26A and 26B show the mean and standard deviation of the percentage of tags mapped to chromosomes X (Figure 26A) and Y (Figure 26B) obtained by sequencing 5 samples of cfDNA purified from the plasma of 5 pregnant women according to the ABB method, the two-step method, and the one-step method. Figure 26A shows that the number of tags mapped to chromosome X when using the no-repair method (two-step and one-step) is larger than the number of tags obtained using the abbreviated method (ABB). Figure 26B shows that the percentage of tags mapped to chromosome Y when using the no-repair two-step method and the one-step method is no different from the percentage of tags when using the abbreviated method.
图27A和27B展示针对使用ABB法在溶液中制备的61个临床样品(图 27A)和使用无修复固体表面(SS)一步法制备的35个研究样品(图27B),将用以制备测序文库的纯化cfDNA的量与所得文库产物的量相关联。Figures 27A and 27B show the correlation between the amount of purified cfDNA used to prepare sequencing libraries and the amount of resulting library product for 61 clinical samples prepared in solution using the ABB method (Figure 27A) and 35 research samples prepared using the one-step method using a non-repair solid surface (SS) (Figure 27B).
图28展示用以制造文库的cfDNA的量与使用两步(□)、ABB(◇)和一步(Δ)法获得的文库产物的量的相关性。FIG28 shows the correlation between the amount of cfDNA used to make the library and the amount of library product obtained using the two-step (□), ABB (◇), and one-step (Δ) methods.
图29展示当使用一步(空心柱)和两步(实心柱)制备索引文库时获得并且作为6丛(即6个索引样品/流动池通路)测序的索引序列读数的百分比。Figure 29 shows the percentage of index sequence reads obtained when index libraries were prepared using one-step (open bars) and two-step (solid bars) methods and sequenced as 6-plexes (ie, 6 index samples per flow cell lane).
图30A和30B是展示当索引测序文库根据一步法在固体表面上制备并且作为6丛测序时映射到每一个人染色体的序列标签的总数百分比的均值(n=42) (%ChrN;图30A)和所得序列标签百分比作为染色体尺寸的函数(图30B) 的图。Figures 30A and 30B are graphs showing the mean (n=42) percentage of the total number of sequence tags that mapped to each human chromosome (%ChrN; Figure 30A) and the percentage of sequence tags obtained as a function of chromosome size (Figure 30B) when indexed sequencing libraries were prepared on a solid surface according to the one-step method and sequenced as 6-plex.
图31展示映射到Y染色体的序列标签百分比(ChrY)相对于映射到X 染色体的标签百分比(ChrX)。FIG31 shows the percentage of sequence tags that map to chromosome Y (ChrY) relative to the percentage of tags that map to chromosome X (ChrX).
图32A和32B展示了从对cfDNA进行测序所确定的染色体21的染色体剂量的分布,cfDNA是提取自一组48个血液样品,这些样品得自于各自怀有男性或女性胎儿的人类受试者。针对染色体1-12和X(图32A)、并且针对染色体1-22和X(图32B),将对于合格的(即:对于染色体21(O)而言正常的) 染色体21的剂量、以及三体性21测试样品示出为(Δ)。Figures 32A and 32B show the distribution of chromosome doses for chromosome 21 determined from sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each carrying a male or female fetus. Chromosome 21 doses for qualified samples, i.e., samples normal for chromosome 21 (O), and for trisomy 21 test samples (Δ) are shown for chromosomes 1-12 and X (Figure 32A) and for chromosomes 1-22 and X (Figure 32B).
图33 展示了从对cfDNA进行测序所确定的染色体18的染色体剂量的分布, cfDNA是提取自一组48个血液样品,这些样品得自于各自怀有男性或女性胎儿的人类受试者。针对染色体1-12和X(图33A)并且针对染色体1-22和X (图33B)示出了对于合格的(即:对于染色体18(O)而言正常的)染色体18 的剂量、以及三体性18(Δ)的测试样品。FIG33 shows the distribution of chromosome 18 chromosomal doses determined from sequencing cfDNA extracted from a panel of 48 blood samples obtained from human subjects each carrying a male or female fetus. The doses for qualified (i.e., normal for chromosome 18 (O)) chromosome 18 and trisomy 18 (Δ) test samples are shown for chromosomes 1-12 and X ( FIG33A ) and for chromosomes 1-22 and X ( FIG33B ).
图34A和34B展示了从对cfDNA进行测序所确定的染色体13的染色体剂量的分布,cfDNA是提取自一组48个血液样品,这些样品得自于各自怀有男性或女性胎儿的人类受试者。针对染色体1-12和X(图34A),并且针对染色体1-22和X(图34B)示出了对于合格的(即:对于染色体13(O)而言正常的)染色体13的剂量、以及三体性13(Δ)的测试样品。Figures 34A and 34B show the distribution of chromosome doses for chromosome 13 determined from sequencing cfDNA extracted from a set of 48 blood samples obtained from human subjects each carrying a male or female fetus. The doses for qualified (i.e., normal for chromosome 13 (O)) chromosome 13 and trisomy 13 (Δ) test samples are shown for chromosomes 1-12 and X (Figure 34A), and for chromosomes 1-22 and X (Figure 34B).
图35A和35B展示了从对cfDNA进行测序所确定的染色体X的染色体剂量的分布,cfDNA提取自一组48个测试血液样品,这些样品得自于各自怀有男性或女性胎儿的人类受试者。针对染色体1-12和X(图35A)、并且针对染色体1-22和X(图35B)示出了对于男性(46,XY;(O))、女性(46,XX;(Δ)) 的染色体X剂量,单体性X(45,X;(+)),以及复杂核型(Cplx(X))的样品。Figures 35A and 35B show the distribution of chromosome doses for chromosome X determined from sequencing cfDNA extracted from a panel of 48 test blood samples obtained from human subjects each carrying a male or female fetus. Chromosome X doses for males (46,XY; (O)), females (46,XX; (Δ)), monosomy X (45,X; (+)), and samples with a complex karyotype (Cplx(X)) are shown for chromosomes 1-12 and X (Figure 35A) and for chromosomes 1-22 and X (Figure 35B).
图36A和36B展示了从对cfDNA进行测序所确定的染色体Y的染色体剂量的分布,cfDNA是提取自一组48个测试血液样品,这些样品得自于各自怀有男性或女性胎儿的人类受试者。针对染色体1-12(图36A)、并且针对染色体1-22(图36B)示出了对于男性(46,XY;(Δ)),女性(46,XX;(O))的染色体Y剂量,单体性X(45,X;(+)),以及复杂核型(Cplx(X))的样品。Figures 36A and 36B show the distribution of chromosome doses for chromosome Y determined from sequencing cfDNA extracted from a set of 48 test blood samples obtained from human subjects each carrying a male or female fetus. Chromosome Y doses for males (46, XY; (Δ)), females (46, XX; (O)), monosomy X (45, X; (+)), and samples with a complex karyotype (Cplx(X)) are shown for chromosomes 1-12 (Figure 36A) and for chromosomes 1-22 (Figure 36B).
图37示出了对于从图32A和32B,33A和33B,以及34A和34B分别示出的剂量来确定的染色体21(■)、18(●)和13(▲)的变异系数(CV)。FIG. 37 shows the coefficient of variation (CV) for chromosomes 21 (■), 18 (●), and 13 (▲) determined from the doses shown in FIGs. 32A and 32B , 33A and 33B , and 34A and 34B , respectively.
图38示出了对于从图35A和35B以及36A和36B中分别示出的剂量来确定的染色体X(■)和Y(●)的变异系数(CV)。FIG. 38 shows the coefficient of variation (CV) for chromosomes X (■) and Y (●) determined from the doses shown in FIGs. 35A and 35B and 36A and 36B, respectively.
图39示出了人类染色体的GC部分的累积性分布。纵轴代表具有低于水平轴上示出的值的GC含量的染色体的频率。Figure 39 shows the cumulative distribution of the GC fraction of human chromosomes. The vertical axis represents the frequency of chromosomes with a GC content lower than the value shown on the horizontal axis.
图40展示了对于从对cfDNA进行测序所确定的染色体11 (81000082-103000103bp)的区段的序列剂量(Y轴),cfDNA是提取自所获得的一组7个合格样品(O)和来自怀孕人类受试者的1个测试样品(◆)。识别了来自一位受试者的样品,这位受试者怀有一个带有染色体11(◆)的一种部分非整倍性的胎儿。Figure 40 shows the sequence doses (Y-axis) for a segment of chromosome 11 (81000082-103000103 bp) determined from sequencing cfDNA extracted from a set of 7 qualified samples (O) and 1 test sample (◆) obtained from pregnant human subjects. A sample from a subject carrying a fetus with a partial aneuploidy of chromosome 11 (◆) was identified.
图41A-41E展示了,相对于在未受影响的样品中的对应染色体的平均值 (Y-轴)的标准差,对于染色体21(41A)、染色体18(41B)、染色体13(41C)、染色体X(41D)以及染色体Y(41E)的归一化的染色体剂量的分布。Figures 41A-41E show the distribution of normalized chromosome doses for chromosome 21 (41A), chromosome 18 (41B), chromosome 13 (41C), chromosome X (41D), and chromosome Y (41E) relative to the standard deviation of the mean (Y-axis) of the corresponding chromosome in unaffected samples.
图42示出了使用如实例12中所述的归一化染色体,对于在来自训练组1 中的样品中确定的染色体21(O)、18(Δ)、和13(□)的归一化的染色体值。42 shows the normalized chromosome values for chromosomes 21 (O), 18 (Δ), and 13 (□) determined in samples from Training Set 1 using the normalizing chromosomes as described in Example 12.
图43示出了使用如实例12中所述的归一化染色体,对于在来自测试组1 中的样品中确定的染色体21(O)、18(Δ)、和13(□)的归一化的染色体值。43 shows the normalized chromosome values for chromosomes 21 (O), 18 (Δ), and 13 (□) determined in samples from Test Group 1 using the normalizing chromosomes as described in Example 12.
图44示出了使用Chiu(赵)等人的归一化方法(对感兴趣的染色体所识别序列标签的数目与在样品中剩余染色体所获得的序列标签的数目进行归一化,参见在本申请其他地方的实例13),对于来自测试组1的样品中确定的染色体21(O)和18(Δ)的归一化的染色体值。Figure 44 shows the normalized chromosome values for chromosomes 21 (O) and 18 (Δ) determined in samples from test group 1 using the normalization method of Chiu et al. (normalizing the number of sequence tags identified for the chromosome of interest to the number of sequence tags obtained for the remaining chromosomes in the sample, see Example 13 elsewhere in this application).
图45示出了使用系统地确定的归一化染色体(如实例13中所述),对于来自训练组1的样品中确定的染色体21(O)、18(Δ)、和13(□)的归一化的染色体值。Figure 45 shows the normalized chromosome values for chromosomes 21 (O), 18 (Δ), and 13 (□) determined in samples from training set 1 using systematically determined normalizing chromosomes (as described in Example 13).
图46展示染色体X(X轴)和Y(Y轴)的归一化的染色体值。箭头指向如实例13中所述,分别在训练集和测试集中识别的5个(图46A)和3个 (图46B)X单体性样品。Figure 46 shows the normalized chromosome values for chromosomes X (X-axis) and Y (Y-axis). Arrows point to the 5 (Figure 46A) and 3 (Figure 46B) X monosomy samples identified in the training and test sets, respectively, as described in Example 13.
图47示出了使用系统地确定的归一化染色体(如实例13中所述),对于来自测试组1的样品中确定的染色体21(O)、18(Δ)、和13(□)的归一化的染色体值。Figure 47 shows the normalized chromosome values for chromosomes 21 (O), 18 (Δ), and 13 (□) determined in samples from Test Group 1 using systematically determined normalizing chromosomes (as described in Example 13).
图48示出了使用系统地确定的归一化染色体(如实例13中所述),对于来自测试组1的样品中确定的染色体9(O)的归一化的染色体值。Figure 48 shows the normalized chromosome values for chromosome 9 (O) determined in samples from Test Group 1 using the systematically determined normalizing chromosome (as described in Example 13).
图49示出了使用系统地确定的归一化染色体(如实例13中所述),对于来自测试组1的样品中确定的染色体1-22的归一化的染色体值。FIG49 shows normalized chromosome values for chromosomes 1-22 determined in samples from Test Group 1 using systematically determined normalizing chromosomes (as described in Example 13).
图50显示实例16中所述的研究的设计(A)和随机抽样方案(B)的流程图。FIG50 shows a flow diagram of the design (A) and random sampling scheme (B) of the study described in Example 16.
图51A到51F展示染色体21、18以及13的分析(分别是图51A到51C) 以及女性、男性以及X单体性的性别分析(分别是图51D到51F)的流程图。椭圆形包括从来自实验室的测序信息获得的结果,矩形包括核型结果,并且具有圆角的矩形展示用以确定测试性能(灵敏性和专一性)的比较结果。图51A 和51B中的虚线表示T21(n=3)与T18(n=1)的嵌合性样本之间的关系,这些样品分别由染色体21和18的分析被检查过,但如实例16中所述正确地确定。Figures 51A to 51F show flow charts for the analysis of chromosomes 21, 18, and 13 (Figures 51A to 51C, respectively) and for the sex analysis of females, males, and monosomy X (Figures 51D to 51F, respectively). Ovals contain results obtained from the laboratory's sequencing information, rectangles contain karyotype results, and rectangles with rounded corners show the comparative results used to determine test performance (sensitivity and specificity). The dashed lines in Figures 51A and 51B denote the relationship between the mosaic samples for T21 (n=3) and T18 (n=1), which were censored from the analysis of chromosomes 21 and 18, respectively, but were correctly determined as described in Example 16.
图52显示针对实例16中所述的研究的测试样品,染色体21(●)、18(■) 以及13(▲)的归一化的染色体值(NCV)对比核型分类关系。圆形样品表示具有三体性核型的未分类样品。Figure 52 shows the normalized chromosome values (NCV) versus karyotype classification for chromosomes 21 (●), 18 (■), and 13 (▲) for the test samples from the study described in Example 16. Circled samples denote unclassified samples with a trisomy karyotype.
图53显示实例16中所述的研究的测试样品的染色体X的归一化的染色体值(NCV)对比性别分类的核型分类关系。展示具有女性核型的样品(○)、具有男性核型的样品(●)、具有45,X的样品(□)以及具有其他核型(即XXX、 XXY以及XYY)的样品(■)。Figure 53 shows the normalized chromosome value (NCV) for chromosome X versus sex-classified karyotype relationships for the test samples from the study described in Example 16. Samples with a female karyotype (○), samples with a male karyotype (●), samples with 45,X (□), and samples with other karyotypes (i.e., XXX, XXY, and XYY) (■) are shown.
图54展示针对实例16中所述的临床研究的测试样品,染色体Y的归一化的染色体值对比染色体X的归一化的染色体值关系的图。展示整倍体男性和女性样品(○)、XXX样品(●)、45,X样品(X)、XYY样品(■)以及XXY 样品(▲)。虚线展示如实例16中所述用于将样品分类的阈值。Figure 54 shows a plot of the normalized chromosome values for chromosome Y versus the normalized chromosome values for chromosome X for the test samples of the clinical study described in Example 16. Euploid male and female samples (○), XXX samples (●), 45,X samples (X), XYY samples (■), and XXY samples (▲) are shown. The dashed lines show the threshold values used for classifying samples as described in Example 16.
图55示意性展示在此描述的CNV确定方法的一个实施方案。Figure 55 schematically illustrates one embodiment of the CNV determination method described herein.
图56展示来自实例17,在包含来自具有21三体性的孩子的DNA的合成母体样品(1)中使用染色体21的剂量确定的“ff”百分比(ff21)作为使用染色体X的剂量确定的“ff”百分比(ffX)的函数的图。Figure 56 shows a plot from Example 17 of the percent "ff" determined using doses of chromosome 21 (ff21) as a function of the percent "ff" determined using doses of chromosome X (ffX) in a synthetic maternal sample (1) comprising DNA from a child with trisomy 21.
图57展示来自实例17,在包含来自整倍体母亲和其携带染色体7部分缺失的孩子的DNA的合成母体样品(2)中使用染色体7的剂量确定的“ff”百分比(ff7)作为使用染色体X的剂量确定的“ff”百分比(ffX)的函数的图。Figure 57 shows a plot from Example 17 of the percent "ff" determined using doses of chromosome 7 (ff7) as a function of the percent "ff" determined using doses of chromosome X (ffX) in a synthetic maternal sample (2) comprising DNA from a euploid mother and her child carrying a partial deletion of chromosome 7.
图58展示来自实例17,在包含来自整倍体母亲和其具有染色体15部分复制的25%嵌合性孩子的DNA的合成母体样品(3)中使用染色体15的剂量确定的“ff”百分比(ff15)作为使用染色体X的剂量确定的“ff”百分比(ffX)的函数的图。Figure 58 shows a plot from Example 17 of the percent "ff" determined using doses of chromosome 15 (ff15) as a function of the percent "ff" determined using doses of chromosome X (ffX) in a synthetic maternal sample (3) comprising DNA from a euploid mother and her child, who is 25% mosaic with a partial duplication of chromosome 15.
图59展示来自实例17,在人工样品(4)中使用染色体22的剂量确定的“ff”百分比(ff22)和从其获得的NCV的图,该人工样品包含0%孩子DNA(i),和来自已知不具有染色体22部分染色体非整倍性的未受影响孪生儿子的10% DNA(ii),以及来自已知具有染色体22部分染色体非整倍性的受影响孪生儿子的10%DNA(iii)。Figure 59 shows a plot from Example 17 of the percent "ff" determined using doses of chromosome 22 (ff22), and the NCV obtained from it, in an artificial sample (4) comprising 0% child DNA (i), 10% DNA from an unaffected twin son known not to have a partial chromosomal aneuploidy of chromosome 22 (ii), and 10% DNA from an affected twin son known to have a partial chromosomal aneuploidy of chromosome 22 (iii).
图60展示来自实例18,在包括胎儿T21三体性的样品中确定的CNffx 对比CNff21关系的图。Figure 60 shows a graph from Example 18 of the CNffx versus CNff21 relationship determined in samples comprising fetal T21 trisomy.
图61展示来自实例18,在包括胎儿T18三体性的样品中确定的CNffx 对比CNff18关系的图。Figure 61 shows a graph from Example 18 showing the CNffx versus CNff18 relationship determined in samples comprising fetal T18 trisomy.
图62展示来自实例18,在包括胎儿T13三体性的样品中确定的CNffx 对比CNff13关系的图。Figure 62 shows a graph from Example 18 showing the CNffx versus CNff13 relationship determined in samples including fetal T13 trisomy.
图63展示来自实例19,在测试样品中染色体1到22和X的NCV值的图。Figure 63 shows a graph from Example 19, NCV values for chromosomes 1 to 22 and X in the test samples.
图64展示实例18中针对具有患有T21的女性胎儿的样品所获得的胎儿分数。Figure 64 shows the fetal fractions obtained for samples from female fetuses with T21 in Example 18.
图65展示一种医学分析设备的一个实施方案,该医学分析设备用于确定作为胎儿基因组中所存在的拷贝数变异的函数的胎儿分数。Figure 65 shows one embodiment of a medical analysis apparatus for determining fetal fraction as a function of copy number variation present in the fetal genome.
图66展示用于确定胎儿分数以将胎儿基因组中的拷贝数变异进行分类的一种医学分析设备的一个实施方案。FIG66 illustrates one embodiment of a medical analysis apparatus for determining a fetal fraction to classify copy number variation in a fetal genome.
图67展示一种试剂盒,该试剂盒包括检验对照试剂和用于追踪和验证进行大规模平行测序的母体cfDNA样品的完整性的试剂。Figure 67 shows a kit that includes assay control reagents and reagents for tracking and verifying the integrity of maternal cfDNA samples undergoing massively parallel sequencing.
图68展示一种试剂盒,该试剂盒包括血液收集装置、DNA提取试剂和用于检验母体DNA样品的对照试剂。Figure 68 shows a kit that includes a blood collection device, DNA extraction reagents, and control reagents for testing maternal DNA samples.
图69(A、B、C)展示针对染色体13、18和21的拷贝数变异所检验的内在阳性对照[□]和母体样品[◇]的NCV图。Figure 69 (A, B, C) shows NCV plots of the internal positive control [□] and maternal samples [◇] tested for copy number variations of chromosomes 13, 18 and 21.
详细描述Detailed description
所披露的实施方案涉及多种方法、设备以及系统用于在包括核酸混合物的测试样品中确定感兴趣的序列的拷贝数变异(CNV),已知或怀疑这些核酸在感兴趣的一个或多个序列的量上是不同的。感兴趣的序列包括例如范围从千碱基(kb)到兆碱基(Mb)到整个染色体的基因组区段序列,已知或怀疑这些序列与遗传情况或疾病情况是相关联的。感兴趣的序列的实例包括与熟知的非整倍性相关联的染色体(例如三体性21)以及在疾病(如癌症)中增加的染色体的区段,例如在急性髓细胞白血病中的部分三体性8。根据本方法可以确定的CNV包括常染色体1-22、以及性染色体X和Y(例如:45,X、47,XXX、47,XXY和47,XYY)中的任意一个或多个的单体性和三体性,其他染色体多体性,即四体性和五体性(包括但并不局限于XXXX、XXXXX、XXXXY和XYYYY),以及这些染色体中的任一个或多个的区段的缺失和/或复制。The disclosed embodiments relate to methods, apparatus, and systems for determining copy number variations (CNVs) of sequences of interest in a test sample comprising a mixture of nucleic acids that are known or suspected to differ in the amount of one or more sequences of interest. Sequences of interest include, for example, genomic segment sequences ranging from kilobases (kb) to megabases (Mb) to entire chromosomes that are known or suspected to be associated with a genetic or disease condition. Examples of sequences of interest include chromosomes associated with well-known aneuploidies (e.g., trisomy 21) and segments of chromosomes that are multiplied in diseases such as cancer, e.g., partial trisomy 8 in acute myeloid leukemia. CNVs that can be determined according to the present method include monosomies and trisomies of any one or more of autosomes 1-22 and of sex chromosomes X and Y (e.g., 45,X, 47,XXX, 47,XXY, and 47,XYY); other chromosomal polysomies, i.e., tetrasomy and pentasomy (including but not limited to XXXX, XXXXX, XXXXY, and XYYYY); and deletions and/or duplications of segments of any one or more of these chromosomes.
该方法是一种统计方法,该统计方法在一个或多个处理器上实施的并且将源自过程相关的、染色体间(同轮次)的和测序处理间的(轮次间)的变异性的累积性变异性考虑在内。这些方法适用于确定任何胎儿非整倍性的CNV、以及已知或怀疑与多种医学病况相关的CNV。The method is a statistical approach that is implemented on one or more processors and accounts for the accrued variability stemming from process-related, interchromosomal (intra-run), and inter-sequencing (inter-run) variability. These methods are applicable to determining the CNV of any fetal aneuploidy, as well as CNVs known or suspected to be associated with a variety of medical conditions.
除非另外指明,本发明的实施涉及通常用于分子生物学、微生物学、蛋白纯化、蛋白工程、蛋白和DNA测序、以及重组DNA领域的常规技术和装置,这些都在本领域的技术内。此类技术和装置对本领域普通技术人员而言是已知的,并且说明于众多文件和参考著作(例如,见Sambrook(萨姆布鲁克) 等人,“Molecular Cloning:A Laboratory Manual(分子克隆实验指南)”,第三版(Cold Spring Harbor(冷泉港)),[2001]);以及Ausubel(奥苏贝尔)等人,“Current Protocols in Molecular Biology(最新分子生物学实验方法汇编)”[1987]。Unless otherwise indicated, the practice of the present invention involves conventional techniques and apparatus commonly used in molecular biology, microbiology, protein purification, protein engineering, protein and DNA sequencing, and recombinant DNA fields, which are within the skill of the art. Such techniques and apparatus are known to those of ordinary skill in the art and are described in numerous texts and reference works (e.g., see Sambrook et al., "Molecular Cloning: A Laboratory Manual," 3rd ed. (Cold Spring Harbor), [2001]; and Ausubel et al., "Current Protocols in Molecular Biology" [1987]).
数值范围包括限定该范围的数值。在此的意图是贯穿本说明书给出的每一最大数值限度包括每一较低的数值限度,如同此类较低数值限度在此被明确写出。贯穿本说明书给出的每一最小数值限度将包括每一较高的数值限度,如同此类较高数值限度在此被明确写出。贯穿本说明书给出的每一数值范围将包括落在此类较广的数值范围内的每一较窄数值范围,如同此类较窄数值范围此处被全部明确地写出。Numerical ranges include the values defining the ranges. It is intended that every maximum numerical limit given throughout this specification include every lower numerical limit, as if such lower numerical limits were expressly set forth herein. Every minimum numerical limit given throughout this specification will include every higher numerical limit, as if such higher numerical limits were expressly set forth herein. Every numerical range given throughout this specification will include every narrower numerical range falling within such broader numerical range, as if such narrower numerical ranges were all expressly set forth herein.
在此提供的标题不意欲限制本披露。The headings provided herein are not intended to limit this disclosure.
除非在此另行定义,在此使用的所有技术的和科学的术语都具有本发明所属领域中的一位普通技术人员通常理解的相同含义。包括了在此包含的术语的不同科学字典对于本领域那些技术人员而言是熟知并且是可获得的。虽然类似或等价于在此所述的那些方法和材料的任何方法和材料在实施或测试在此披露的实施方案中找到了用途,但仅说明了一些优选的方法和材料。Unless otherwise defined herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Various scientific dictionaries encompassing the terms contained herein are well known and available to those skilled in the art. Although any methods and materials similar or equivalent to those described herein find use in practicing or testing the embodiments disclosed herein, only some preferred methods and materials are described.
直接在下文中定义的术语通过将本说明书作为整体来参阅即得到更完全地说明。应理解,本披露内容并不局限于所说明的具体方法学、规程、以及试剂,因为这些可以变化,它们被本领域的那些技术人员根据其情况下来使用。The terms defined immediately below are more fully described by reference to the specification as a whole. It is to be understood that this disclosure is not limited to the particular methodology, protocols, and reagents described, as these may vary depending on the context in which they are used by those of skill in the art.
定义Definitions
如在此所使用的,单数的术语“一个”、“一种”、和“该”包括复数引用,除非上下文清楚地另外指明。除非另外指明,对应地,核酸是按5'到3'方向从左到右书写并且氨基酸序列是按氨基到羧基方向从左到右书写。As used herein, the singular terms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Unless otherwise indicated, nucleic acids are written left to right in 5' to 3' orientation and amino acid sequences are written left to right in amino to carboxyl orientation, respectively.
术语“评估”当在此在分析核酸样品的CNV的情况下使用时是指将染色体或区段非整倍性的状态表征为三种类型判定之一:“正常”或“未受影响”、“受影响”以及“无判定”。判定正常和受影响的阈值典型地设置。对样品中与非整倍性有关的参数进行测量,并且将这些测量值与阈值进行比较。对于复制类型的非整倍性,如果染色体或区段剂量(或序列含量的其他测量值)超过针对受影响样品所设置的界定阈值,那么判定受影响。对于这些非整倍性,如果染色体或区段剂量低于针对正常样本所设置的阈值,那么判定正常。相比之下,对于缺失类型的非整倍性,如果染色体或区段剂量低于受影响样品的界定阈值,那么判定受影响,并且如果染色体或区段剂量超过针对正常样本所设置的阈值,那么判定正常。举例来说,在三体性存在下,通过例如测试染色体剂量等参数的值低于用户界定的可靠性阈值,确定“正常”判定,并且通过例如测试染色体剂量等参数超过用户界定的可靠性阈值,确定“受影响”判定。通过例如测试染色体剂量等参数位于“正常”或“受影响”判定的阈值之间,确定“无判定”的结果。术语“无判定”与“未分类”互换使用。The term "assessing," when used herein in the context of analyzing a nucleic acid sample for CNV, refers to characterizing the status of a chromosomal or segment aneuploidy by one of three types of calls: "normal" or "unaffected," "affected," and "no call." Thresholds for calling normal and affected are typically set. A parameter related to aneuploidy is measured in a sample, and the measured value is compared to the thresholds. For duplication-type aneuploidies, a call of affected is made if the chromosome or segment dose (or other measured value of sequence content) is above a defined threshold set for affected samples. For such aneuploidies, a call of normal is made if the chromosome or segment dose is below a threshold set for normal samples. By contrast, for deletion-type aneuploidies, a call of affected is made if the chromosome or segment dose is below a defined threshold for affected samples, and a call of normal is made if the chromosome or segment dose is above a threshold set for normal samples. For example, in the case of trisomy, the "normal" call is determined by the value of a parameter, e.g., a test chromosome dose, that is below a user-defined threshold of reliability, and the "affected" call is determined by a parameter, e.g., a test chromosome dose, that is above a user-defined threshold of reliability. A "no call" result is determined by a parameter, e.g., a test chromosome dose, that lies between the thresholds for making a "normal" or an "affected" call. The term "no call" is used interchangeably with "unclassified."
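The three-way call described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the names `Call` and `assess`, the `kind` argument, and the numeric thresholds in the usage note are assumptions introduced here. A value lying exactly on a threshold is treated as "no call", which is one possible reading of "between the thresholds".

```python
from enum import Enum


class Call(Enum):
    """The three call types named in the definition of "assessing"."""
    NORMAL = "normal"
    AFFECTED = "affected"
    NO_CALL = "no call"


def assess(dose, normal_threshold, affected_threshold, kind="duplication"):
    """Classify a chromosome or segment dose (hypothetical helper).

    For kind="duplication": affected above affected_threshold,
    normal below normal_threshold.
    For kind="deletion": affected below affected_threshold,
    normal above normal_threshold.
    Anything else, including values on a threshold, is a no call.
    """
    if kind == "duplication":
        if normal_threshold > affected_threshold:
            raise ValueError("duplication: normal threshold must not exceed affected threshold")
        if dose > affected_threshold:
            return Call.AFFECTED
        if dose < normal_threshold:
            return Call.NORMAL
        return Call.NO_CALL
    if kind == "deletion":
        if affected_threshold > normal_threshold:
            raise ValueError("deletion: affected threshold must not exceed normal threshold")
        if dose < affected_threshold:
            return Call.AFFECTED
        if dose > normal_threshold:
            return Call.NORMAL
        return Call.NO_CALL
    raise ValueError(f"unknown aneuploidy kind: {kind!r}")
```

With purely illustrative thresholds of 2.5 (normal) and 4.0 (affected) for a duplication, `assess(4.6, 2.5, 4.0)` gives `Call.AFFECTED`, `assess(1.1, 2.5, 4.0)` gives `Call.NORMAL`, and `assess(3.2, 2.5, 4.0)` gives `Call.NO_CALL`. For a deletion the comparison directions are mirrored, so the affected threshold sits below the normal one.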
术语“拷贝数变异”在此是指与合格样品中存在的核酸序列的拷贝数相比,测试样品中存在的核酸序列的拷贝数的变化。在某些实施方案中,核酸序列是 1kb或更大。在一些情况下,核酸序列是全染色体或其重要部分。“拷贝数变异体”是指通过将测试样品中感兴趣的序列与感兴趣的序列的预期含量进行比较,发现拷贝数差异的核酸序列。举例来说,将测试样品中感兴趣的序列的含量与合格样品中存在的感兴趣的序列的含量进行比较。拷贝数变异体/变异包括缺失(包括微缺失)、插入(包括微插入)、复制、倍增、倒位、易位以及复杂多位置变异。CNV涵盖染色体非整倍性和部分非整倍性。The term "copy number variation" herein refers to a variation in the number of copies of a nucleic acid sequence present in a test sample compared with the copy number of that nucleic acid sequence present in a qualified sample. In certain embodiments, the nucleic acid sequence is 1 kb or larger. In some cases, the nucleic acid sequence is a whole chromosome or a significant portion thereof. A "copy number variant" refers to a nucleic acid sequence in which a copy number difference is found by comparing a sequence of interest in the test sample with an expected level of the sequence of interest. For example, the level of the sequence of interest in the test sample is compared with the level present in a qualified sample. Copy number variants/variations include deletions (including microdeletions), insertions (including microinsertions), duplications, multiplications, inversions, translocations, and complex multi-site variants. CNVs encompass chromosomal aneuploidies and partial aneuploidies.
The term "aneuploidy" herein refers to an imbalance of genetic material caused by the loss or gain of a whole chromosome or part of a chromosome.
The terms "chromosomal aneuploidy" and "complete chromosomal aneuploidy" herein refer to an imbalance of genetic material caused by the loss or gain of a whole chromosome, and include germline aneuploidy and mosaic aneuploidy.
The terms "partial aneuploidy" and "partial chromosomal aneuploidy" herein refer to an imbalance of genetic material caused by the loss or gain of part of a chromosome (e.g., partial monosomy and partial trisomy), and encompass imbalances resulting from translocations, deletions, and insertions.
The term "aneuploid sample" herein refers to a sample indicating that the chromosomal content of a subject is not euploid, i.e., the sample indicates that the subject carries an abnormal copy number of a chromosome or of a portion of a chromosome.
The term "aneuploid chromosome" herein refers to a chromosome that is known or determined to be present in a sample in an abnormal copy number.
The term "plurality" herein refers to more than one. For example, the term is used herein in reference to a number of nucleic acid molecules or sequence tags that is sufficient to identify significant differences in copy number variation (e.g., chromosome dose) between test samples and qualified samples using the methods disclosed herein. In some embodiments, at least about 3 × 10^6 sequence tags, at least about 5 × 10^6 sequence tags, at least about 8 × 10^6 sequence tags, at least about 10 × 10^6 sequence tags, at least about 15 × 10^6 sequence tags, at least about 20 × 10^6 sequence tags, at least about 30 × 10^6 sequence tags, at least about 40 × 10^6 sequence tags, or at least about 50 × 10^6 sequence tags, each comprising a read of between about 20 and 40 bp, are obtained for each test sample.
The terms "polynucleotide," "nucleic acid," and "nucleic acid molecule" are used interchangeably and refer to a covalently linked sequence of nucleotides (i.e., ribonucleotides for RNA and deoxyribonucleotides for DNA) in which the 3' position of the pentose of one nucleotide is joined by a phosphodiester group to the 5' position of the pentose of the next. This includes sequences of any form of nucleic acid, including, but not limited to, RNA and DNA molecules such as cfDNA molecules. The term "polynucleotide" includes, without limitation, single-stranded and double-stranded polynucleotides.
The term "portion" is used herein in reference to the amount of sequence information of fetal and maternal nucleic acid molecules in a biological sample that, in sum, amounts to less than the sequence information of one human genome.
The term "test sample" herein refers to a sample, typically derived from a biological fluid, cell, tissue, organ, or organism, comprising a nucleic acid or a mixture of nucleic acids that includes at least one nucleic acid sequence to be screened for copy number variation. In certain embodiments, the sample comprises at least one nucleic acid sequence whose copy number is suspected of having undergone variation. Such samples include, but are not limited to, saliva/oral fluid, amniotic fluid, blood, blood clots, biopsy samples (e.g., surgical biopsy, fine needle biopsy, etc.), urine, peritoneal fluid, pleural fluid, and the like. Although the sample is often taken from a human subject (e.g., a patient), the assays can be used to detect copy number variations (CNVs) in samples from any mammal, including, but not limited to, dogs, cats, horses, goats, sheep, cattle, and pigs. The sample may be used directly as obtained from the biological source or following a pretreatment to modify the character of the sample. For example, such pretreatment may include preparing plasma from blood, diluting viscous fluids, and so forth. Methods of pretreatment may also involve, but are not limited to, filtration, precipitation, dilution, distillation, mixing, centrifugation, freezing, lyophilization, concentration, amplification, nucleic acid fragmentation, inactivation of interfering components, addition of reagents, lysing, and the like. If such methods of pretreatment are applied to the sample, they are typically such that the nucleic acid(s) of interest remain in the test sample, preferably at a concentration proportional to that in an untreated test sample (i.e., a sample that has not been subjected to any such pretreatment). Such "treated" or "processed" samples are still considered to be biological "test" samples with respect to the methods described herein.
The term "qualified sample" herein refers to a sample comprising a mixture of nucleic acids that are present in a known copy number and to which the nucleic acids in a test sample are compared; it is a sample that is normal, i.e., not aneuploid, for the sequence of interest. In certain embodiments, qualified samples are used to identify one or more normalizing chromosomes or segments for a chromosome under consideration. For example, qualified samples may be used to identify a normalizing chromosome for chromosome 21. In that case, the qualified sample is a sample that is not a trisomy 21 sample. Qualified samples may also be used to determine thresholds for calling affected samples.
The term "training set" herein refers to a set of samples that can comprise affected and unaffected samples and that is used to develop a model for analyzing test samples. The unaffected samples in a training set may be used as the qualified samples to identify normalizing sequences, e.g., normalizing chromosomes, and the chromosome doses of unaffected samples are used to set the thresholds for each of the sequences of interest, e.g., chromosomes. The affected samples in a training set can be used to verify that affected test samples can be easily differentiated from unaffected samples.
The term "qualified nucleic acid" is used interchangeably with "qualified sequence," which is a sequence against which a test sequence or test nucleic acid is compared. A qualified sequence is one that is present in a biological sample, preferably at a known representation, i.e., the amount of the qualified sequence is known. Generally, a qualified sequence is a sequence present in a "qualified sample." A "qualified sequence of interest" is a qualified sequence whose amount in a qualified sample is known and that is associated with a difference in sequence representation in an individual with a medical condition.
The term "sequence of interest" herein refers to a nucleic acid sequence that is associated with a difference in sequence representation in healthy versus diseased individuals. A sequence of interest can be a sequence on a chromosome that is misrepresented, i.e., over-represented or under-represented, in a disease or genetic condition. A sequence of interest may be a portion of a chromosome (i.e., a chromosome segment) or a whole chromosome. For example, a sequence of interest can be a chromosome that is over-represented in an aneuploidy condition, or a gene encoding a tumor suppressor that is under-represented in a cancer. Sequences of interest include sequences that are over-represented or under-represented in the total population, or in a subpopulation, of cells of a subject. A "qualified sequence of interest" is a sequence of interest in a qualified sample. A "test sequence of interest" is a sequence of interest in a test sample.
The term "normalizing sequence" herein refers to a sequence that is used to normalize the number of sequence tags mapped to a sequence of interest associated with the normalizing sequence. In certain embodiments, the normalizing sequence displays a variability in the number of sequence tags mapped to it, among samples and sequencing runs, that approximates the variability of the sequence of interest for which it is used as a normalizing parameter, and it can differentiate an affected sample from one or more unaffected samples. In some implementations, the normalizing sequence best or effectively differentiates an affected sample from one or more unaffected samples when compared with other potential normalizing sequences, such as other chromosomes. A "normalizing chromosome" or "normalizing chromosome sequence" is an example of a "normalizing sequence." A "normalizing chromosome sequence" can be composed of a single chromosome or of a group of chromosomes. A "normalizing segment" is another example of a "normalizing sequence." A "normalizing segment sequence" can be composed of a single segment of a chromosome, or of two or more segments of the same or of different chromosomes. In certain embodiments, a normalizing sequence is used to normalize for variability such as process-related variability, interchromosomal (intra-run) variability, and inter-sequencing (inter-run) variability.
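One way to make the variability criterion concrete is to score each candidate normalizing chromosome by how stable the resulting dose is across qualified samples. The sketch below assumes that "lowest coefficient of variation of the dose" is an acceptable stand-in for the selection criterion; that choice, the function name `pick_normalizing_chromosome`, and the tag counts are all illustrative assumptions and are not taken from this passage.

```python
from statistics import mean, stdev


def pick_normalizing_chromosome(qualified_counts, chrom_of_interest):
    """Pick the candidate whose dose varies least across qualified samples.

    qualified_counts: list of dicts, one per qualified sample, mapping a
    chromosome name to its sequence tag count (hypothetical input layout).
    """
    if len(qualified_counts) < 2:
        raise ValueError("need at least two qualified samples")
    candidates = [c for c in qualified_counts[0] if c != chrom_of_interest]

    def coefficient_of_variation(candidate):
        # Dose = tags on the chromosome of interest / tags on the candidate.
        doses = [s[chrom_of_interest] / s[candidate] for s in qualified_counts]
        return stdev(doses) / mean(doses)

    return min(candidates, key=coefficient_of_variation)


# Made-up tag counts for three qualified samples.
samples = [
    {"chr21": 100, "chr9": 1000, "chr1": 5000},
    {"chr21": 110, "chr9": 1100, "chr1": 5000},
    {"chr21": 120, "chr9": 1200, "chr1": 5000},
]
print(pick_normalizing_chromosome(samples, "chr21"))
```

In this toy data the chr9 counts rise and fall with the chr21 counts, so the chr21/chr9 dose is constant, whereas the chr21/chr1 dose drifts from sample to sample.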
The term "differentiability" herein refers to a characteristic of a normalizing chromosome that enables one or more unaffected (i.e., normal) samples to be distinguished from one or more affected (i.e., aneuploid) samples.
The term "sequence dose" herein refers to a parameter that relates the number of sequence tags identified for a sequence of interest to the number of sequence tags identified for the normalizing sequence. In some cases, the sequence dose is the ratio of the number of sequence tags identified for the sequence of interest to the number of sequence tags identified for the normalizing sequence. In some cases, the sequence dose refers to a parameter that relates the sequence tag density of the sequence of interest to the tag density of the normalizing sequence. A "test sequence dose" is a parameter that relates the sequence tag density of a sequence of interest, e.g., chromosome 21, to that of a normalizing sequence, e.g., chromosome 9, determined in a test sample. Similarly, a "qualified sequence dose" is a parameter that relates the sequence tag density of a sequence of interest to that of a normalizing sequence determined in a qualified sample.
The term "sequence tag density" herein refers to the number of sequence reads that are mapped to a reference genome sequence; e.g., the sequence tag density for chromosome 21 is the number of sequence reads generated by the sequencing method that are mapped to chromosome 21 of the reference genome. The term "sequence tag density ratio" herein refers to the ratio of the number of sequence tags that are mapped to a chromosome of the reference genome, e.g., chromosome 21, to the length of that reference genome chromosome.
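The ratio-based definitions of sequence dose and sequence tag density ratio reduce to simple arithmetic, sketched below. The function names are hypothetical, and the tag counts and the chromosome length are placeholder values chosen for readability; they are not real hg18 figures.

```python
def sequence_dose(tags_of_interest, tags_normalizing):
    """Ratio of tags on the sequence of interest to tags on the normalizing sequence."""
    if tags_normalizing <= 0:
        raise ValueError("normalizing sequence must have a positive tag count")
    return tags_of_interest / tags_normalizing


def tag_density_ratio(tag_count, chromosome_length_bp):
    """Tags mapped to a chromosome divided by that chromosome's length in bp."""
    if chromosome_length_bp <= 0:
        raise ValueError("chromosome length must be positive")
    return tag_count / chromosome_length_bp


# Placeholder numbers, for illustration only.
print(sequence_dose(250, 1000))            # dose of a sequence of interest
print(tag_density_ratio(500, 1_000_000))   # tags per base pair
```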
The term "next generation sequencing (NGS)" herein refers to sequencing methods that allow for massively parallel sequencing of clonally amplified molecules and of single nucleic acid molecules. Non-limiting examples of NGS include sequencing by synthesis using reversible dye terminators, and sequencing by ligation.
The term "parameter" herein refers to a numerical value that characterizes a physical property. Frequently, a parameter numerically characterizes a quantitative data set and/or a numerical relationship between quantitative data sets. For example, a ratio (or a function of a ratio) between the number of sequence tags mapped to a chromosome and the length of the chromosome to which the tags are mapped is a parameter.
The terms "threshold value" and "qualified threshold value" herein refer to any number that is used as a cutoff to characterize a sample, such as a test sample containing a nucleic acid from an organism suspected of having a medical condition. The threshold may be compared with a parameter value to determine whether the sample giving rise to that parameter value suggests that the organism has the medical condition. In certain embodiments, a qualified threshold value is calculated using a qualifying data set and serves as a limit of diagnosis of a copy number variation, e.g., an aneuploidy, in an organism. If a threshold is exceeded by results obtained from the methods disclosed herein, a subject can be diagnosed with a copy number variation, e.g., trisomy 21. Appropriate threshold values for the methods described herein can be identified by analyzing normalizing values (e.g., chromosome doses, NCVs, or NSVs) calculated for a training set of samples. Threshold values can be identified using the qualified (i.e., unaffected) samples in a training set that comprises both qualified (i.e., unaffected) samples and affected samples. The samples in the training set known to have chromosomal aneuploidies (i.e., the affected samples) can be used to confirm that the chosen thresholds are useful in differentiating affected from unaffected samples in a test set (see the Examples herein). The choice of a threshold depends on the level of confidence that the user wishes to have in making the classification. In some embodiments, the training set used to identify appropriate threshold values comprises at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 2000, at least 3000, at least 4000, or more qualified samples. It may be advantageous to use larger sets of qualified samples to improve the diagnostic utility of the threshold values.
The term "normalizing value" herein refers to a numerical value that relates the number of sequence tags identified for the sequence of interest (e.g., a chromosome or chromosome segment) to the number of sequence tags identified for the normalizing sequence (e.g., a normalizing chromosome or normalizing chromosome segment). For example, a "normalizing value" can be a chromosome dose as described elsewhere herein, or it can be an NCV (normalized chromosome value) as described elsewhere herein, or it can be an NSV (normalized segment value) as described elsewhere herein.
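This passage defers the exact definition of the NCV to other parts of the application. As a rough sketch only, the code below assumes the NCV is a z-score: the test sample's chromosome dose minus the mean dose of the qualified samples, divided by their standard deviation. That formula, the function name, and the doses are assumptions made for illustration.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """Assumed z-score form of an NCV: (dose - mean) / standard deviation.

    qualified_doses are the doses of the same chromosome in qualified
    (unaffected) samples; at least two are needed for a sample deviation.
    """
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)
    if sigma == 0:
        raise ValueError("qualified doses show no variability")
    return (test_dose - mu) / sigma


# Made-up doses for illustration.
qualified = [0.9, 1.0, 1.1]
print(round(normalized_chromosome_value(1.3, qualified), 2))
```

The same arithmetic applied to segment doses would give a segment-level analogue, again under the assumption stated above.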
The term "read" refers to a sequence read from a portion of a nucleic acid sample. Typically, though not necessarily, a read represents a short sequence of contiguous base pairs in the sample. The read may be represented symbolically by the base pair sequence (in ATCG) of the sample portion. It may be stored in a memory device and processed as appropriate to determine whether it matches a reference sequence or meets other criteria. A read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information concerning the sample. In some cases, a read is a DNA sequence of sufficient length (e.g., at least about 30 bp) that can be used to identify a larger sequence or region, e.g., one that can be aligned and specifically assigned to a chromosome, a genomic region, or a gene.
The term "sequence tag" is herein used interchangeably with the term "mapped sequence tag" to refer to a sequence read that has been specifically assigned, i.e., mapped, to a larger sequence, e.g., a reference genome, by alignment. Mapped sequence tags are uniquely mapped to a reference genome, i.e., they are assigned to a single location in the reference genome. Tags may be provided as data structures or other assemblages of data. In certain embodiments, a tag contains a read sequence and associated information for that read, such as the location of the sequence in the genome, e.g., the position on a chromosome. In certain embodiments, the location is specified for a positive strand orientation. A tag may be defined to allow a limited number of mismatches in aligning to a reference genome. Tags that can be mapped to more than one location on a reference genome, i.e., tags that do not map uniquely, may not be included in the analysis.
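The uniqueness rule for sequence tags can be illustrated with a short counting routine. The input layout (one `(read_id, chromosome, position)` tuple per reported alignment hit) and the function name are hypothetical; real aligner output formats differ and would need their own parsing.

```python
from collections import Counter, defaultdict


def count_unique_tags(alignment_hits):
    """Count tags per chromosome, keeping only reads with exactly one hit.

    alignment_hits: iterable of (read_id, chromosome, position) tuples,
    one tuple per location reported for a read (assumed layout).
    """
    hits_by_read = defaultdict(list)
    for read_id, chromosome, position in alignment_hits:
        hits_by_read[read_id].append((chromosome, position))

    counts = Counter()
    for locations in hits_by_read.values():
        if len(locations) == 1:  # uniquely mapped: counts as a sequence tag
            counts[locations[0][0]] += 1
    return counts


hits = [
    ("r1", "chr21", 100),
    ("r2", "chr21", 200),
    ("r2", "chr13", 50),   # r2 maps to two locations and is excluded
    ("r3", "chr9", 10),
]
print(count_unique_tags(hits))
```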
As used herein, the terms "aligned," "alignment," or "aligning" refer to the process of comparing a read or tag to a reference sequence and thereby determining whether the reference sequence contains the read sequence. If the reference sequence contains the read, the read may be mapped to the reference sequence or, in certain embodiments, to a particular location in the reference sequence. In some cases, alignment simply tells whether or not a read is a member of a particular reference sequence (i.e., whether the read is present or absent in the reference sequence). For example, the alignment of a read to the reference sequence for human chromosome 13 will tell whether the read is present in the reference sequence for chromosome 13. A tool that provides this information may be called a set membership tester. In some cases, an alignment additionally indicates a location in the reference sequence to which the read or tag maps. For example, if the reference sequence is the whole human genome sequence, an alignment may indicate that a read is present on chromosome 13, and may further indicate that the read is on a particular strand and/or site of chromosome 13.
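The two kinds of alignment result described here (membership only, or membership plus strand and position) can be shown with a deliberately naive exact-match search. This is a toy sketch with hypothetical names and a made-up reference string; practical aligners use indexed data structures and tolerate mismatches.

```python
def reverse_complement(sequence):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def align_exact(read, reference):
    """Return (strand, position) of the first exact match, or None if absent.

    The position is a 0-based offset on the forward strand of the reference.
    """
    position = reference.find(read)
    if position != -1:
        return ("+", position)
    position = reference.find(reverse_complement(read))
    if position != -1:
        return ("-", position)
    return None


reference = "AACCGGTTAC"  # made-up reference sequence
print(align_exact("CCGG", reference))
print(align_exact("GTAA", reference))
print(align_exact("GGGG", reference))
```

Testing `align_exact(read, reference) is not None` gives the set-membership answer alone, while the returned tuple carries the additional strand and site information.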
Aligned reads or tags are one or more sequences that are identified as a match, in terms of the order of their nucleic acid molecules, to a known sequence from a reference genome. Alignment can be done manually, although it is typically implemented by a computer algorithm, because aligning reads manually within a reasonable time would be impossible for implementing the methods disclosed herein. One example of an algorithm for aligning sequences is the Efficient Local Alignment of Nucleotide Data (ELAND) computer program, distributed as part of the Illumina Genomics Analysis pipeline. Alternatively, a Bloom filter or similar set membership tester may be employed to align reads to reference genomes. See U.S. Patent Application No. 61/552,374, filed October 27, 2011, which is incorporated herein by reference in its entirety. The matching of a sequence read in aligning can be a 100% sequence match or less than 100% (a non-perfect match).
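To illustrate how a Bloom filter can act as a set membership tester for reads, the sketch below indexes every k-length substring of a reference and then asks whether a read of length k is present. It is a generic textbook Bloom filter, not the implementation of the cited application; the class, the sizes, the hash construction, and the reference string are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def index_reference(reference, k):
    """Add every k-mer of the reference to a Bloom filter."""
    bloom = BloomFilter()
    for start in range(len(reference) - k + 1):
        bloom.add(reference[start:start + k])
    return bloom


bloom = index_reference("ACGTACGTTAGC", k=4)  # made-up reference
print("GTTA" in bloom)  # a k-mer of the reference is always reported present
```

A negative answer from such a filter is definitive, whereas a positive answer may occasionally be a false positive, so any positional information would still have to come from a conventional alignment step.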
As used herein, the term "reference genome" or "reference sequence" refers to any particular known genome sequence, whether partial or complete, of any organism or virus that may be used to reference identified sequences from a subject. For example, a reference genome used for human subjects, as well as for many other organisms, can be found at the National Center for Biotechnology Information at www.ncbi.nlm.nih.gov. A "genome" refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences.
In various embodiments, the reference sequence is significantly larger than the reads that are aligned to it. For example, it may be at least about 100 times larger, or at least about 1000 times larger, or at least about 10,000 times larger, or at least about 10^5 times larger, or at least about 10^6 times larger, or at least about 10^7 times larger.
In one example, the reference sequence is that of a full-length human genome. Such sequences may be referred to as genomic reference sequences. In another example, the reference sequence is limited to a specific human chromosome, such as chromosome 13. Such sequences may be referred to as chromosome reference sequences. Other examples of reference sequences include genomes of other species, as well as chromosomes, sub-chromosomal regions (such as strands), etc., of any species.
In various embodiments, the reference sequence is a consensus sequence or other combination derived from multiple individuals. However, in certain applications, the reference sequence may be taken from a particular individual.
The term "artificial target sequences genome" herein refers to a grouping of known sequences that encompass alleles of known polymorphic sites. For example, a "SNP reference genome" is an artificial target sequences genome comprising a grouping of sequences that encompass alleles of known SNPs.
The term "clinically relevant sequence" herein refers to a nucleic acid sequence that is known or suspected to be associated with or implicated in a genetic or disease condition. Determining the absence or presence of a clinically relevant sequence can be useful in determining a diagnosis or confirming a diagnosis of a medical condition, or in providing a prognosis for the development of a disease.
The term "derived," when used in the context of a nucleic acid or a mixture of nucleic acids, herein refers to the means whereby the nucleic acid(s) are obtained from the source from which they originate. For example, in one embodiment, a mixture of nucleic acids that is derived from two different genomes means that the nucleic acids, e.g., cfDNA, were naturally released by cells through naturally occurring processes such as necrosis or apoptosis. In another embodiment, a mixture of nucleic acids that is derived from two different genomes means that the nucleic acids were extracted from two different types of cells from a subject.
The term "patient sample" herein refers to a biological sample obtained from a patient, i.e., a recipient of medical attention, care, or treatment. The patient sample can be any of the samples described herein. In certain embodiments, the patient sample is obtained by non-invasive procedures, e.g., a peripheral blood sample or a stool sample. The methods described herein need not be limited to humans. Thus, various veterinary applications are contemplated, in which case the patient sample may be a sample from a non-human mammal (e.g., a cat, pig, horse, cow, and the like).
The term "mixed sample" herein refers to a sample containing a mixture of nucleic acids that are derived from different genomes.
The term "maternal sample" herein refers to a biological sample obtained from a pregnant subject, e.g., a woman.
The term "biological fluid" herein refers to a liquid taken from a biological source and includes, for example, blood, serum, plasma, saliva, lavage fluid, cerebrospinal fluid, urine, semen, sweat, tears, and the like. As used herein, the terms "blood," "plasma," and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.
The terms "maternal nucleic acids" and "fetal nucleic acids" herein refer to the nucleic acids of a pregnant female subject and the nucleic acids of the fetus being carried by the pregnant female, respectively.
As used herein, the term "corresponding to" sometimes refers to a nucleic acid sequence, e.g., a gene or a chromosome, that is present in the genomes of different subjects and that does not necessarily have the same sequence in all genomes, but serves to provide the identity rather than the genetic information of a sequence of interest, e.g., a gene or chromosome.
As used herein, the term "substantially cell free" encompasses preparations of a desired sample from which cellular components normally associated with it have been removed. For example, a plasma sample is rendered substantially cell free by removing blood cells, e.g., red blood cells, that are normally associated with it. In certain embodiments, substantially cell free samples are processed to remove cells that would otherwise contribute to the desired genetic material that is to be tested for a CNV.
As used herein, the term "fetal fraction" refers to the fraction of fetal nucleic acids present in a sample comprising fetal and maternal nucleic acid. Fetal fraction is often used to characterize the cfDNA in a mother's blood.
As used herein, the term "chromosome" refers to the heredity-bearing gene carrier of a living cell, which is derived from chromatin and which comprises DNA and protein components (especially histones). The conventional, internationally recognized numbering system for individual human genome chromosomes is employed herein.
As used herein, the term "polynucleotide length" refers to the absolute number of nucleic acid molecules (nucleotides) in a sequence or in a region of a reference genome. The term "chromosome length" refers to the known length of the chromosome given in base pairs, e.g., as provided in the NCBI36/hg18 assembly of the human chromosomes found on the World Wide Web at genome.ucsc.edu/cgi-bin/hgTracks?hgsid=167155613&chromInfoPage=.
The term "subject" herein refers to a human subject as well as a non-human subject, such as a mammal, an invertebrate, a vertebrate, a fungus, a yeast, a bacterium, or a virus. Although the examples herein concern humans and the language is primarily directed to human concerns, the concepts disclosed herein are applicable to genomes from any plant or animal, and are useful in the fields of veterinary medicine, animal sciences, research laboratories, and the like.
The term "condition" herein refers to "medical condition" as a broad term that includes all diseases and disorders, and may also include [injuries] and normal health situations, such as pregnancy, that might affect a person's health, benefit from medical assistance, or have implications for medical treatments.
术语“完整”在此在提及染色体非整倍性时使用,是指整个染色体的获得或丢失。The term "complete" is used herein in reference to chromosomal aneuploidy and refers to the gain or loss of an entire chromosome.
术语“部分”在提及染色体非整倍性时使用时,在此是指染色体的一部分 (即区段)的获得或丢失。The term "partial," when used in reference to a chromosomal aneuploidy, refers herein to the gain or loss of a portion (i.e., a segment) of a chromosome.
术语“嵌合体”在此是指表示一个从单受精卵发育而来的个体中存在具有不同核型的两种细胞群体。嵌合性可能由发育期间仅仅蔓延到一个成人细胞子集的突变引起。The term "mosaic" herein denotes the presence of two populations of cells with different karyotypes in one individual who has developed from a single fertilized egg. Mosaicism may result from a mutation during development that is propagated to only a subset of the adult cells.
术语“非嵌合体”在此是指包括具有一种核型的细胞的生物体,例如人类胎儿。The term "non-mosaic" herein refers to an organism, e.g., a human fetus, composed of cells of one karyotype.
术语“使用染色体”在提及确定染色体剂量时使用时,在此是指使用针对染色体获得的序列信息,即针对染色体获得的序列标签的数目。The term "using a chromosome," when used in reference to determining a chromosome dose, herein refers to using the sequence information obtained for the chromosome, i.e., the number of sequence tags obtained for the chromosome.
如在此所用的术语“灵敏性”等于真阳性的数目除以真阳性与假阴性之和。The term "sensitivity" as used herein is equal to the number of true positives divided by the sum of true positives and false negatives.
如在此所用的术语“专一性”等于真阴性的数目除以真阴性与假阳性之和。As used herein, the term "specificity" is equal to the number of true negatives divided by the sum of true negatives and false positives.
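The two definitions above can be written out directly. The following is a minimal Python sketch; the function names and the counts in the example are hypothetical and are not taken from this disclosure.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Sensitivity = TP / (TP + FN), as defined above."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Specificity = TN / (TN + FP), as defined above."""
    return true_neg / (true_neg + false_pos)


# Hypothetical validation counts, for illustration only.
sens = sensitivity(true_pos=48, false_neg=2)    # 48 of 50 affected samples called
spec = specificity(true_neg=950, false_pos=0)   # no unaffected sample called affected
```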
术语“亚二倍体”在此是指一个染色体数,它比对于该物种而言的染色体组特征的正常单倍体数要小一或更多。The term "hypodiploid" herein refers to a chromosome number that is one or more less than the normal haploid number of chromosomes characteristic of the species.
“多态位点”是发生核苷酸序列歧异的基因座。基因座可以小到一个碱基对。示意性标记物具有至少两个等位基因,每一个出现的频率大于所选定的群体的 1%,并且更典型地大于10%或20%。多态位点可以是单核苷酸多态性(SNP)、小规模多碱基缺失或插入、多核苷酸多态性(MNP)或短串联重复(STR)的位点。术语“多态基因座”与“多态位点”在此互换使用。A "polymorphic site" is a locus at which nucleotide sequence variation occurs. A locus can be as small as a single base pair. An exemplary marker has at least two alleles, each occurring at a frequency greater than 1% of a selected population, and more typically greater than 10% or 20%. A polymorphic site can be a site of a single nucleotide polymorphism (SNP), a small multi-base deletion or insertion, a multiple nucleotide polymorphism (MNP), or a short tandem repeat (STR). The terms "polymorphic locus" and "polymorphic site" are used interchangeably herein.
“多态序列”在此是指包括一个或多个多态位点(例如一个SNP或一个串联SNP)的核酸序列,例如DNA序列。根据本技术的多态序列可用于特定地将包括胎儿与母体核酸混合物的母体样品中母体与非母体等位基因辨别开。"Polymorphic sequence" herein refers to a nucleic acid sequence, such as a DNA sequence, that includes one or more polymorphic sites (e.g., a SNP or a tandem SNP). Polymorphic sequences according to the present technology can be used to specifically distinguish maternal from non-maternal alleles in a maternal sample that includes a mixture of fetal and maternal nucleic acids.
如在此所用,“单核苷酸多态性”(SNP)出现在单核苷酸占据的多态位点上,该位点是等位基因的序列之间发生变异的位点。该位点通常前面与后面是等位基因高度保守的序列(例如在小于群体1/100或1/1000个成员中变化的序列)。SNP通常因多态位点上一个核苷酸被另一个核苷酸取代而产生。转换是一个嘌呤被另一个嘌呤置换或一个嘧啶被另一个嘧啶置换。颠换是嘌呤被嘧啶置换或嘧啶被嘌呤置换。SNP也可以由相对于参考等位基因的核苷酸缺失或核苷酸插入引起。单核苷酸多态性(SNP)是人类群体中两个替代碱基以可观的频率(>1%)出现的状况,并且是最常见类型的人类遗传变异。As used herein, a "single nucleotide polymorphism" (SNP) occurs at a polymorphic site occupied by a single nucleotide, which is the site of variation between allelic sequences. The site is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of a population). A SNP usually arises from the substitution of one nucleotide for another at the polymorphic site. A transition is the replacement of one purine by another purine or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine or of a pyrimidine by a purine. SNPs can also arise from a deletion or an insertion of a nucleotide relative to a reference allele. Single nucleotide polymorphisms (SNPs) are positions at which two alternative bases occur at appreciable frequency (>1%) in the human population, and are the most common type of human genetic variation.
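The distinction between a transition and a transversion can be illustrated with a short sketch. The helper name `classify_substitution` is hypothetical; the only facts relied on are that A and G are purines and that C and T are pyrimidines.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    # Reject identical bases and anything that is not one of A, C, G, T.
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref!r} -> {alt!r}")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

For example, `classify_substitution("A", "G")` is a transition (purine to purine), while `classify_substitution("A", "C")` is a transversion (purine to pyrimidine).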
术语“串联SNP”在此是指在一个多态目标核酸序列内存在的两个或更多个SNP。The term "tandem SNPs" herein refers to two or more SNPs present within a polymorphic target nucleic acid sequence.
如在此所用,术语“短串联重复”或“STR”是指当两个或更多个核苷酸的模式重复并且重复序列直接彼此相邻时出现的一类多态性。该模式的长度可在从 2个到10个碱基对(bp)(例如基因组区域中(CATG)n)范围内,并且典型地在非编码内含子区域中。通过检查若干个STR基因座并且计数在既定基因座上有多少个特定STR序列重复,有可能建立个体独特的基因概况。As used herein, the term "short tandem repeat" or "STR" refers to a class of polymorphisms that occurs when a pattern of two or more nucleotides is repeated and the repeated sequences are directly adjacent to each other. The pattern can range in length from 2 to 10 base pairs (bp) (e.g., (CATG)n in a genomic region) and is typically in the non-coding intron region. By examining several STR loci and counting how many repeats of a specific STR sequence there are at a given locus, it is possible to create a unique genetic profile of an individual.
如在此所用,术语“miniSTR”在此是指跨越小于约300个碱基对、小于约 250个碱基对、小于约200个碱基对、小于约150个碱基对、小于约100个碱基对、小于约50个碱基对或小于约25个碱基对的四个或更多个碱基对串联重复。“miniSTR”是可从cfDNA模板扩增的STR。As used herein, the term "miniSTR" refers to four or more base pair tandem repeats that span less than about 300 base pairs, less than about 250 base pairs, less than about 200 base pairs, less than about 150 base pairs, less than about 100 base pairs, less than about 50 base pairs, or less than about 25 base pairs. A "miniSTR" is an STR that can be amplified from a cfDNA template.
术语“多态目标核酸”、“多态序列”、“多态目标核酸序列”以及“多态核酸”在此互换使用,是指包括一个或多个多态位点的核酸序列(例如DNA序列)。The terms "polymorphic target nucleic acid," "polymorphic sequence," "polymorphic target nucleic acid sequence," and "polymorphic nucleic acid" are used interchangeably herein to refer to a nucleic acid sequence (e.g., a DNA sequence) that includes one or more polymorphic sites.
术语“多个多态目标核酸”在此是指各包括至少一个多态位点(例如一个 SNP)的大量核酸序列,使得1个、2个、3个、4个、5个、6个、7个、8个、 9个、10个、15个、20个、25个、30个、40个或更多个不同多态位点从该多态目标核酸扩增,以识别和/或量化包括胎儿和母体核酸的母体样品中存在的胎儿等位基因。The term "multiple polymorphic target nucleic acids" herein refers to a large number of nucleic acid sequences, each comprising at least one polymorphic site (e.g., a SNP), such that 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40 or more different polymorphic sites are amplified from the polymorphic target nucleic acid to identify and/or quantify fetal alleles present in a maternal sample comprising fetal and maternal nucleic acids.
术语“富集”在此是指将母体样品一部分中所包含的多态目标核酸扩增并且将所扩增产物与除去该部分的母体样品的其余部分组合的过程。举例来说,母体样品的其余部分可以是原始母体样品。The term "enrichment" herein refers to a process in which a polymorphic target nucleic acid contained in a portion of a maternal sample is amplified and the amplified product is combined with the remainder of the maternal sample from which the portion was removed. For example, the remainder of the maternal sample can be the original maternal sample.
术语“原始母体样品”在此是指从充当移除一部分以扩增多态目标核酸的来源的怀孕受试者(例如女性)中获得的非富集生物样品。“原始样品”可以是从怀孕受试者中获得的任何样品和其加工部分,例如从母体血浆样品中提取的纯化cfDNA样品。The term "original maternal sample" herein refers to a non-enriched biological sample obtained from a pregnant subject (e.g., a woman) that serves as the source from which a portion is removed to amplify polymorphic target nucleic acids. An "original sample" can be any sample obtained from a pregnant subject, or a processed fraction thereof, such as a purified cfDNA sample extracted from a maternal plasma sample.
如在此所用,术语“引物”是指当置于引发与核酸股补偿的引物延伸产物合成的条件下时(即在核苷酸和例如DNA多聚酶等引发剂存在下以及在适合温度和pH值下),能够充当合成起始点的分离寡核苷酸。为最高效率地扩增,引物优选是单股,但作为替代方案,可以是双股。如果是双股,那么在用于制备延伸产物前首先对引物进行处理以分离其股。引物优选是寡脱氧核糖核苷酸。引物必须足够长,以在引发剂存在下引发延伸产物合成。引物的精确长度将取决于许多因素,包括温度、引物来源、方法的使用以及用于引物设计的参数。As used herein, the term "primer" refers to an isolated oligonucleotide that can serve as a starting point for synthesis when placed under conditions that induce the synthesis of primer extension products that are complementary to nucleic acid strands (i.e., in the presence of nucleotides and an initiator such as DNA polymerase and at a suitable temperature and pH). For maximum efficiency amplification, the primer is preferably single-stranded, but as an alternative, can be double-stranded. If double-stranded, the primer is first treated to separate its strands before being used to prepare extension products. The primer is preferably an oligodeoxyribonucleotide. The primer must be long enough to induce extension product synthesis in the presence of an initiator. The exact length of the primer will depend on many factors, including temperature, primer source, method use, and parameters for primer design.
短语“有待采取的行为(cause)”是指医学专业人士(例如医生)或者控制或指导受试者医疗护理的人所采取的控制和/或准许争论中的一种或多种药剂/一种或多种化合物给予受试者的行动。给药可包括诊断和/或确定适当治疗或预防方案,和/或为受试者开出具体药剂/化合物。该开处方可包括例如起草处方组成、写病历卡等等。同样,例如诊断程序的“有待执行的行为(cause)”是指医学专业人士(例如医生)或者控制或指导受试者医疗护理的人所采取的控制和/或准许对受试者执行一个或多个诊断方案的行动。The phrase "cause to be administered" refers to the actions taken by a medical professional (e.g., a physician), or a person controlling or directing the medical care of a subject, that control and/or permit the administration of the one or more agents/compounds at issue to the subject. Causing to be administered can involve diagnosis and/or determination of an appropriate therapeutic or prophylactic regimen, and/or prescribing a particular agent/compound for the subject. Such prescribing can include, for example, drafting a prescription form, annotating a medical record, and the like. Similarly, "cause to be performed," e.g., for a diagnostic procedure, refers to the actions taken by a medical professional (e.g., a physician), or a person controlling or directing the medical care of a subject, that control and/or permit the performance of one or more diagnostic protocols on the subject.
引言Introduction
在此披露了用于确定测试样品中不同感兴趣的序列的拷贝数变异(CNV) 的方法、设备、系统以及试剂盒,该测试样品包含衍生自两个不同基因组并且已知或怀疑一个或多个感兴趣的序列的量不同的核酸的混合物。还提供了用于确定由核酸混合物中的两个基因组所贡献的分数的方法、设备、系统以及试剂盒。通过此处披露的方法和设备确定的拷贝数变异包括整个染色体的获得或丢失、涉及到显微镜可见的极大染色体区段的变化以及尺寸从千碱基(kb)到兆碱基(Mb)的DNA片段的大量亚微观拷贝数变异。在不同的实施方案中,这些方法包括一种机器实现的统计方法,该统计方法说明由工艺相关的变异性、染色体间的变异性以及序列间变异性造成的自然增加的变异性。该方法适用于确定任何胎儿非整倍性的CNV,以及已知或怀疑与多种医学病状有关的CNV。可根据本发明方法确定的CNV包括染色体1到22、X和Y中任意一个或多个的三体性和单体性、其他染色体多体性以及任一种或多种染色体的区段的缺失和/或复制,通过仅对测试样品的核酸测序一次,即可检测到。任何非整倍性可从通过仅对测试样品的核酸测序一次即获得的测序信息中确定出。Disclosed herein are methods, apparatus, systems, and kits for determining copy number variations (CNV) of different sequences of interest in a test sample that comprises a mixture of nucleic acids derived from two different genomes, which are known or suspected to differ in the amount of one or more sequences of interest. Also provided are methods, apparatus, systems, and kits for determining the fraction contributed by each of the two genomes in the nucleic acid mixture. Copy number variations determined by the methods and apparatus disclosed herein include gains or losses of entire chromosomes, alterations involving very large chromosomal segments that are microscopically visible, and an abundance of sub-microscopic copy number variation of DNA segments ranging from kilobases (kb) to megabases (Mb) in size. In various embodiments, the methods comprise a machine-implemented statistical approach that accounts for accrued variability stemming from process-related variability, interchromosomal variability, and inter-sequence variability. The method is applicable to determining CNV of any fetal aneuploidy, and CNVs known or suspected to be associated with a variety of medical conditions. CNVs that can be determined according to the present method include trisomies and monosomies of any one or more of chromosomes 1 to 22, X, and Y, other chromosomal polysomies, and deletions and/or duplications of segments of any one or more of the chromosomes, which can be detected by sequencing the nucleic acids of a test sample only once. Any aneuploidy can be determined from the sequencing information obtained by sequencing the nucleic acids of a test sample only once.
在人类基因组中的CNV显著影响人类多样性和对疾病的易感性(Redon (雷东)等人,Nature(自然)23:444-454[2006],Shaikh(谢赫)等人.Genome Res(基因组研究)19:1682-1690[2009]。已知CNV通过不同机制构成遗传疾病,导致多数情况下的基因剂量不平衡亦或基因破坏。除了它们直接与遗传性障碍相关,还已知CNV介导可以是有害的表型改变。最近,若干研究已经报道,如与正常对照相比,在复杂失调,例如自闭症、ADHD(多动症)、和精神分裂症中,罕见或重新的CNV的增加的负担,突出了罕见或独特的CNV的潜在致病性(Sebat(塞伯特)等人,316:445-449[2007];Walsh(沃尔什)等人,Science(科学)320:539-543[2008]。来自基因组重排的CNV上升,主要因为缺失、复制、插入、和不平衡的易位事件。CNVs in the human genome significantly influence human diversity and predisposition to disease (Redon et al., Nature 23:444-454 [2006]; Shaikh et al., Genome Res 19:1682-1690 [2009]). CNVs are known to contribute to genetic disease through different mechanisms, resulting in either an imbalance of gene dosage or gene disruption in most cases. In addition to their direct correlation with genetic disorders, CNVs are known to mediate phenotypic changes that can be deleterious. Recently, several studies have reported an increased burden of rare or de novo CNVs in complex disorders such as autism, ADHD, and schizophrenia as compared to normal controls, highlighting the potential pathogenicity of rare or unique CNVs (Sebat et al., 316:445-449 [2007]; Walsh et al., Science 320:539-543 [2008]). CNVs arise from genomic rearrangements, primarily owing to deletion, duplication, insertion, and unbalanced translocation events.
在此描述的方法、设备或装置可采用进行大规模平行测序的下一代测序技术(NGS)。在某些实施方案中,以流动槽内的大规模平行方式测序克隆地扩增的DNA模板或单DNA分子(例如像在Volkerding(沃克尔丁)等人,Clin Chem(临床化学)55:641-658[2009];Metzker(梅兹可)M,Nature Rev(自然评论)11:31-46[2010]中所述)。除了高通量序列信息,NGS提供了定量信息,其中每一序列读数是可计算的“序列标签”,这些序列标签代表个体克隆 DNA模板或单DNA分子。NGS的测序技术包括焦磷酸测序、借助可逆染料终止子的合成法测序、通过寡核苷酸探针连接的测序和离子半导体测序。可以单独地测序来自单独的样品的DNA(即singleplex测序),或者在单测序轮次时,作为索引基因组分子,来自多个样品的DNA可以被汇集在一起并进行测序(即多重测序),以产生高达若干亿的DNA序列的读数。以下说明测序技术的实例,可以用于获得根据本发明的方法的序列信息。The methods and apparatus described herein may employ next generation sequencing (NGS) technology, which performs massively parallel sequencing. In certain embodiments, clonally amplified DNA templates or single DNA molecules are sequenced in a massively parallel fashion within a flow cell (e.g., as described in Volkerding et al., Clin Chem 55:641-658 [2009]; Metzker M, Nature Rev 11:31-46 [2010]). In addition to high-throughput sequence information, NGS provides quantitative information, in that each sequence read is a countable "sequence tag" representing an individual clonal DNA template or a single DNA molecule. The sequencing technologies of NGS include pyrosequencing, sequencing-by-synthesis with reversible dye terminators, sequencing by oligonucleotide probe ligation, and ion semiconductor sequencing. DNA from individual samples can be sequenced individually (i.e., singleplex sequencing), or DNA from multiple samples can be pooled and sequenced as indexed genomic molecules in a single sequencing run (i.e., multiplex sequencing), to generate up to several hundred million reads of DNA sequences. Examples of sequencing technologies that can be used to obtain the sequence information according to the present method are described below.
在一些实施方案中,在此披露的方法和设备可采用以下顺序的一些或全部操作:从患者获得核酸测试样品(典型地通过非侵入性程序);加工测试样品,准备进行测序;对来自测试样品的核酸进行测序,以产生大量读数(例如至少10,000个);将这些读数与参考序列/基因组的一部分进行比对,并且确定映射到参考序列的界定部分(例如界定染色体或染色体区段)的DNA的量(例如读数的数目);通过用映射到针对界定部分所选定的一个或多个归一化染色体或染色体区段的DNA的量归一化映射到界定部分的DNA的量来计算一个或多个界定部分的剂量;确定该剂量是否指示该界定部分“受影响”(例如非整倍性或嵌合体);报导确定并且任选将其转变成诊断;使用该诊断或确定来发展治疗、监测或进一步测试患者的计划。In some embodiments, the methods and apparatus disclosed herein may employ some or all of the following sequence of operations: obtaining a nucleic acid test sample from a patient (typically by non-invasive procedures); processing the test sample in preparation for sequencing; sequencing the nucleic acid from the test sample to produce a large number of reads (e.g., at least 10,000); aligning the reads to a portion of a reference sequence/genome and determining the amount of DNA (e.g., the number of reads) that maps to a defined portion of the reference sequence (e.g., a defined chromosome or chromosome segment); calculating a dose for one or more defined portions by normalizing the amount of DNA mapped to the defined portion with the amount of DNA mapped to one or more normalizing chromosomes or chromosome segments selected for the defined portion; determining whether the dose indicates that the defined portion is "affected" (e.g., aneuploidy or mosaicism); reporting the determination and optionally converting it into a diagnosis; using the diagnosis or determination to develop a plan to treat, monitor, or further test the patient.
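The counting and normalization operations in the sequence above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosed method: reads are assumed to be already aligned and reduced to the name of the chromosome each one maps to, and the function names and toy read list are hypothetical.

```python
from collections import Counter


def count_tags(mapped_chromosomes):
    """Count reads (sequence tags) per chromosome from a list of chromosome names."""
    return Counter(mapped_chromosomes)


def chromosome_dose(tag_counts, chrom_of_interest, normalizing_chroms):
    """Dose = tags on the chromosome of interest / tags on the normalizing chromosome(s)."""
    denominator = sum(tag_counts[c] for c in normalizing_chroms)
    if denominator == 0:
        raise ValueError("no tags mapped to the normalizing chromosome(s)")
    return tag_counts[chrom_of_interest] / denominator


# Toy input: each entry is the chromosome one uniquely mapped read aligned to.
reads = ["chr21"] * 30 + ["chr9"] * 100 + ["chr1"] * 200
counts = count_tags(reads)
dose_single = chromosome_dose(counts, "chr21", ["chr9"])          # one normalizing chromosome
dose_group = chromosome_dose(counts, "chr21", ["chr9", "chr1"])   # a group of chromosomes
```

Whether a dose indicates an affected sample is decided in a later step by comparing it against doses from qualified samples; that comparison is not shown here.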
确定合格样品中的归一化序列:归一化染色体序列和归一化区段序列Determine the normalizing sequence in qualified samples: normalizing chromosome sequence and normalizing segment sequence
使用来自一组得自受试者的合格样品识别归一化序列,这些受试者已知包括具有感兴趣的任何序列(例如染色体或其区段)的一个正常拷贝数。在图 1中描绘的方法的实施方案的步骤110、120、130、140、和145中概述了归一化序列的确定。从合格样品获得的序列信息用于在统计学上有意义地识别测试样品中的染色体非整倍性(图1步骤165和实例)。Normalizing sequences are identified using sequence information from a set of qualified samples obtained from subjects known to have a normal copy number for any sequence of interest (e.g., a chromosome or a segment thereof). Determination of normalizing sequences is outlined in steps 110, 120, 130, 140, and 145 of the embodiment of the method depicted in FIG. 1. The sequence information obtained from the qualified samples is used to make statistically meaningful identifications of chromosomal aneuploidies in test samples (FIG. 1, step 165, and the Examples).
图1提供用于确定生物样品中例如染色体或其区段等感兴趣的序列的 CNV的一个实施方案的流程图100。在一些实施方案中,从受试者获得生物学样品,并且该样品包括由不同基因组构成的核酸的混合物。可以由两个个体的样品构成不同基因组,例如由胎儿和怀有胎儿的母体构成不同基因组。可替代地,可以由来自相同受试者的非整倍性癌症细胞和正常整倍细胞的样品(例如来自癌症患者的血浆样品)构成基因组。Fig. 1 provides a flow chart 100 of an embodiment for determining CNV of a sequence of interest such as a chromosome or its segment in a biological sample. In some embodiments, a biological sample is obtained from a subject, and the sample includes a mixture of nucleic acids consisting of different genomes. Different genomes can be formed by samples of two individuals, such as a fetus and a mother carrying the fetus. Alternatively, a genome can be formed by a sample of aneuploid cancer cells and normal euploid cells from the same subject (e.g., a plasma sample from a cancer patient).
除分析患者的测试样品以外,还要选择每一个可能的感兴趣的染色体的一个或多个归一化染色体或一个或多个归一化染色体区段。归一化染色体或区段的识别与患者样品的正常测试异步进行,两者可在一个临床环境中进行。换句话说,在测试患者样品前识别归一化染色体或区段。存储归一化染色体或区段与感兴趣的染色体或区段之间的关联性以在测试期间使用。如以下说明,该关联性典型地保存测试许多样品所跨越的时间段。以下讨论涉及用于选择个别感兴趣的染色体或区段的归一化染色体或染色体区段的实施方案。In addition to analyzing a patient's test sample, one or more normalizing chromosomes or one or more normalizing chromosome segments are selected for each possible chromosome of interest. The normalizing chromosomes or segments are identified asynchronously from the normal testing of patient samples, which may take place in a clinical setting. In other words, the normalizing chromosomes or segments are identified before patient samples are tested. The associations between normalizing chromosomes or segments and chromosomes or segments of interest are stored for use during testing. As explained below, such an association is typically maintained over a period of time that spans the testing of many samples. The following discussion concerns embodiments for selecting normalizing chromosomes or chromosome segments for individual chromosomes or segments of interest.
获得一组合格样品来识别合格的归一化序列,并且来提供变异值,用于确定测试样品中的CNV的统计上有意义的识别。在步骤110中,从多个受试者获得多个生物学合格样品,已知这些受试者包括具有感兴趣的任何一个序列的正常拷贝数的细胞。在一个实施方案中,从怀有胎儿的母体获得合格样品,已经使用细胞遗传学手段确认具有正常拷贝数的染色体。生物学合格样品可以是一种生物学流体,例如血浆,或如以下所述的任何适合的样品。在一些实施方案中,合格样品含有核酸分子(例如cfDNA分子)的混合物。在一些实施方案中,合格样品是含有胎儿的和母体的cfDNA分子的混合物的母体的血浆样品。通过使用任何已知测序方法,对这些核酸中的至少一部分(例如胎儿的和母体的核酸)进行测序,获得归一化染色体和/或其一部分的序列信息。优选地,在本申请的其他地方说明的下一代测序(NGS)方法中的任何一种被用于给作为单或克隆扩增的分子的胎儿的和母体的核酸测序。在不同的实施方案中,合格样品如以下所披露,在测序前和测序期间进行加工。这些样品可使用如在此披露的设备、系统以及试剂盒进行加工。Obtain a set of qualified samples to identify qualified normalized sequences, and to provide variation values for determining statistically significant identification of CNVs in test samples. In step 110, multiple biological qualified samples are obtained from multiple subjects, and it is known that these subjects include cells with normal copy numbers of any one sequence of interest. In one embodiment, qualified samples are obtained from a mother pregnant with a fetus, and chromosomes with normal copy numbers have been confirmed using cytogenetic means. Biological qualified samples can be a biological fluid, such as plasma, or any suitable sample as described below. In some embodiments, qualified samples contain a mixture of nucleic acid molecules (such as cfDNA molecules). In some embodiments, qualified samples are maternal plasma samples containing a mixture of fetal and maternal cfDNA molecules. By using any known sequencing method, at least a portion of these nucleic acids (such as fetal and maternal nucleic acids) is sequenced to obtain sequence information of normalized chromosomes and/or a portion thereof. Preferably, any of the next generation sequencing (NGS) methods described elsewhere in this application is used to sequence fetal and maternal nucleic acids as single or clone-amplified molecules. In various embodiments, qualified samples are processed as described below before and during sequencing. These samples can be processed using the apparatus, systems, and kits disclosed herein.
在步骤120,包含在合格样品内的所有合格核酸的每一个的至少一部分被测序,以产生百万个序列读数,例如36bp读数,这与参考基因组,例如hg18 进行比对。在一些实施方案中,序列读数包括约20bp、约25bp、约30bp、约35bp、约40bp、约45bp、约50bp、约55bp、约60bp、约65bp、约70bp、约75bp、约80bp、约85bp、约90bp、约95bp、约100bp、约110bp、约120bp、约130bp、约140bp、约150bp、约200bp、约250bp、约300bp、约350bp、约400bp、约450bp、或约500bp。期待技术优势将使得能进行大于500bp的单端读数,在产生配对端读数时,该读数使能够用于大于约1000bp 的读数。在一个实施方案中,映射的序列读数包括36bp。在另一个实施方案中,映射的序列读数包括25bp。与参考基因组比对的序列读数,以及独特对映到参考基因组的读数,已知它们作为序列标签。在一个实施方案中,从独特对映参考基因组的读数中获得至少约3 × 10^6个合格序列标签、至少约5 × 10^6个合格序列标签、至少约8 × 10^6个合格序列标签、至少约10 × 10^6个合格序列标签、至少约15 × 10^6个合格序列标签、至少约20 × 10^6个合格序列标签、至少约30 × 10^6个合格序列标签、至少约40 × 10^6个合格序列标签、或至少约50 × 10^6个包括20和40bp读数之间的合格序列标签。At step 120, at least a portion of each of the qualified nucleic acids contained in the qualified samples is sequenced to generate millions of sequence reads, e.g., 36 bp reads, which are aligned to a reference genome, e.g., hg18. In some embodiments, the sequence reads comprise about 20 bp, about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, or about 500 bp. It is expected that technological advances will enable single-end reads of greater than 500 bp, and therefore reads of greater than about 1000 bp when paired-end reads are generated. In one embodiment, the mapped sequence reads comprise 36 bp. In another embodiment, the mapped sequence reads comprise 25 bp. Sequence reads are aligned to a reference genome, and the reads that map uniquely to the reference genome are known as sequence tags. In one embodiment, at least about 3 × 10^6 qualified sequence tags, at least about 5 × 10^6 qualified sequence tags, at least about 8 × 10^6 qualified sequence tags, at least about 10 × 10^6 qualified sequence tags, at least about 15 × 10^6 qualified sequence tags, at least about 20 × 10^6 qualified sequence tags, at least about 30 × 10^6 qualified sequence tags, at least about 40 × 10^6 qualified sequence tags, or at least about 50 × 10^6 qualified sequence tags comprising reads of between 20 and 40 bp are obtained from reads that map uniquely to a reference genome.
在步骤130,计数得自测序合格样品中的核酸的所有标签,以确定合格序列标签密度。在一个实施方案中,序列标签密度被确定为参考对应于参考基因组上感兴趣的序列的这多个合格序列标签。在另一实施方案中,合格序列标签密度为确定为映射到感兴趣的序列的这多个合格序列标签,被归一化为它们映射的感兴趣的合格序列的长度。被确定为标签密度相对于感兴趣的序列的长度的比率的序列标签密度在此称为标签密度比率。并不需要归一化到感兴趣的序列的长度,并且可以被包括为一个步骤,减少一个数中的位数,来简化它用于人工解释。所有合格序列标签被对映并计数到每一合格样品,在合格样品中的感兴趣的序列(例如临床上相关的序列)的序列标签密度被确定,同时顺序识别额外序列(归一化序列来自它)的序列标签密度。At step 130, all the tags obtained from sequencing the nucleic acids in the qualified samples are counted to determine a qualified sequence tag density. In one embodiment, the sequence tag density is determined as the number of qualified sequence tags mapped to the sequence of interest on the reference genome. In another embodiment, the qualified sequence tag density is determined as the number of qualified sequence tags mapped to a sequence of interest, normalized to the length of the qualified sequence of interest to which they are mapped. Sequence tag densities that are determined as a ratio of the tag density to the length of the sequence of interest are herein referred to as tag density ratios. Normalization to the length of the sequence of interest is not required; it may be included as a step that reduces the number of digits in a number and so simplifies human interpretation. As all qualified sequence tags are mapped and counted in each of the qualified samples, the sequence tag density is determined for a sequence of interest (e.g., a clinically relevant sequence) in the qualified samples, as are the sequence tag densities for the additional sequences from which normalizing sequences are subsequently identified.
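The two forms of sequence tag density described in step 130 can be compared in a short sketch. The function name, tag count, and sequence length below are hypothetical placeholders; a real analysis would take lengths from the reference assembly in use.

```python
def tag_density_ratio(tags_mapped: int, sequence_length_bp: int) -> float:
    """Tags mapped to a sequence of interest, normalized to that sequence's length."""
    if sequence_length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return tags_mapped / sequence_length_bp


# Placeholder numbers for illustration only (not reference-assembly values).
tags_on_sequence = 470_000          # un-normalized tag density: a plain count
sequence_length = 47_000_000        # assumed length of the sequence of interest, in bp
ratio = tag_density_ratio(tags_on_sequence, sequence_length)   # tags per base pair
```

Because the length of a given sequence of interest is the same for every sample, dividing by it rescales the counts without changing their relative variation across samples, which is consistent with the statement that the normalization to length is optional.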
在某些实施方案中,感兴趣的序列是与完整染色体非整倍性相关联的染色体,例如染色体21,并且合格归一化序列是不与染色体非整倍性相关联并且序列标签密度的变化接近例如染色体21等感兴趣的序列(即染色体)的完整染色体。所选定的归一化染色体可以是最接近感兴趣的序列的序列标签密度变化的一个染色体或一组染色体。染色体1-22、X和Y中的任何一个或多个可以是感兴趣的序列,并且这一个或多个染色体可以被识别为合格样品中的任一个染色体1-22、X、Y中的每一个的归一化序列。归一化染色体可以是单独的染色体,或者它可以是本申请的其他地方所述的一组染色体。In certain embodiments, the sequence of interest is a chromosome associated with a complete chromosomal aneuploidy, e.g., chromosome 21, and the qualified normalizing sequence is a complete chromosome that is not associated with a chromosomal aneuploidy and whose variation in sequence tag density approximates that of the sequence (i.e., chromosome) of interest, e.g., chromosome 21. The selected normalizing chromosome(s) may be the one chromosome or group of chromosomes that best approximates the variation in sequence tag density of the sequence of interest. Any one or more of chromosomes 1-22, X, and Y can be a sequence of interest, and one or more chromosomes can be identified as the normalizing sequence for each of chromosomes 1-22, X, and Y in the qualified samples. The normalizing chromosome can be an individual chromosome, or it can be a group of chromosomes as described elsewhere herein.
在另一个实施方案中,感兴趣的序列是与部分非整倍性(例如染色体缺失或插入或不平衡染色体易位)相关联的染色体区段,并且归一化序列是不与部分非整倍性相关联并且序列标签密度的变化接近与部分非整倍性相关联的染色体区段的一个染色体区段(或一组区段)。所选定的归一化染色体区段可以是最接近感兴趣的序列的序列标签密度变化的一个或多个染色体区段。任何一个或多个染色体1-22、X、和Y的任何一个或多个区段可以是感兴趣的序列。In another embodiment, the sequence of interest is a chromosome segment associated with a partial aneuploidy (e.g., a chromosomal deletion or insertion, or an unbalanced chromosomal translocation), and the normalizing sequence is a chromosome segment (or group of segments) that is not associated with the partial aneuploidy and whose variation in sequence tag density approximates that of the chromosome segment associated with the partial aneuploidy. The selected normalizing chromosome segment(s) may be the one or more segments that best approximate the variation in sequence tag density of the sequence of interest. Any one or more segments of any one or more of chromosomes 1-22, X, and Y can be a sequence of interest.
在其他实施方案中,感兴趣的序列是与部分非整倍性相关联的染色体区段,并且归一化序列是一个全染色体或多个全染色体。在再其他实施方案中,感兴趣的序列是与非整倍性相关联的一个全染色体并且归一化序列是不与该非整倍性相关联的一个染色体区段或多个染色体区段。In other embodiments, the sequence of interest is a chromosome segment associated with a partial aneuploidy, and the normalizing sequence is a full chromosome or multiple full chromosomes. In yet other embodiments, the sequence of interest is a full chromosome associated with an aneuploidy and the normalizing sequence is a chromosome segment or multiple chromosome segments not associated with the aneuploidy.
无论合格样品中单序列或一组序列识别为任一个或多个感兴趣的序列的归一化序列,都可以选择序列标签密度变化最接近或有效接近如在合格样品中确定的感兴趣的序列的合格归一化序列。举例来说,合格归一化序列是当用以对感兴趣的序列进行归一化时,在合格样品间产生最小的变异性的序列,即归一化序列的变异性最靠近合格样品中确定的感兴趣的序列的变异性。换句话说,合格归一化序列是被选择为使序列剂量(感兴趣的序列)在合格样品间的变化最小的序列。因此,该过程选择在用作归一化染色体时,预计会产生感兴趣的序列的不同批次间的染色体剂量中的最小的变异性的序列。Whether a single sequence or a group of sequences is identified in the qualified samples as the normalizing sequence for any one or more sequences of interest, the qualified normalizing sequence may be chosen to have a variation in sequence tag density that best or effectively approximates that of the sequence of interest as determined in the qualified samples. For example, a qualified normalizing sequence is a sequence that produces the smallest variability across the qualified samples when used to normalize the sequence of interest, i.e., the variability of the normalizing sequence is closest to that of the sequence of interest determined in the qualified samples. Stated another way, the qualified normalizing sequence is the sequence selected to produce the least variation in sequence dose (for the sequence of interest) across the qualified samples. Thus, the process selects a sequence that, when used as a normalizing chromosome, is expected to produce the smallest run-to-run variability in chromosome dose for the sequence of interest.
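One way to picture this selection is the sketch below. It assumes that "variability" is measured as the coefficient of variation (CV) of the dose across qualified samples, and it considers only single chromosomes as candidates; both choices, along with all names and counts, are illustrative assumptions rather than the procedure prescribed here.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def select_normalizing_chromosome(qualified_counts, chrom_of_interest):
    """Return the candidate chromosome whose use as denominator gives the lowest dose CV.

    qualified_counts: one dict per qualified sample, mapping chromosome -> tag count.
    """
    candidates = sorted(set(qualified_counts[0]) - {chrom_of_interest})
    best, best_cv = None, float("inf")
    for candidate in candidates:
        doses = [s[chrom_of_interest] / s[candidate] for s in qualified_counts]
        cv = coefficient_of_variation(doses)
        if cv < best_cv:
            best, best_cv = candidate, cv
    return best, best_cv


# Toy tag counts for three qualified samples (hypothetical values).
qualified = [
    {"chr21": 100, "chr9": 1000, "chr1": 2000},
    {"chr21": 110, "chr9": 1100, "chr1": 2000},
    {"chr21": 90, "chr9": 900, "chr1": 2100},
]
best_chrom, best_cv = select_normalizing_chromosome(qualified, "chr21")
```

In this toy data the chr21 counts rise and fall in step with the chr9 counts, so the chr21/chr9 dose is constant across samples and chr9 is selected. Extending the search to groups of chromosomes would mean summing the counts of each candidate group in the denominator.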
合格样品中针对任一个或多个感兴趣的序列所识别的归一化序列保持是选择用于在测试样品中确定存在或不存在非整倍性的归一化序列长达数日、数周、数月以及可能数年的时间,其条件是程序需要产生测序文库,并且对样品进行的测序随时间基本不变。如上所述,用于确定存在非整倍性的归一化序列因在样品间(例如不同样品)和测序轮次间(例如同一天和/或不同天进行的测序轮次)映射到其的序列标签数目的变异性最接近使用其作为归一化参数的感兴趣的序列的变异性(以及可能其他理由)而选择。这些程序的实质性更改将影响映射到所有序列的标签的数目,从而又将要确定哪个或哪组序列在相同和 /或不同测序轮次中、同一天或不同天在样品间的变异性最接近感兴趣的序列的变异性,此将需要再确定该组归一化序列。程序的实质性更改包括用于制备测序文库的实验室方案发生变化,包括与制备用于多重测序而非单路测序的样品有关的变化;以及测序平台的变化,包括用于测序的化学物质的变化。The normalizing sequence identified in the qualified samples for any one or more sequences of interest remains the normalizing sequence of choice for determining the presence or absence of aneuploidy in test samples over days, weeks, months, and possibly years, provided that the procedures needed to generate sequencing libraries and to sequence the samples remain essentially unchanged over time. As described above, normalizing sequences for determining the presence of aneuploidies are chosen (possibly among other reasons) because the variability in the number of sequence tags mapped to them among samples (e.g., different samples) and among sequencing runs (e.g., runs performed on the same day and/or on different days) best approximates the variability of the sequence of interest for which they serve as a normalizing parameter. Substantial alterations in these procedures will affect the number of tags mapped to all sequences, which in turn will determine which sequence or group of sequences has a variability across samples, in the same and/or different sequencing runs, on the same day or on different days, that most closely approximates that of the sequence of interest; this would require that the set of normalizing sequences be re-determined. Substantial alterations in procedures include changes in the laboratory protocol used to prepare the sequencing library, including changes related to preparing samples for multiplex rather than singleplex sequencing, and changes in sequencing platforms, including changes in the chemistry used for sequencing.
在一些实施方案中,归一化序列是从一个或多个受影响的样品中最好地辨别出一个或多个合格样品的序列,这意味着归一化序列是具有最大可分辨性的序列,即归一化序列的可分辨性是这样,使得提供最优差异化给受影响的测试样品中的感兴趣的序列,用来容易地从其他未受影响的样品中辨别出受影响的测试样品。在其他实施方案中,归一化序列是具有最小的变异性与最大的可分辨性的组合的序列。In some embodiments, the normalizing sequence is the sequence that best distinguishes one or more qualified samples from one or more affected samples, which means that the normalizing sequence is the sequence having the greatest differentiability, i.e., its differentiability is such that it provides optimal differentiation of the sequence of interest in an affected test sample, so that the affected test sample is easily distinguished from other, unaffected samples. In other embodiments, the normalizing sequence is a sequence that combines the smallest variability with the greatest differentiability.
The level of differentiability can be determined as a statistical difference between the sequence doses (e.g., chromosome doses or segment doses) in a population of qualified samples and the chromosome dose(s) in one or more test samples, as described below and shown in the Examples. For example, differentiability can be represented numerically as a t-test value, which represents the statistical difference between the chromosome doses in a population of qualified samples and the chromosome dose(s) in one or more test samples. Alternatively, differentiability can be represented numerically as a normalized chromosome value (NCV), which is a z-score for chromosome doses as long as the distribution of the NCV is normal. Similarly, differentiability can be represented numerically as a t-test value, which represents the statistical difference between the segment doses in a population of qualified samples and the segment dose(s) in one or more test samples. Where a chromosome segment is the sequence of interest, differentiability of segment doses can be represented numerically as a normalized segment value (NSV), which is a z-score for chromosome segment doses as long as the distribution of the NSV is normal. In determining the z-score, the mean and standard deviation of chromosome or segment doses in a set of qualified samples can be used. Alternatively, the mean and standard deviation of chromosome or segment doses in a training set comprising qualified samples and affected samples can be used. In other embodiments, the normalizing sequence is a sequence that has the smallest variability and the greatest differentiability, or an optimal combination of small variability and large differentiability.
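As a non-limiting illustration, the two numerical measures of differentiability described above (a z-score such as an NCV or NSV, and a t-test value) can be sketched as follows. The function names, the dose values, the use of the sample standard deviation, and the choice of Welch's form of the t statistic are assumptions made for this sketch only; the description does not prescribe a particular t-test variant.

```python
from math import sqrt
from statistics import mean, stdev, variance


def z_score(dose, reference_doses):
    """NCV- or NSV-style z-score of one dose against a reference set of doses."""
    sigma = stdev(reference_doses)  # sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("reference doses have zero variance")
    return (dose - mean(reference_doses)) / sigma


def welch_t(affected_doses, qualified_doses):
    """Welch's t statistic (an assumed choice of t-test variant)."""
    standard_error = sqrt(
        variance(affected_doses) / len(affected_doses)
        + variance(qualified_doses) / len(qualified_doses)
    )
    return (mean(affected_doses) - mean(qualified_doses)) / standard_error


# Hypothetical chromosome 21 doses; not data from the Examples.
qualified = [0.100, 0.102, 0.098, 0.101, 0.099]
affected = [0.110, 0.112, 0.108]

ncv = z_score(0.110, qualified)        # z-score of a single test-sample dose
t_value = welch_t(affected, qualified)  # separation of the two populations
```

In this sketch the reference set holds only qualified samples; passing a training set that also contains affected samples corresponds to the alternative mentioned above.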
The method identifies sequences that inherently have similar characteristics and that are prone to similar variations among samples and sequencing runs, and which are useful for determining sequence doses in test samples.
Determination of sequence doses (i.e., chromosome doses or segment doses) in qualified samples
In step 140, based on the calculated qualified tag densities, a qualified sequence dose (i.e., a chromosome dose or a segment dose) for a sequence of interest is determined as the ratio of the sequence tag density for the sequence of interest to the qualified sequence tag density for additional sequences, from which normalizing sequences are subsequently identified in step 145. The identified normalizing sequences are then used to determine sequence doses in test samples.
In one embodiment, the sequence dose in the qualified samples is a chromosome dose that is calculated as the ratio of the number of sequence tags for a chromosome of interest to the number of sequence tags for a normalizing chromosome sequence in a qualified sample. The normalizing chromosome sequence can be a single chromosome, a group of chromosomes, a segment of one chromosome, or a group of segments from different chromosomes. Accordingly, a chromosome dose for a chromosome of interest in the sample is determined as the ratio of the number of tags for the chromosome of interest to the number of tags for: (i) a normalizing chromosome sequence composed of a single chromosome; (ii) a normalizing chromosome sequence composed of two or more chromosomes; (iii) a normalizing segment sequence composed of a single segment of a chromosome; (iv) a normalizing segment sequence composed of two or more segments from one chromosome; or (v) a normalizing segment sequence composed of two or more segments of two or more chromosomes. Examples of determining a chromosome dose for a chromosome of interest according to (i)-(v) are as follows: (i) a chromosome dose for a chromosome of interest, e.g., chromosome 21, is determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of each of all the remaining chromosomes, i.e., chromosomes 1-20, chromosome 22, chromosome X, and chromosome Y; (ii) a chromosome dose for a chromosome of interest, e.g., chromosome 21, is determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of each of all possible combinations of two or more of the remaining chromosomes; (iii) a chromosome dose for a chromosome of interest, e.g., chromosome 21, is determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of a segment of another chromosome, e.g., chromosome 9; (iv) a chromosome dose for a chromosome of interest, e.g., chromosome 21, is determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of two segments of one other chromosome, e.g., two segments of chromosome 9; and (v) a chromosome dose for a chromosome of interest, e.g., chromosome 21, is determined as the ratio of the sequence tag density of chromosome 21 to the sequence tag density of two segments of two different chromosomes, e.g., a segment of chromosome 9 and a segment of chromosome 14.
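By way of a hedged sketch, the dose ratio for a single normalizing chromosome, a group of chromosomes, and a group of segments can be computed as follows. The helper `sequence_dose`, the key names, and the tag counts are hypothetical, and summing the tags of a multi-member normalizing sequence to form the denominator is an assumption of this sketch.

```python
def sequence_dose(tag_counts, interest, normalizers):
    """Ratio of tags for the sequence of interest to tags for the normalizing sequence.

    `normalizers` may name one chromosome, several chromosomes, or several
    segments; their tag counts are summed to form the denominator (assumed).
    """
    denominator = sum(tag_counts[name] for name in normalizers)
    if denominator == 0:
        raise ValueError("normalizing sequence has no mapped tags")
    return tag_counts[interest] / denominator


# Hypothetical tag counts for one qualified sample.
tags = {
    "chr21": 1000,
    "chr9": 4000,
    "chr14": 2500,
    "chr9:segment": 500,
    "chr14:segment": 300,
}

dose_single = sequence_dose(tags, "chr21", ["chr9"])            # case (i)
dose_group = sequence_dose(tags, "chr21", ["chr9", "chr14"])    # case (ii)
dose_segments = sequence_dose(
    tags, "chr21", ["chr9:segment", "chr14:segment"]            # case (v)
)
```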
In another embodiment, the sequence dose in the qualified samples is a segment dose that is calculated as the ratio of the number of sequence tags for a segment of interest, which is not a whole chromosome, to the number of sequence tags for a normalizing segment sequence in a qualified sample. The normalizing segment sequence can be, for example, a whole chromosome, a group of whole chromosomes, a segment of one chromosome, or a group of segments from different chromosomes. For example, a segment dose for a segment of interest is determined in a qualified sample as the ratio of the number of tags for the segment of interest to the number of tags for: (i) a normalizing segment sequence composed of a single segment of a chromosome; (ii) a normalizing segment sequence composed of two or more segments of one chromosome; or (iii) a normalizing segment sequence composed of two or more segments of two or more different chromosomes.
Chromosome doses for one or more chromosomes of interest are determined in all qualified samples, and a normalizing chromosome sequence is identified in step 145. Similarly, segment doses for one or more segments of interest are determined in all qualified samples, and a normalizing segment sequence is identified in step 145.
Identification of normalizing sequences from qualified sequence doses
In step 145, a normalizing sequence is identified for a sequence of interest, based on the calculated sequence doses, as, e.g., the sequence that results in the smallest variability in sequence dose for the sequence of interest across all qualified samples. The method identifies sequences that inherently have similar characteristics and that are prone to similar variations among samples and sequencing runs, and which are useful for determining sequence doses in test samples.
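The selection criterion of step 145 can be illustrated with a minimal sketch that picks, among candidate single chromosomes, the one giving the smallest coefficient of variation (CV) of the dose across qualified samples. The function names and the tag counts are hypothetical, and the use of CV as the measure of variability is an assumption consistent with, but not required by, the description.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def best_single_normalizer(samples, interest, candidates):
    """Return the candidate whose doses vary least across the qualified samples."""
    def dose_cv(candidate):
        doses = [sample[interest] / sample[candidate] for sample in samples]
        return coefficient_of_variation(doses)

    return min(candidates, key=dose_cv)


# Hypothetical tag counts for three qualified samples.
qualified_samples = [
    {"chr21": 100, "chr9": 400, "chr1": 900},
    {"chr21": 110, "chr9": 440, "chr1": 800},
    {"chr21": 90, "chr9": 360, "chr1": 1000},
]

best = best_single_normalizer(qualified_samples, "chr21", ["chr1", "chr9"])
```

Here chromosome 9 tracks chromosome 21 exactly (every dose is 0.25), so it is selected over chromosome 1.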
Normalizing sequences for one or more sequences of interest can be identified in a set of qualified samples, and the sequences identified in the qualified samples are subsequently used to calculate sequence doses for the one or more sequences of interest in each test sample (step 150), to determine the presence or absence of aneuploidy in each test sample. The normalizing sequence identified for a chromosome or segment of interest may differ when different sequencing platforms are used and/or when differences exist in the purification of the nucleic acid to be sequenced and/or in the preparation of the sequencing library. Used according to the methods described herein, normalizing sequences provide a specific and sensitive measure of variation in the copy number of a chromosome or segment thereof, irrespective of sample preparation and/or the sequencing platform used.
In some embodiments, more than one normalizing sequence is identified; i.e., different normalizing sequences can be determined for one sequence of interest, and multiple sequence doses can be determined for one sequence of interest. For example, the variation, e.g., the coefficient of variation (CV), in the chromosome dose for chromosome 21 of interest is smallest when the sequence tag density of chromosome 14 is used. However, two, three, four, five, six, seven, eight, or more normalizing sequences can be identified for use in determining a sequence dose for a sequence of interest in a test sample. As an example, a second dose for chromosome 21 in any one test sample can be determined using chromosome 7, chromosome 9, chromosome 11, or chromosome 12 as the normalizing chromosome sequence, because these chromosomes all have CVs close to that of chromosome 14 (see Example 8, Table 10). Preferably, when a single chromosome is chosen as the normalizing chromosome sequence for a chromosome of interest, the normalizing chromosome is the chromosome that results in chromosome doses for the chromosome of interest that have the smallest variability across all samples tested, e.g., qualified samples.
Normalizing chromosome sequence as a normalizing sequence for a chromosome
In other embodiments, the normalizing chromosome sequence can be a single sequence or it can be a group of sequences. For example, in some embodiments, a normalizing sequence is a group of sequences, e.g., a group of chromosomes, that is identified as the normalizing sequence for any one or more of chromosomes 1-22, X, and Y. The group of chromosomes that composes the normalizing sequence for a chromosome of interest, i.e., the normalizing chromosome sequence, can be a group of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, or twenty-two chromosomes, and can include or exclude one or both of chromosomes X and Y. The group of chromosomes identified as the normalizing chromosome sequence is the group of chromosomes that results in chromosome doses for the chromosome of interest that have the smallest variability across all samples tested, e.g., qualified samples. Preferably, individual chromosomes and groups of chromosomes are tested together for their ability to best mimic the behavior of the sequence of interest for which they are chosen as normalizing chromosome sequences.
In one embodiment, the normalizing sequence for chromosome 21 is selected from chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, and chromosome 17. In another embodiment, the normalizing sequence for chromosome 21 is selected from chromosome 9, chromosome 1, chromosome 2, chromosome 11, chromosome 12, and chromosome 14. Alternatively, the normalizing sequence for chromosome 21 is a group of chromosomes selected from chromosome 9, chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, and chromosome 17. In another embodiment, the group of chromosomes is a group selected from chromosome 9, chromosome 1, chromosome 2, chromosome 11, chromosome 12, and chromosome 14.
In some embodiments, the method is further improved by using a normalizing sequence that is determined by systematic calculation of all chromosome doses using each chromosome individually and in all possible combinations with all the remaining chromosomes (see Example 13). For example, a systematically determined normalizing chromosome can be determined for each chromosome of interest by systematically calculating all possible chromosome doses using any one of chromosomes 1-22, X, and Y, and combinations of two or more of chromosomes 1-22, X, and Y, to determine which single chromosome or group of chromosomes is the normalizing chromosome that results in the least variability of the chromosome dose for the chromosome of interest across a set of qualified samples (see Example 13). Accordingly, in one embodiment, the systematically calculated normalizing sequence for chromosome 21 is a group of chromosomes consisting of chromosome 4, chromosome 14, chromosome 16, chromosome 20, and chromosome 22. Single chromosomes or groups of chromosomes can be determined for all chromosomes in the genome.
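The systematic calculation over single chromosomes and all combinations of chromosomes can be sketched as an exhaustive search. The function names, the optional `max_size` cap, the tag counts, and the use of summed tags with CV as the variability measure are assumptions of this sketch and are not taken from Example 13.

```python
from itertools import combinations
from statistics import mean, stdev


def dose_cv(samples, interest, group):
    """CV of the dose when the tags of every chromosome in `group` are summed."""
    doses = [s[interest] / sum(s[c] for c in group) for s in samples]
    return stdev(doses) / mean(doses)


def systematic_normalizer(samples, interest, candidates, max_size=None):
    """Search every non-empty candidate subset and return the lowest-CV group."""
    limit = len(candidates) if max_size is None else max_size
    groups = (
        group
        for size in range(1, limit + 1)
        for group in combinations(candidates, size)
    )
    return min(groups, key=lambda group: dose_cv(samples, interest, group))


# Hypothetical tag counts for three qualified samples.
qualified_samples = [
    {"chr21": 100, "chr4": 210, "chr14": 190, "chr16": 300},
    {"chr21": 110, "chr4": 200, "chr14": 240, "chr16": 300},
    {"chr21": 90, "chr4": 190, "chr14": 170, "chr16": 300},
]

best_group = systematic_normalizer(
    qualified_samples, "chr21", ["chr4", "chr14", "chr16"]
)
```

In this toy data neither chromosome 4 nor chromosome 14 alone tracks chromosome 21, but their sum does, so the pair is returned. With 23 candidate chromosomes the full search visits 2^23 - 1 = 8,388,607 subsets, which is why the sketch exposes a size cap.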
In one embodiment, the normalizing sequence for chromosome 18 is selected from chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, and chromosome 14. Preferably, the normalizing sequence for chromosome 18 is selected from chromosome 8, chromosome 2, chromosome 3, chromosome 5, chromosome 6, chromosome 12, and chromosome 14. In one embodiment, the normalizing sequence for chromosome 18 is a group of chromosomes selected from chromosome 8, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, and chromosome 14. Preferably, the group of chromosomes is a group selected from chromosome 8, chromosome 2, chromosome 3, chromosome 5, chromosome 6, chromosome 12, and chromosome 14.
In another embodiment, the normalizing sequence for chromosome 18 is determined by systematically calculating all possible chromosome doses using each possible normalizing chromosome individually and in all possible combinations of normalizing chromosomes (as explained elsewhere herein). Accordingly, in one embodiment, the normalizing sequence for chromosome 18 is a normalizing chromosome consisting of the group of chromosomes consisting of chromosome 2, chromosome 3, chromosome 5, and chromosome 7.
In one embodiment, the normalizing sequence for chromosome X is selected from chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, and chromosome 16. Preferably, the normalizing sequence for chromosome X is selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8. In one embodiment, the normalizing sequence for chromosome X is a group of chromosomes selected from chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, and chromosome 16. Preferably, the group of chromosomes is a group selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8.
In another embodiment, the normalizing sequence for chromosome X is determined by systematically calculating all possible chromosome doses using each possible normalizing chromosome individually and in all possible combinations of normalizing chromosomes (as explained elsewhere herein). Accordingly, in one embodiment, the normalizing sequence for chromosome X is a normalizing chromosome consisting of the group of chromosome 4 and chromosome 8.
In one embodiment, the normalizing sequence for chromosome 13 is a chromosome selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18, and chromosome 21. Preferably, the normalizing sequence for chromosome 13 is a chromosome selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8. In another embodiment, the normalizing sequence for chromosome 13 is a group of chromosomes selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 14, chromosome 18, and chromosome 21. Preferably, the group of chromosomes is a group selected from chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, and chromosome 8.
In another embodiment, the normalizing sequence for chromosome 13 is determined by systematically calculating all possible chromosome doses using each possible normalizing chromosome individually and in all possible combinations of normalizing chromosomes (as explained elsewhere herein). Accordingly, in one embodiment, the normalizing sequence for chromosome 13 is a normalizing chromosome comprising the group of chromosome 4 and chromosome 5. In another embodiment, the normalizing sequence for chromosome 13 is a normalizing chromosome consisting of the group of chromosome 4 and chromosome 5.
The variation in chromosome dose for chromosome Y is greater than 30, independent of which normalizing chromosome is used in determining the chromosome Y dose. Therefore, any group of two or more chromosomes selected from chromosomes 1-22 and chromosome X can be used as the normalizing sequence for chromosome Y. In one embodiment, the at least one normalizing chromosome is a group of chromosomes consisting of chromosomes 1-22 and chromosome X. In another embodiment, the group of chromosomes consists of chromosome 2, chromosome 3, chromosome 4, chromosome 5, and chromosome 6.
In another embodiment, the normalizing sequence for chromosome Y is determined by systematically calculating all possible chromosome doses using each possible normalizing chromosome individually and in all possible combinations of normalizing chromosomes (as explained elsewhere herein). Accordingly, in one embodiment, the normalizing sequence for chromosome Y is a normalizing chromosome comprising the group of chromosomes consisting of chromosome 4 and chromosome 6. In another embodiment, the normalizing sequence for chromosome Y is a normalizing chromosome consisting of the group of chromosomes consisting of chromosome 4 and chromosome 6.
The normalizing sequence used to calculate the dose of different chromosomes of interest, or of different segments of interest, can be the same, or it can be a different normalizing sequence for each chromosome or segment. For example, the normalizing sequence, e.g., a normalizing chromosome (one or a group), for chromosome of interest A can be the same as, or different from, the normalizing sequence, e.g., a normalizing chromosome (one or a group), for chromosome of interest B.
The normalizing sequence for a complete chromosome can be a complete chromosome or a group of complete chromosomes, or it can be a segment of a chromosome, or a group of segments of one or more chromosomes.
Normalizing segment sequence as a normalizing sequence for a chromosome
In another embodiment, the normalizing sequence for a chromosome can be a normalizing segment sequence. The normalizing segment sequence can be a single segment, a group of segments of one chromosome, or segments from two or more different chromosomes. A normalizing segment sequence can be determined by systematic calculation of all combinations of segment sequences in the genome. For example, a normalizing segment sequence for chromosome 21 can be a single segment that is larger or smaller than chromosome 21, which is approximately 47 Mbp (million base pairs) in size; e.g., the normalizing segment can be a segment of chromosome 9, which is approximately 140 Mbp. Alternatively, a normalizing sequence for chromosome 21 can be, for example, a combination of segment sequences from two different chromosomes, e.g., from chromosome 1 and from chromosome 12.
In one embodiment, the normalizing sequence for chromosome 21 is a normalizing segment sequence of one segment or of a group of two or more segments of chromosomes 1-20, 22, X, and Y. In another embodiment, the normalizing sequence for chromosome 18 is a segment or groups of segments of chromosomes 1-17, 19-22, X, and Y. In another embodiment, the normalizing sequence for chromosome 13 is a segment or groups of segments of chromosomes 1-12, 14-22, X, and Y. In another embodiment, the normalizing sequence for chromosome X is a segment or groups of segments of chromosomes 1-22 and Y. In another embodiment, the normalizing sequence for chromosome Y is a segment or a group of segments of chromosomes 1-22 and X. Normalizing sequences of single segments or groups of segments can be determined for all chromosomes in the genome. The two or more segments of a normalizing segment sequence can be segments from one chromosome, or they can be segments of two or more different chromosomes. As described for normalizing chromosome sequences, a normalizing segment sequence can be the same for two or more different chromosomes.
Normalizing segment sequence as a normalizing sequence for a chromosome segment
The presence or absence of a CNV of a sequence of interest can be determined when the sequence of interest is a segment of a chromosome. Variation in the copy number of a chromosome segment allows the presence or absence of a partial chromosomal aneuploidy to be determined. Examples of partial chromosomal aneuploidies that are associated with various fetal abnormalities and disease conditions are described below. The segment of the chromosome can be of any length. For example, it can range from a kilobase to hundreds of megabases. The human genome occupies just over 3 billion DNA bases, which can be divided into tens, thousands, hundreds of thousands, and millions of segments of different sizes, the copy number of which can be determined according to the present method. The normalizing sequence for a segment of a chromosome is a normalizing segment sequence, which can be a single segment from any one of chromosomes 1-22, X, and Y, or a group of segments from any one or more of chromosomes 1-22, X, and Y.
The normalizing sequence for a segment of interest is a sequence that has a variability across chromosomes and across samples that is closest to that of the segment of interest. When the normalizing sequence is a group of segments of any one or more of chromosomes 1-22, X, and Y, the normalizing sequence can be determined as described for determining the normalizing sequence for a chromosome of interest. A normalizing segment sequence of one segment or of a group of segments can be identified by calculating segment doses using one, and all possible combinations of two or more, segments as normalizing sequences for the segment of interest in each sample of a set of qualified samples, i.e., samples known to be diploid for the segment of interest. The normalizing sequence is determined as the one that provides a segment dose having the lowest variability for the segment of interest across all qualified samples, as described above for normalizing chromosome sequences.
For example, for a segment of interest that is 1 Mb (megabase), the remaining 3 million segments (minus the 1 Mb segment of interest) of the approximately 3 Gb human genome can be used individually or in combination with each other to calculate segment doses for the segment of interest in a qualified set of samples, to determine which segment or group of segments will serve as the normalizing segment sequence for qualified and test samples. Segments of interest can vary from about 1,000 bases to tens of megabases. Normalizing segment sequences can be composed of one or more segments of the same size as the sequence of interest. In other embodiments, the normalizing segment sequence can be composed of segments that differ in size from the sequence of interest and/or from each other. For example, a normalizing sequence for a 100,000-base sequence can be 20,000 bases long and can comprise a combination of sequences of different lengths, e.g., 7,000 + 8,000 + 5,000 bases. As described elsewhere herein for normalizing chromosome sequences, normalizing segment sequences can be determined by systematically calculating all possible chromosome and/or segment doses using each possible normalizing chromosome segment individually and in all possible combinations of normalizing segments. Single segments or groups of segments can be determined for all segments and/or chromosomes in the genome.
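As an illustrative sketch only, a segment dose for a 100,000-base segment of interest can be computed against a 20,000-base normalizing segment sequence assembled from segments of 7,000, 8,000, and 5,000 bases. The coordinates, the tag positions, the half-open interval convention, and the function names are hypothetical.

```python
from bisect import bisect_left


def count_tags(sorted_positions, start, end):
    """Count tag start positions falling in the half-open interval [start, end)."""
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)


def segment_dose(tags_by_chrom, interest, normalizing_segments):
    """Tags in the segment of interest divided by tags in all normalizing segments."""
    chrom, start, end = interest
    numerator = count_tags(tags_by_chrom[chrom], start, end)
    denominator = sum(
        count_tags(tags_by_chrom[c], s, e) for c, s, e in normalizing_segments
    )
    if denominator == 0:
        raise ValueError("normalizing segments contain no mapped tags")
    return numerator / denominator


# Hypothetical sorted tag start positions per chromosome for one sample.
tags_by_chrom = {
    "chr7": [1_200, 40_000, 75_500, 99_999, 120_000],
    "chr1": [10_000, 12_500, 16_999, 17_000, 52_000, 58_000, 90_000],
    "chr12": [100, 4_999, 5_000, 9_000],
}

interest = ("chr7", 0, 100_000)  # 100,000-base segment of interest
normalizers = [
    ("chr1", 10_000, 17_000),    # 7,000 bases
    ("chr1", 50_000, 58_000),    # 8,000 bases
    ("chr12", 0, 5_000),         # 5,000 bases
]

dose = segment_dose(tags_by_chrom, interest, normalizers)
normalizer_length = sum(end - start for _, start, end in normalizers)
```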
The normalizing sequence used to calculate the dose of different chromosome segments of interest can be the same, or it can differ from one segment of interest to another. For example, the normalizing sequence (a single normalizing segment or a group of segments) for chromosome segment of interest A can be the same as, or different from, the normalizing sequence (a single normalizing segment or a group of segments) for chromosome segment of interest B.
Normalizing chromosome sequences as normalizing sequences for chromosome segments
In another embodiment, copy number variation of a chromosome segment can be determined using a normalizing chromosome, which can be a single chromosome or a group of chromosomes as described above. The normalizing chromosome sequence can be the normalizing chromosome or group of chromosomes identified for the chromosome of interest in a set of qualified samples by systematically determining which chromosome or group of chromosomes yields the lowest variability in chromosome dose across those samples. For example, to determine the presence or absence of a partial deletion of chromosome 7, the normalizing chromosome or group of chromosomes used in the analysis is the one first identified in a set of qualified samples as the normalizing sequence that yields the lowest variability in the chromosome dose for the whole of chromosome 7. As described elsewhere herein for the normalizing chromosome sequence of a chromosome of interest, the normalizing chromosome sequence for a chromosome segment can be determined by systematically calculating all possible chromosome doses using each possible normalizing chromosome individually and all possible combinations of normalizing chromosomes. A single chromosome or a group of chromosomes can be determined for every chromosome segment in the genome. Examples 17 and 18 illustrate the use of normalizing chromosomes to determine the presence of a partial chromosome deletion and a partial chromosome duplication.
In certain embodiments, the CNV of a chromosome segment is determined by first subdividing the chromosome of interest into sections, or bins, of variable length. The bin length can be at least about 1 kbp, at least about 10 kbp, at least about 100 kbp, at least about 1 Mbp, at least about 10 Mbp, or at least about 100 Mbp. The smaller the bin length, the higher the resolution with which the CNV of a segment can be localized within the chromosome of interest.
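The effect of bin length on resolution can be illustrated with a short sketch. It assumes equal-width bins and 0-based tag start coordinates, which is a simplification of the variable-length bins mentioned above; the function name and the positions are hypothetical.

```python
def bin_tag_counts(tag_positions, chrom_length, bin_size):
    """Count tags per equal-width bin; positions are 0-based start coordinates."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in tag_positions:
        counts[pos // bin_size] += 1
    return counts


# Hypothetical tag positions on a 3,500-base toy chromosome.
positions = [5, 999, 1000, 2500, 2999, 3100]
counts_1kb = bin_tag_counts(positions, chrom_length=3500, bin_size=1000)
counts_2kb = bin_tag_counts(positions, chrom_length=3500, bin_size=2000)
print(counts_1kb)
print(counts_2kb)
```

With the smaller bin size the same tags are spread over four bins instead of two, so a local change in tag count can be assigned to a narrower region, at the cost of fewer tags per bin.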
The presence or absence of a CNV in a chromosome segment of interest can be determined by comparing the dose for each bin of the chromosome of interest in the test sample with the mean of the corresponding bin doses determined for bins of equivalent length in a set of qualified samples. A normalized value for each bin can be calculated, as described above for the normalized segment value, as a normalized bin value (NBV), which relates the bin dose in the test sample to the mean of the corresponding bin doses in the set of qualified samples. The NBV is calculated as:

$$NBV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th bin dose in a set of qualified samples, and $x_{ij}$ is the observed j-th bin dose for test sample i.
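As a minimal sketch of this calculation, the NBV can be computed as a z-score: the observed bin dose minus the qualified-sample mean for that bin, divided by the qualified-sample standard deviation. The use of the sample (n-1) standard deviation, the function name, and the doses below are assumptions made for illustration.

```python
from statistics import mean, stdev


def normalized_bin_values(test_doses, qualified_doses):
    """Z-score each bin dose of one test sample against the qualified samples.

    test_doses: bin doses for the test sample, one per bin.
    qualified_doses: one list per qualified sample, bins in the same order.
    """
    nbvs = []
    for j, x_ij in enumerate(test_doses):
        column = [sample[j] for sample in qualified_doses]  # j-th bin across samples
        nbvs.append((x_ij - mean(column)) / stdev(column))
    return nbvs


# Hypothetical doses: three qualified samples, two bins each.
qualified = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
test = [4.0, 4.0]
nbv = normalized_bin_values(test, qualified)
print(nbv)
```

Here the first bin of the test sample lies two standard deviations above the qualified mean, while the second bin sits exactly on it.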
Determining aneuploidy in test samples
Based on the one or more normalizing sequences identified in qualified samples, a sequence dose is determined for a sequence of interest in a test sample comprising a mixture of nucleic acids derived from genomes that differ in one or more sequences of interest.
In step 115, a test sample is obtained from a subject suspected of carrying, or known to carry, a clinically relevant CNV of a sequence of interest. The test sample can be a biological fluid (e.g., plasma) or any suitable sample as described below. As explained, the sample can be obtained using a non-invasive procedure such as a simple blood draw. In some embodiments, the test sample contains a mixture of nucleic acid molecules (e.g., cfDNA molecules). In some embodiments, the test sample is a maternal plasma sample containing a mixture of fetal and maternal cfDNA molecules.
In step 125, at least a portion of the test nucleic acids in the test sample is sequenced, as described for the qualified samples, to generate millions of sequence reads (e.g., 36 bp reads). As in step 120, the reads generated from sequencing the nucleic acids in the test sample are uniquely mapped or aligned to a reference genome to produce tags. As described in step 120, at least about 3 × 10^6, at least about 5 × 10^6, at least about 8 × 10^6, at least about 10 × 10^6, at least about 15 × 10^6, at least about 20 × 10^6, at least about 30 × 10^6, at least about 40 × 10^6, or at least about 50 × 10^6 qualified sequence tags, comprising reads of between 20 and 40 bp, are obtained from reads that map uniquely to the reference genome. In certain embodiments, the reads produced by the sequencing apparatus are provided in an electronic format. Alignment is accomplished using computational apparatus as discussed below. Individual reads are compared against the reference genome, which is often vast (millions of base pairs), to identify sites where the reads uniquely correspond with the reference genome. In certain embodiments, the alignment procedure permits limited mismatch between reads and the reference genome. In some cases, 1, 2, or 3 base pairs in a read are permitted to mismatch the corresponding base pairs in the reference genome, and a mapping is still made.
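The idea of a unique mapping that tolerates a limited number of mismatches can be illustrated with a toy sketch. This is a brute-force scan over a short made-up reference, not the alignment program used in the application; production aligners rely on genome indexes. The function names, the sequences, and the default of two mismatches are assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def map_read(read, reference, max_mismatches=2):
    """Return the single 0-based position where the read maps, else None.

    A read that matches nowhere, or at more than one site, is not mapped.
    """
    hits = [
        i
        for i in range(len(reference) - len(read) + 1)
        if hamming(read, reference[i:i + len(read)]) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


reference = "ACGTTGCAAGGCTTAC"  # made-up toy reference
exact_hit = map_read("TGCAAGG", reference)      # perfect match
mismatch_hit = map_read("TGCTAGG", reference)   # one mismatching base
repeat_hit = map_read("AC", "ACACAC", max_mismatches=0)  # maps to several sites
print(exact_hit, mismatch_hit, repeat_hit)
```

The read with one mismatching base still maps to the same site as the exact read, while the read drawn from a repeat maps to several sites and is therefore left unmapped.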
In step 135, all or most of the tags obtained from sequencing the nucleic acids in the test sample are counted, using computational apparatus as described below, to determine a test sequence tag density. In certain embodiments, each read is aligned to a particular region of the reference genome (a chromosome or segment in most cases), and the read is converted to a tag by appending site information to it. As this process unfolds, the computational apparatus can keep a running count of the number of tags/reads mapping to each region of the reference genome (a chromosome or segment in most cases). The counts are stored for each chromosome or segment of interest and for each corresponding normalizing chromosome or segment.
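A running count of this kind can be sketched as a single pass over a stream of tags. The tag representation (a region name paired with a position), the function name, and the example data are hypothetical.

```python
from collections import Counter


def count_tags(tags, regions_to_store):
    """Keep a running per-region tag count, storing only the requested regions.

    tags: iterable of (region, position) pairs, consumed one at a time.
    """
    counts = Counter()
    for region, _position in tags:
        if region in regions_to_store:
            counts[region] += 1
    return counts


# Hypothetical tags; chr21 is the sequence of interest, chr14 its normalizing sequence.
tags = [("chr21", 100), ("chr14", 5), ("chr21", 230), ("chr1", 9)]
counts = count_tags(iter(tags), regions_to_store={"chr21", "chr14"})
print(counts)
```

Because the tags are consumed as an iterator, the count can be updated while alignment is still in progress, without holding all tags in memory.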
In certain embodiments, the reference genome has one or more excluded regions that are part of a true biological genome but are not included in the reference genome. Reads that could align to these excluded regions are not counted. Examples of excluded regions include regions of long repeated sequences, regions of similarity between the X and Y chromosomes, and the like.
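One way to picture the exclusion step is as an interval filter applied before counting. The sketch below assumes the excluded regions are available as half-open coordinate intervals per chromosome and that any overlap disqualifies a read; the interval table, the function names, and the coordinates are all hypothetical.

```python
def overlaps_excluded(chrom, start, end, excluded):
    """True if the half-open interval [start, end) overlaps an excluded interval."""
    return any(
        ex_start < end and start < ex_end
        for ex_start, ex_end in excluded.get(chrom, ())
    )


def count_outside_excluded(alignments, excluded):
    """Count alignments that do not touch any excluded region."""
    return sum(
        1
        for chrom, start, end in alignments
        if not overlaps_excluded(chrom, start, end, excluded)
    )


# Hypothetical excluded intervals and 36-base alignments.
excluded_regions = {"chrX": [(0, 1000)], "chr1": [(500, 600)]}
alignments = [
    ("chrX", 900, 936),    # inside an excluded region
    ("chrX", 1000, 1036),  # starts right after it
    ("chr1", 590, 626),    # partially overlaps
    ("chr2", 10, 46),      # no excluded regions on this chromosome
]
kept = count_outside_excluded(alignments, excluded_regions)
print(kept)
```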
In certain embodiments, the method determines whether to count a tag more than once when multiple reads align to the same site on a reference genome or sequence. There may be occasions when two tags have the same sequence and therefore align to the same site on a reference sequence. The method used to count tags may, under certain circumstances, exclude from the count identical tags derived from the same sequenced sample. If a disproportionate number of tags in a given sample are identical, this suggests a strong bias or other defect in the procedure. Therefore, in accordance with certain embodiments, the counting method does not count tags from a given sample that are identical to tags from that sample that were previously counted.
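The simplest form of this rule, in which every repeat of a previously counted tag is skipped, can be sketched as follows. Treating two tags as identical when they share region, position, and sequence is an assumption, as are the function name and the data.

```python
def count_distinct_tags(tags):
    """Count tags, skipping any tag identical to one already counted."""
    seen = set()
    counted = 0
    for tag in tags:
        if tag in seen:
            continue  # identical to a previously counted tag from this sample
        seen.add(tag)
        counted += 1
    return counted


# Hypothetical tags as (region, position, sequence).
sample_tags = [
    ("chr21", 100, "ACGT"),
    ("chr21", 100, "ACGT"),  # exact duplicate, not counted
    ("chr21", 104, "ACGT"),  # same sequence, different site, counted
    ("chr14", 7, "GGCA"),
]
distinct = count_distinct_tags(sample_tags)
print(distinct)
```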
Various criteria can be set for choosing when to disregard an identical tag from a single sample. In certain embodiments, a defined percentage of the counted tags must be unique. If more tags than this threshold allows are not unique, they are disregarded. For example, if the defined percentage requires that at least 50% be unique, identical tags are not counted until the percentage of unique tags for the sample exceeds 50%. In other embodiments, the threshold percentage of unique tags is at least about 60%. In other embodiments, the threshold percentage of unique tags is at least about 75%, or at least about 90%, or at least about 95%, or at least about 98%, or at least about 99%. For chromosome 21, the threshold may be set at 90%. If 30M tags are aligned to chromosome 21, then at least 27M of them must be unique. If 3M counted tags are not unique and the 30,000,000th tag is not unique, it is not counted.
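The chromosome 21 arithmetic above can be checked with a short sketch. It covers only the threshold calculation, not the full counting rule; integer arithmetic is used to avoid floating-point rounding, and the function names are hypothetical.

```python
def min_unique_required(total_tags, unique_percent):
    """Smallest number of unique tags that meets the percentage (integer ceiling)."""
    return -(-total_tags * unique_percent // 100)


def meets_unique_threshold(unique_tags, total_tags, unique_percent):
    """True if the unique tags reach the required share of all tags."""
    return unique_tags >= min_unique_required(total_tags, unique_percent)


required = min_unique_required(30_000_000, 90)
print(required)
print(meets_unique_threshold(27_000_000, 30_000_000, 90))
print(meets_unique_threshold(26_999_999, 30_000_000, 90))
```

With 30M aligned tags and a 90% threshold the requirement is 27M unique tags, so at most 3M non-unique tags fit within the threshold.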
The particular threshold or other criterion used to determine when not to count further identical tags can be selected using appropriate statistical analysis. One factor influencing this threshold or other criterion is the amount of sample sequenced relative to the size of the genome to which tags can be aligned. Other factors include the size of the reads and similar considerations.
In one embodiment, the number of test sequence tags mapped to a sequence of interest is normalized to the known length of that sequence to provide a test sequence tag density ratio. As described for the qualified samples, normalization to the known length of a sequence of interest is not required; it can be included as a step to reduce the number of digits in a number and so simplify it for human interpretation. Once all the mapped test sequence tags in the test sample have been counted, the sequence tag density is determined for the sequence of interest (e.g., a clinically relevant sequence) in the test sample, as are the sequence tag densities for the additional sequences that correspond to at least one normalizing sequence identified in the qualified samples.
In step 150, based on the identity of at least one normalizing sequence in the qualified samples, a test sequence dose is determined for a sequence of interest in the test sample. In various embodiments, the test sequence dose is determined computationally by manipulating the sequence tag densities of the sequence of interest and the corresponding normalizing sequence as described herein. The computational apparatus responsible for this task electronically accesses the association between the sequence of interest and its normalizing sequence, which may be stored in a database, table, or graph, or be included as code in program instructions.
As described elsewhere in this application, the at least one normalizing sequence can be a single sequence or a group of sequences. The sequence dose for a sequence of interest in a test sample is the ratio of the sequence tag density determined for the sequence of interest in the test sample to the sequence tag density of at least one normalizing sequence determined in the test sample, where the normalizing sequence in the test sample corresponds to the normalizing sequence identified in the qualified samples for the particular sequence of interest. For example, if the normalizing sequence identified for chromosome 21 in the qualified samples is determined to be a single chromosome, e.g., chromosome 14, then the test sequence dose for chromosome 21 (the sequence of interest) is the ratio of the sequence tag density for chromosome 21 to the sequence tag density for chromosome 14, each determined in the test sample. Chromosome doses are determined similarly for chromosomes 13, 18, X, Y, and other chromosomes associated with chromosomal aneuploidies. The normalizing sequence for a chromosome of interest can be one chromosome or a group of chromosomes, or one chromosome segment or a group of chromosome segments. As described above, a sequence of interest can be part of a chromosome, e.g., a chromosome segment. Accordingly, the dose for a chromosome segment can be determined as the ratio of the sequence tag density determined for the segment in the test sample to the sequence tag density for the normalizing chromosome segment in the test sample, where the normalizing segment in the test sample corresponds to the normalizing segment (a single segment or a group of segments) identified in the qualified samples for the particular segment of interest. Chromosome segments can range in size from kilobases (kb) to megabases (Mb) (e.g., about 1 kb to 10 kb, or about 10 kb to 100 kb, or about 100 kb to 1 Mb).
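The dose calculation can be sketched as a ratio of tag densities. How a group of normalizing sequences is combined is an assumption here: the sketch pools the group's tags and lengths into one density. The function names, tag counts, and lengths (in arbitrary units) are hypothetical.

```python
def tag_density(counts, lengths, sequences):
    """Tags per unit length over one sequence or a pooled group of sequences."""
    return sum(counts[s] for s in sequences) / sum(lengths[s] for s in sequences)


def sequence_dose(counts, lengths, interest, normalizers):
    """Density of the sequence of interest divided by the normalizing density."""
    return tag_density(counts, lengths, [interest]) / tag_density(
        counts, lengths, normalizers
    )


# Hypothetical test-sample tag counts and sequence lengths (arbitrary units).
counts = {"chr21": 300, "chr14": 800, "chr9": 1000}
lengths = {"chr21": 1000, "chr14": 2000, "chr9": 3000}

dose_single = sequence_dose(counts, lengths, "chr21", ["chr14"])
dose_group = sequence_dose(counts, lengths, "chr21", ["chr14", "chr9"])
print(dose_single, dose_group)
```

The same function serves chromosome segments if the dictionary keys name segments rather than whole chromosomes.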
In step 155, threshold values are derived from standard deviation values established for qualified sequence doses determined in a plurality of qualified samples and for sequence doses determined for samples known to be aneuploid for a sequence of interest. Note that this operation is typically performed asynchronously with the analysis of patient test samples. It may be performed, for example, concurrently with the selection of normalizing sequences from qualified samples. Accurate classification depends on the differences between the probability distributions for the different classes, i.e., types of aneuploidy. In some examples, thresholds are chosen from the empirical distribution for each type of aneuploidy, e.g., trisomy 21. Possible threshold values established for classifying trisomy 13, trisomy 18, trisomy 21, and monosomy X aneuploidies are described in the Examples, which illustrate the use of the method for determining chromosomal aneuploidies by sequencing cfDNA extracted from a maternal sample comprising a mixture of fetal and maternal nucleic acids. The threshold value determined to distinguish samples affected by an aneuploidy of one chromosome can be the same as, or different from, the threshold value determined to distinguish samples affected by a different aneuploidy. As shown in the Examples, the threshold value for each chromosome of interest is determined from the variability in the dose of the chromosome of interest across samples and sequencing runs. The less variable the chromosome dose for any chromosome of interest, the narrower the spread in the dose for that chromosome across all the unaffected samples used to set the thresholds for determining different aneuploidies.
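One simple way to turn dose variability into thresholds is to place them a fixed number of standard deviations above the qualified-sample mean. The sketch below is only an illustration of that idea for a copy-number gain: the multipliers `k_unaffected` and `k_affected`, the function name, and the doses are hypothetical, and the application may derive its thresholds differently (for example from empirical distributions of affected samples).

```python
from statistics import mean, stdev


def dose_thresholds(qualified_doses, k_unaffected=2.5, k_affected=4.0):
    """Return (normal_max, affected_min) as mean + k * SD of the qualified doses."""
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # sample standard deviation
    return mu + k_unaffected * sigma, mu + k_affected * sigma


# Hypothetical chromosome doses from three qualified samples.
qualified_doses = [0.98, 1.00, 1.02]
normal_max, affected_min = dose_thresholds(qualified_doses)
print(normal_max, affected_min)
```

A tighter dose distribution in the qualified samples gives a smaller standard deviation and therefore thresholds closer to the mean, which matches the point made above about variability and spread.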
Returning to the process flow associated with classifying a patient test sample, in step 160 the copy number variation of the sequence of interest is determined in the test sample by comparing the test sequence dose for the sequence of interest to at least one threshold value established from the qualified sequence doses. This operation may be performed by the same computational apparatus employed to measure sequence tag densities and/or calculate segment doses.
In step 165, the calculated dose for a test sequence of interest is compared to threshold values chosen according to a user-defined threshold of reliability, in order to classify the sample as "normal", "affected", or "no call". The "no call" samples are samples for which a definitive diagnosis cannot be made with reliability. Each type of affected sample (e.g., trisomy 21, partial trisomy 21, monosomy X) has its own thresholds, one for calling normal (unaffected) samples and another for calling affected samples (although in some cases the two thresholds coincide). As described elsewhere herein, in some circumstances a no call can be converted to a call (affected or normal) if the fetal fraction of nucleic acid in the test sample is sufficiently high. The classification of the test sequence may be reported by the computational apparatus employed in other operations of this process flow. In some cases, the classification is reported in an electronic format and may be displayed, emailed, texted, etc. to interested persons.
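The three-way call can be sketched with two thresholds. The sketch assumes a copy-number gain, where a higher dose indicates an affected sample (a loss such as monosomy X would reverse the comparisons); the function name and threshold values are hypothetical.

```python
def classify(dose, normal_max, affected_min):
    """Classify a dose for a copy-number gain using two thresholds.

    Doses below normal_max are called normal, doses above affected_min are
    called affected, and anything in between is a no call.
    """
    if dose < normal_max:
        return "normal"
    if dose > affected_min:
        return "affected"
    return "no call"


# Hypothetical thresholds and test doses.
calls = [classify(d, normal_max=1.05, affected_min=1.08) for d in (1.01, 1.06, 1.12)]
print(calls)
```

When the two thresholds coincide, the no-call zone shrinks to the single threshold value itself.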
Certain embodiments provide a method for providing prenatal diagnosis of a fetal aneuploidy in a biological sample comprising fetal and maternal nucleic acid molecules. The diagnosis is made by: obtaining sequence information from sequencing at least a portion of the mixture of fetal and maternal nucleic acid molecules derived from a biological test sample, e.g., a maternal plasma sample; computing from the sequencing data a normalized chromosome dose for one or more chromosomes of interest and/or a normalized segment dose for one or more segments of interest; determining a statistically significant difference between the chromosome dose for the chromosome of interest and/or the segment dose for the segment of interest, respectively, in the test sample and a threshold value established in a plurality of qualified (normal) samples; and providing the prenatal diagnosis based on the statistical difference. As described in step 165 of the method, a diagnosis of normal or affected is made. A "no call" is provided when a diagnosis of normal or affected cannot be made with confidence.
Samples and sample processing
Samples
Samples used for determining a CNV, e.g., chromosomal aneuploidies, partial aneuploidies, and the like, can include samples taken from any cell, tissue, or organ in which copy number variations for one or more sequences of interest are to be determined. Desirably, the samples contain nucleic acids that are present in cells and/or nucleic acids that are "cell-free" (e.g., cfDNA).
In certain embodiments, it is advantageous to obtain cell-free nucleic acids, e.g., cell-free DNA (cfDNA). Cell-free nucleic acids, including cell-free DNA, can be obtained by various methods known in the art from biological samples including, but not limited to, plasma, serum, and urine (see, e.g., Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008]; Koide et al., Prenatal Diagnosis 25:604-607 [2005]; Chen et al., Nature Med. 2:1033-1035 [1996]; Lo et al., Lancet 350:485-487 [1997]; Botezatu et al., Clin Chem. 46:1078-1084 [2000]; and Su et al., J Mol. Diagn. 6:101-107 [2004]). To separate cell-free DNA from cells in a sample, various methods can be used, including, but not limited to, fractionation, centrifugation (e.g., density gradient centrifugation), DNA-specific precipitation, or high-throughput cell sorting and/or other separation methods. Commercially available kits for manual and automated separation of cfDNA are available (Roche Diagnostics, Indianapolis, IN; Qiagen, Valencia, CA; Macherey-Nagel, Duren, DE). Biological samples comprising cfDNA have been used in assays to determine the presence or absence of chromosomal abnormalities, e.g., trisomy 21, by sequencing assays that can detect chromosomal aneuploidies and/or various polymorphisms.
In various embodiments, the cfDNA present in the sample can be enriched specifically or non-specifically prior to use (e.g., prior to preparing a sequencing library). Non-specific enrichment of sample DNA refers to whole-genome amplification of the genomic DNA fragments of the sample, which can be used to increase the amount of sample DNA prior to preparing a cfDNA sequencing library. Non-specific enrichment can be the selective enrichment of one of the two genomes present in a sample that comprises more than one genome. For example, non-specific enrichment can be selective for the fetal genome in a maternal sample, which can be achieved by known methods to increase the proportion of fetal DNA relative to maternal DNA in the sample. Alternatively, non-specific enrichment can be the non-selective amplification of both genomes present in the sample. For example, non-specific amplification can be the amplification of fetal and maternal DNA in a sample comprising a mixture of DNA from the fetal and maternal genomes. Methods for whole-genome amplification are known in the art. Degenerate oligonucleotide-primed PCR (DOP), primer extension PCR (PEP), and multiple displacement amplification (MDA) are examples of whole-genome amplification methods. In certain embodiments, the sample comprising the mixture of cfDNA from different genomes is not enriched for cfDNA of the genomes present in the mixture. In other embodiments, the sample comprising the mixture of cfDNA from different genomes is non-specifically enriched for any one of the genomes present in the sample.
The samples comprising nucleic acids to which the methods described herein are applied typically comprise a biological sample ("test sample"), e.g., as described above. In certain embodiments, the nucleic acids to be screened for one or more CNVs are purified or isolated by any of a number of well-known methods.
Accordingly, in certain embodiments the sample comprises or consists of a purified or isolated polynucleotide, or it can comprise samples such as a tissue sample, a biological fluid sample, a cell sample, and the like. Suitable biological fluid samples include, but are not limited to, blood, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal, and genitourinary tracts, amniotic fluid, and leukapheresis samples. In certain embodiments, the sample is one that is easily obtainable by non-invasive procedures, e.g., blood, plasma, serum, sweat, tears, sputum, urine, ear flow, saliva, or feces. In certain embodiments, the sample is a peripheral blood sample, or the plasma and/or serum fractions of a peripheral blood sample. In other embodiments, the biological sample is a swab or smear, a biopsy specimen, or a cell culture. In another embodiment, the sample is a mixture of two or more biological samples, e.g., a biological sample can comprise two or more of a biological fluid sample, a tissue sample, and a cell culture sample. As used herein, the terms "blood", "plasma", and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.
In certain embodiments, samples can be obtained from a variety of sources, including, but not limited to: samples from different individuals; samples from different developmental stages of the same or different individuals; samples from different diseased individuals (e.g., individuals with cancer or suspected of having a genetic disorder); samples from normal individuals; samples obtained at different stages of a disease in an individual; samples obtained from individuals subjected to different treatments for a disease; samples from individuals subjected to different environmental factors; samples from individuals with a predisposition to a pathology; samples from individuals exposed to an infectious disease agent (e.g., HIV); and the like.
In one illustrative but non-limiting embodiment, the sample is a maternal sample obtained from a pregnant female, for example a pregnant woman. In this instance, the sample can be analyzed using the methods described herein to provide a prenatal diagnosis of potential chromosomal abnormalities in the fetus. The maternal sample can be a tissue sample, a biological fluid sample, or a cell sample. Biological fluids include, as non-limiting examples, blood, plasma, serum, sweat, tears, sputum, urine, ear flow, lymph, saliva, cerebrospinal fluid, lavages, bone marrow suspension, vaginal flow, transcervical lavage, brain fluid, ascites, milk, secretions of the respiratory, intestinal, and genitourinary tracts, and leukapheresis samples.
In another illustrative but non-limiting embodiment, the maternal sample is a mixture of two or more biological samples, e.g., the biological sample can comprise two or more of a biological fluid sample, a tissue sample, and a cell culture sample. In some embodiments, the sample is one that is easily obtainable by non-invasive procedures, e.g., blood, plasma, serum, sweat, tears, sputum, urine, milk, ear flow, saliva, and feces. In some embodiments, the biological sample is a peripheral blood sample and/or the plasma and serum fractions thereof. In other embodiments, the biological sample is a swab or smear, a biopsy specimen, or a sample of a cell culture. As disclosed above, the terms "blood", "plasma", and "serum" expressly encompass fractions or processed portions thereof. Similarly, where a sample is taken from a biopsy, swab, smear, etc., the "sample" expressly encompasses a processed fraction or portion derived from the biopsy, swab, smear, etc.
In certain embodiments, samples can also be obtained from in vitro cultured tissues, cells, or other polynucleotide-containing sources. The cultured samples can be taken from sources including, but not limited to, cultures (e.g., tissue or cells) maintained in different media and conditions (e.g., pH, pressure, or temperature), cultures maintained for different periods of time, cultures treated with different factors or reagents (e.g., a drug candidate or a modulator), or cultures of different types of tissue and/or cells.
Methods of isolating nucleic acids from biological sources are well known and will differ depending on the nature of the source. One of ordinary skill in the art can readily isolate nucleic acids from a source as needed for the methods described herein. In some instances, it can be advantageous to fragment the nucleic acid molecules in the nucleic acid sample. Fragmentation can be random, or it can be specific, as achieved, for example, by restriction endonuclease digestion. Methods for random fragmentation are well known in the art and include, for example, limited DNase digestion, alkali treatment, and physical shearing. In one embodiment, sample nucleic acids are obtained as cfDNA, which is not subjected to fragmentation.
In other illustrative embodiments, the sample nucleic acids are obtained as genomic DNA, which is fragmented into fragments of about 300 or more, about 400 or more, or about 500 or more base pairs, and to which NGS methods can readily be applied.
Sequencing library preparation
In one embodiment, the methods described herein can use next generation sequencing (NGS) technologies, which allow multiple samples to be sequenced individually as genomic molecules (i.e., singleplex sequencing) or as pooled samples comprising indexed genomic molecules on a single sequencing run (e.g., multiplex sequencing). These methods can generate up to several hundred million reads of DNA sequence. In various embodiments, the sequences of genomic nucleic acids and/or indexed genomic nucleic acids can be determined using, for example, the NGS technologies described herein. In various embodiments, the large amount of sequence data obtained using NGS can be analyzed using one or more processors as described herein.
In various embodiments, the use of such sequencing technologies does not involve the preparation of sequencing libraries.
However, in certain embodiments, the sequencing methods contemplated herein involve the preparation of sequencing libraries. In one illustrative approach, sequencing library preparation involves producing a random collection of adaptor-modified DNA fragments (e.g., polynucleotides) that are ready to be sequenced. Sequencing libraries of polynucleotides can be prepared from DNA or RNA, including equivalents and analogs of either DNA or cDNA, for example DNA or cDNA that is complementary or copy DNA produced from an RNA template by the action of reverse transcriptase. The polynucleotides may originate in double-stranded form (e.g., dsDNA such as genomic DNA fragments, cDNA, PCR amplification products, and the like) or, in certain embodiments, the polynucleotides may originate in single-stranded form (e.g., ssDNA, RNA, etc.) and have been converted to dsDNA form. By way of illustration, in certain embodiments, single-stranded mRNA molecules may be copied into double-stranded cDNAs suitable for use in preparing a sequencing library. The precise sequence of the primary polynucleotide molecules is generally not material to the method of library preparation, and may be known or unknown. In one embodiment, the polynucleotide molecules are DNA molecules. More particularly, in certain embodiments, the polynucleotide molecules represent the entire genetic complement of an organism, or substantially the entire genetic complement of an organism, and are genomic DNA molecules (e.g., cellular DNA, cell-free DNA (cfDNA), etc.) that typically include both intron and exon (coding) sequences, as well as non-coding regulatory sequences such as promoter and enhancer sequences. In certain embodiments, the primary polynucleotide molecules comprise human genomic DNA molecules, e.g., cfDNA molecules present in the peripheral blood of a pregnant subject.
Preparation of sequencing libraries for some NGS sequencing platforms is facilitated by the use of polynucleotides comprising a specific range of fragment sizes. Preparation of such libraries typically involves fragmenting large polynucleotides (e.g., cellular genomic DNA) to obtain polynucleotides in the desired size range.
Fragmentation can be achieved by any of a number of methods known to those of ordinary skill in the art. For example, fragmentation can be achieved by mechanical means including, but not limited to, nebulization, sonication, and hydroshearing. However, mechanical fragmentation typically cleaves the DNA backbone at C-O, P-O, and C-C bonds, resulting in a heterogeneous mixture of blunt ends and 3'- and 5'-overhanging ends with broken C-O, P-O, and C-C bonds (see, e.g., Alnemri and Litwack, J Biol Chem 265:17323-17333 [1990]; Richards and Boyer, J Mol Biol 11:327-240 [1965]). These ends may need to be repaired, because they may lack the 5'-phosphate required for the subsequent enzymatic reactions (e.g., ligation of sequencing adaptors) needed to prepare DNA for sequencing.
In contrast, cfDNA typically exists as fragments of less than about 300 base pairs, and consequently fragmentation is not typically necessary for generating a sequencing library from cfDNA samples.
Typically, whether polynucleotides are forcibly fragmented (e.g., fragmented in vitro) or naturally exist as fragments, they are converted to blunt-ended DNA having 5'-phosphates and 3'-hydroxyls. Standard protocols, e.g., protocols for sequencing using the Illumina platform as described elsewhere herein, instruct users to end-repair sample DNA, to purify the end-repaired products prior to dA-tailing, and to purify the dA-tailed products prior to the adaptor-ligation step of library preparation.
Various embodiments of the sequencing library preparation methods described herein obviate the need to perform one or more of the steps typically mandated by standard protocols to obtain a modified DNA product that can be sequenced by NGS. An abbreviated method (ABB method), a one-step method, and a two-step method are described below. Consecutive dA-tailing and adaptor ligation is referred to herein as the two-step method. Consecutive dA-tailing, adaptor ligation, and amplification is referred to herein as the one-step method. In various embodiments, the ABB and two-step methods can be performed in solution or on a solid surface. In certain embodiments, the one-step method is performed on a solid surface.
Figure 2 illustrates a comparison of a standard method, e.g., Illumina, with the abbreviated method (ABB; Example 2), the two-step method, and the one-step method (Examples 3-6) for preparing DNA molecules for sequencing by NGS according to embodiments of the invention.
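The workflows compared above differ mainly in which steps they run. The following Python sketch is an illustration only, not part of the disclosed method: the step labels are hypothetical shorthand, and the standard protocol is assumed to purify only after end repair and after dA-tailing, as described above. Any purification after ligation is left out for every workflow.

```python
from collections import Counter

# Step labels are hypothetical shorthand for the steps named in the text.
STANDARD = ["end_repair", "purify", "dA_tailing", "purify", "adaptor_ligation"]

WORKFLOWS = {
    "standard": STANDARD,
    # ABB: end repair, dA-tailing and ligation run consecutively, no purification between.
    "ABB": ["end_repair", "dA_tailing", "adaptor_ligation"],
    # Two-step: unrepaired DNA goes straight to dA-tailing and ligation.
    "two_step": ["dA_tailing", "adaptor_ligation"],
    # One-step: dA-tailing, ligation and amplification run consecutively.
    "one_step": ["dA_tailing", "adaptor_ligation", "amplification"],
}


def omitted_steps(name):
    """Return the standard-protocol steps that the named workflow skips."""
    remaining = Counter(STANDARD) - Counter(WORKFLOWS[name])
    return sorted(remaining.elements())


if __name__ == "__main__":
    for workflow in WORKFLOWS:
        print(workflow, "omits", omitted_steps(workflow))
```

Under these assumptions, the ABB workflow omits only the two intermediate purifications, while the two-step and one-step workflows also omit end repair.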
Abbreviated preparation - ABB
In one embodiment, an abbreviated method (ABB method) for preparing a sequencing library is provided, which comprises the consecutive steps of end repair, dA-tailing, and adaptor ligation (ABB). In embodiments for preparing sequencing libraries that do not require the dA-tailing step (see, e.g., protocols for sequencing using the Roche 454 and SOLiD™ 3 platforms), the steps of end repair and adaptor ligation can exclude the step of purifying the end-repaired products prior to adaptor ligation.
The method of preparing sequencing libraries comprising the consecutive steps of end repair, dA-tailing, and adaptor ligation is referred to herein as the abbreviated method (ABB), and was shown to generate sequencing libraries of unexpectedly improved quality while expediting the analysis of samples (see, e.g., Example 2). According to some embodiments of the method, the ABB method can be performed in solution, as exemplified herein. The ABB method can also be performed on a solid surface by first end-repairing and dA-tailing the DNA in solution, and subsequently binding the DNA to the solid surface as described elsewhere herein for the one-step or two-step preparation on a solid surface. The three enzymatic steps, including the step of ligating the adaptors to the dA-tailed DNA, are all performed in the absence of polyethylene glycol. Published protocols for performing ligation reactions, including ligation of adaptors to DNA, instruct users to perform the ligation in the presence of polyethylene glycol. Applicants determined that ligation of the adaptors to the dA-tailed DNA can be performed in the absence of polyethylene glycol.
In another embodiment, the sequencing library is prepared without end-repairing the cfDNA prior to the dA-tailing step. Applicants have determined that cfDNA, which does not require fragmentation, need not be end-repaired, and that preparation of a cfDNA sequencing library according to embodiments of the invention excludes the end-repair step and the purification steps, thereby combining the enzymatic reactions and further streamlining preparation of the DNA to be sequenced. cfDNA exists as a mixture of blunt ends and 3'- and 5'-overhanging ends, which are generated in vivo by the action of nucleases that cleave cellular genomic DNA into cfDNA fragments terminating in a 5'-phosphate and a 3'-hydroxyl. Eliminating the end-repair step selects for cfDNA molecules that naturally occur as blunt-ended molecules and for cfDNA molecules that naturally have 5'-overhanging ends, which are filled in by the polymerase activity of the enzyme, e.g., Klenow Exo-, used to attach one or more deoxynucleotides to the 3'-OH (dA-tailing) as described below. Eliminating the end-repair step does not select for cfDNA molecules that have a 3'-overhanging end (3'-OH). Surprisingly, excluding these 3'-OH cfDNA molecules from the sequencing library did not affect the representation of genomic sequences in the library, demonstrating that the end-repair step can be omitted from the preparation of cfDNA sequencing libraries (see Examples). In addition to cfDNA, other types of unrepaired polynucleotides that can be used to prepare sequencing libraries include DNA molecules produced by reverse transcription of RNA molecules (e.g., mRNA, siRNA, sRNA) and unrepaired DNA molecules that are DNA amplicons synthesized from phosphorylated primers. When unphosphorylated primers are used, DNA reverse transcribed from RNA and/or DNA amplified from a DNA template (i.e., DNA amplicons) can also be phosphorylated after synthesis using a polynucleotide kinase.
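The selection described above can be sketched as a simple rule on fragment ends. The Python sketch below is a simplified illustration under stated assumptions: each cfDNA molecule is modeled as a pair of end types, the end-type labels are hypothetical, and a molecule is treated as retained only when both ends are blunt or 5'-overhanging (the latter being filled in during dA-tailing). It is not a model of the actual enzymology.

```python
# End-type labels are hypothetical shorthand for this illustration.
USABLE_WITHOUT_REPAIR = {"blunt", "5_overhang"}  # 5' overhangs are filled in during dA-tailing
ALL_END_TYPES = USABLE_WITHOUT_REPAIR | {"3_overhang"}


def retained_without_end_repair(ends):
    """Return True if a molecule with the given pair of end types enters the library."""
    for end in ends:
        if end not in ALL_END_TYPES:
            raise ValueError(f"unknown end type: {end!r}")
    # Assumption: a single 3'-overhanging end is enough to exclude the molecule.
    return all(end in USABLE_WITHOUT_REPAIR for end in ends)


def retained_fraction(molecules):
    """Fraction of molecules retained when the end-repair step is omitted."""
    molecules = list(molecules)
    if not molecules:
        return 0.0
    kept = sum(retained_without_end_repair(m) for m in molecules)
    return kept / len(molecules)


if __name__ == "__main__":
    sample = [
        ("blunt", "blunt"),
        ("blunt", "5_overhang"),
        ("5_overhang", "5_overhang"),
        ("blunt", "3_overhang"),  # excluded under the assumption above
    ]
    print(retained_fraction(sample))
```

The point of the text is that the excluded class does not distort genomic representation; the sketch only makes the inclusion rule explicit.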
In another embodiment, unrepaired DNA is used to prepare a sequencing library according to the two-step method, in which end repair of the DNA is excluded and the unrepaired DNA is subjected to the two consecutive steps of dA-tailing and adaptor ligation (see Figure 2). The two-step method can be performed in solution or on a solid surface. When performed in solution, the two-step method comprises using DNA obtained from a biological sample, excluding the step of end-repairing the DNA, and adding a single deoxynucleotide, e.g., deoxyadenosine (A), to the 3'-ends of the polynucleotides in the unrepaired DNA sample, for example by the activity of certain types of DNA polymerase such as Taq polymerase or Klenow Exo- polymerase. In a subsequent consecutive step, the dA-tailed products are ligated to adaptors; the dA-tailed products are compatible with the 'T' overhang present on the 3' terminus of each duplex region of commercially available adaptors. dA-tailing prevents self-ligation of two blunt-ended polynucleotides, which favors formation of adaptor-ligated sequences. Thus, in some embodiments, unrepaired cfDNA is subjected to the consecutive steps of dA-tailing and adaptor ligation, wherein the dA-tailed DNA is prepared from unrepaired DNA and is not subjected to a purification step following the dA-tailing reaction. Double-stranded adaptors can be ligated to both ends of the dA-tailed DNA. A set of adaptors having the same sequence, or a set of two different adaptors, can be used. In various embodiments, one or more different sets of the same or different adaptors can also be used. Adaptors can comprise index sequences to allow multiplex sequencing of the library DNA. Ligation of adaptors to the dA-tailed DNA is optionally performed in the absence of polyethylene glycol.
Two-step - preparation in solution
In various embodiments, when the two-step method is performed in solution, the products of the adaptor ligation reaction can be purified to remove unligated adaptors and adaptors that may have ligated to one another. The purification can also select a size range of templates for cluster generation, which can optionally be preceded by an amplification, e.g., a PCR amplification. The ligation products can be purified by any of a number of methods including, but not limited to, gel electrophoresis and solid-phase reversible immobilization (SPRI). In some embodiments, the purified adaptor-ligated DNA is subjected to an amplification, e.g., PCR amplification, prior to sequencing. Some sequencing platforms require that the library DNA be subjected to a further amplification. For example, the Illumina platform requires that a cluster amplification of the library DNA be performed as an integral part of sequencing according to the Illumina technology. In other embodiments, the purified adaptor-ligated DNA is denatured and the single-stranded DNA molecules are attached to the flow cell of the sequencer. Accordingly, in certain embodiments, the method for preparing a sequencing library in solution from unrepaired DNA for NGS sequencing comprises obtaining DNA molecules from a sample, and subjecting the unrepaired DNA molecules obtained from the sample to the consecutive steps of dA-tailing and adaptor ligation.
As indicated above, in various embodiments, these methods of library preparation are incorporated into methods of determining copy number variations (CNVs) such as aneuploidies. Accordingly, in one illustrative embodiment, a method is provided for determining the presence or absence of one or more fetal chromosomal aneuploidies, the method comprising: (a) obtaining a maternal sample comprising a mixture of fetal and maternal cell-free DNA; (b) isolating the mixture of fetal and maternal cfDNA from the sample; (c) preparing a sequencing library from the mixture of fetal and maternal cfDNA, wherein preparing the library comprises the consecutive steps of dA-tailing and adaptor-ligating the cfDNA, wherein preparing the library does not include end-repairing the cfDNA, and wherein the preparation is performed in solution; (d) massively parallel sequencing at least a portion of the sequencing library to obtain sequence information for the fetal and maternal cfDNA in the sample; (e) storing the sequence information, at least temporarily, in a computer-readable medium; (f) using the stored sequence information to computationally identify a number of sequence tags for each of one or more chromosomes of interest and a number of sequence tags for a normalizing sequence for each of the one or more chromosomes of interest; (g) computationally calculating a chromosome dose for each of the one or more chromosomes of interest, using the number of sequence tags for each of the one or more chromosomes of interest and the number of sequence tags for the normalizing sequence for each of the one or more chromosomes of interest; and (h) comparing the chromosome dose for each of the one or more chromosomes of interest to a corresponding threshold value for each of the one or more chromosomes of interest, thereby determining the presence or absence of a fetal chromosomal aneuploidy in the sample, wherein steps (e)-(h) are performed using one or more processors. This method is exemplified in Examples 3 and 4.
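Steps (f) through (h) can be sketched computationally. The Python sketch below is a minimal illustration, assuming the chromosome dose is the ratio of the tag count for the chromosome of interest to the tag count for its normalizing sequence (here, the summed counts of one or more normalizing chromosomes). The function names, the tag counts, the choice of normalizing chromosome, and the threshold are all hypothetical, and the one-sided comparison is an assumption suited to detecting a gain such as a trisomy.

```python
def chromosome_dose(tag_counts, chromosome, normalizing_chromosomes):
    """Ratio of tags on the chromosome of interest to tags on its normalizing sequence."""
    denominator = sum(tag_counts[c] for c in normalizing_chromosomes)
    if denominator == 0:
        raise ValueError("normalizing sequence has no mapped tags")
    return tag_counts[chromosome] / denominator


def exceeds_threshold(dose, threshold):
    """One-sided comparison of a dose against its chromosome-specific threshold."""
    return dose > threshold


if __name__ == "__main__":
    # Hypothetical tag counts per chromosome; real counts come from mapped reads.
    counts = {"chr21": 150, "chr9": 500, "chr1": 1000}
    dose = chromosome_dose(counts, "chr21", ["chr9"])
    # The threshold here is an arbitrary placeholder, not a validated cutoff.
    print(dose, exceeds_threshold(dose, threshold=0.29))
```

In practice each chromosome of interest would have its own normalizing sequence and its own threshold, as steps (f) and (h) state.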
Two-step and one-step - solid phase preparation
In certain embodiments, the sequencing library is prepared on a solid surface according to the two-step method described above for preparing the library in solution. Preparing the sequencing library on a solid surface according to the two-step method comprises obtaining DNA molecules, e.g., cfDNA, from a sample, and performing the consecutive steps of dA-tailing and adaptor ligation, wherein the adaptor ligation is performed on a solid surface. Repaired or unrepaired DNA can be used. In certain embodiments, the adaptor-ligated product is detached from the solid surface, purified, and amplified prior to sequencing. In other embodiments, the adaptor-ligated product is detached from the solid surface, purified, and not amplified prior to sequencing. In yet other embodiments, the adaptor-ligated product is amplified, detached from the solid surface, and purified. In certain embodiments, the purified product is amplified. In other embodiments, the purified product is not amplified. The sequencing protocol can include an amplification, e.g., cluster amplification. In various embodiments, the detached adaptor-ligated product is purified prior to amplification and/or sequencing.
In certain embodiments, the sequencing library is prepared on a solid surface according to the one-step method. In various embodiments, preparing the sequencing library on a solid surface according to the one-step method comprises obtaining DNA molecules, e.g., cfDNA, from a sample, and performing the consecutive steps of dA-tailing, adaptor ligation, and amplification, wherein the adaptor ligation is performed on the solid surface. The adaptor-ligated product need not be detached prior to purification.
Figure 3 depicts the two-step and one-step methods for preparing a sequencing library on a solid surface. Repaired or unrepaired DNA can be used to prepare a sequencing library on a solid surface. In certain embodiments, unrepaired DNA is used. Examples of unrepaired DNA that can be used to prepare a sequencing library on a solid surface include, but are not limited to, cfDNA, DNA that has been reverse transcribed from RNA using phosphorylated primers, and DNA that has been amplified from a DNA template using phosphorylated primers (i.e., phosphorylated DNA amplicons). Examples of repaired DNA that can be used to prepare a sequencing library on a solid surface include, but are not limited to, cfDNA and fragmented genomic DNA that has been blunt-ended and phosphorylated (i.e., repaired), and phosphorylated DNA produced by reverse transcription of RNA such as mRNA, sRNA, and siRNA. In certain illustrative embodiments, unrepaired cfDNA obtained from a maternal sample is used to prepare the sequencing library.
Preparing a sequencing library on a solid surface comprises coating the solid surface with the first part of a two-part conjugate, modifying a first adaptor by attaching the second part of the two-part conjugate to it, and immobilizing the adaptor on the solid surface through the binding interaction between the first and second parts of the conjugate. For example, preparing a sequencing library on a solid surface can comprise attaching to one end of a library adaptor a polypeptide, polynucleotide, or small molecule that is capable of forming a binding complex with a polypeptide, polynucleotide, or small molecule immobilized on the solid surface. Solid surfaces that can be used to immobilize polypeptides, polynucleotides, or small molecules include, but are not limited to, plastic, paper, membranes, filters, chips, pins or glass slides, silica or polymer beads (e.g., polypropylene, polystyrene, polycarbonate), 2D or 3D molecular scaffolds, or any support used for solid-phase synthesis of polypeptides or polynucleotides.
The bond between polypeptide-polypeptide, polypeptide-polynucleotide, polypeptide-small molecule, and polynucleotide-polynucleotide conjugates can be covalent or non-covalent. Preferably, the binding complexes are held together by non-covalent bonds. For example, conjugates that can be used to prepare a sequencing library on a solid surface include, but are not limited to, streptavidin-biotin conjugates, antibody-antigen conjugates, and ligand-receptor conjugates. Examples of polypeptide-polynucleotide conjugates that can be used include, but are not limited to, DNA-binding protein-DNA conjugates. Examples of polynucleotide-polynucleotide conjugates that can be used include, but are not limited to, oligo dT-oligo A and oligo dT-oligo dA. Examples of polypeptide-small molecule and polynucleotide-small molecule conjugates include streptavidin-biotin.
According to embodiments of the solid surface methods (one-step and two-step) shown in Figure 3, the solid surface of a vessel used for preparing the sequencing library (e.g., a polypropylene PCR tube or a 96-well plate) is coated with a polypeptide such as streptavidin. The ends of a first set of adaptors are modified by attaching a small molecule, e.g., a biotin molecule, and the biotinylated adaptors are bound to the streptavidin on the solid surface (1). Subsequently, the unrepaired or repaired DNA is ligated to the streptavidin-bound biotinylated adaptors, thereby immobilizing the DNA on the solid surface (2). A second set of adaptors is ligated to the immobilized DNA (3).
Two-step - preparation on solid phase
In one embodiment, the two-step method is performed using unrepaired DNA, e.g., cfDNA, to prepare the sequencing library on a solid surface. The unrepaired DNA is dA-tailed by attaching a single nucleotide base, e.g., dA, to the 3' ends of the strands of the unrepaired DNA. Optionally, multiple nucleotide bases can be attached to the unrepaired DNA. The mixture comprising the dA-tailed DNA is added to the adaptors immobilized on the solid surface, to which the DNA is ligated. The steps of dA-tailing and adaptor-ligating the DNA are consecutive, i.e., the dA-tailed product is not purified (as shown for the two-step method in Figure 2). As described above, the adaptors can have overhangs that are complementary to the overhangs on the unrepaired DNA molecules. Subsequently, a second set of adaptors is added to the DNA-biotinylated adaptor complex to provide an adaptor-ligated DNA library. Optionally, repaired DNA is used to prepare the library. Repaired DNA can be genomic DNA that has been fragmented and subjected to in vitro enzymatic repair of its 3' and 5' ends. In one embodiment, DNA, e.g., maternal cfDNA, is end-repaired, dA-tailed, and ligated to the adaptors immobilized on the solid surface in the consecutive steps of end repair, dA-tailing, and adaptor ligation described for the abbreviated method performed in solution.
In certain embodiments utilizing the two-step method, the adaptor-ligated DNA is detached from the solid surface by chemical or physical means such as heat or UV light (4a in Figure 2), purified (5 in Figure 2), and optionally amplified in solution before the sequencing process begins. In other embodiments, the adaptor-ligated DNA is not amplified. In the absence of amplification, the adaptors ligated to the DNA can be constructed to comprise sequences that hybridize to the oligonucleotides present on the flow cell of the sequencer (Kozarewa et al., Nat Methods 6:291-295 [2009]), so that the amplification otherwise used to introduce sequences for hybridizing the library DNA to the flow cell is avoided. The library of adaptor-ligated DNA is subjected to massively parallel sequencing (6 in Figure 2) as described for the adaptor-ligated DNA generated in solution. In certain embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is massively parallel sequencing using sequencing-by-ligation. The sequencing process can include a solid-phase amplification, e.g., cluster amplification, as described elsewhere herein.
Accordingly, in various embodiments, a method for preparing a sequencing library for NGS on a solid surface from unrepaired DNA can comprise obtaining DNA molecules from a sample, and subjecting the unrepaired DNA molecules to the consecutive steps of dA-tailing and adaptor ligation, wherein the adaptor ligation is performed on a solid phase. In certain embodiments, the adaptors can include index sequences to allow multiplex sequencing of multiple samples within a single reaction vessel, e.g., one channel of a flow cell. As described above, the DNA molecules can be cfDNA molecules, DNA molecules reverse transcribed from RNA, amplicons of DNA molecules, and the like.
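Index sequences make multiplexing possible because each read can be assigned back to its sample after sequencing. The Python sketch below illustrates that assignment under stated assumptions: reads are represented as (index sequence, insert sequence) pairs, the index sequences and sample names are hypothetical, and matching is exact. Real pipelines commonly tolerate index mismatches, which is not modeled here.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group insert sequences by sample using exact matches on the index sequence.

    reads: iterable of (index_sequence, insert_sequence) tuples.
    index_to_sample: dict mapping each index sequence to a sample name.
    Reads whose index is not recognized are collected under "undetermined".
    """
    by_sample = defaultdict(list)
    for index_sequence, insert_sequence in reads:
        sample = index_to_sample.get(index_sequence, "undetermined")
        by_sample[sample].append(insert_sequence)
    return dict(by_sample)


if __name__ == "__main__":
    # Hypothetical 6-base indexes and sample names, for illustration only.
    indexes = {"ATCACG": "maternal_sample_1", "CGATGT": "maternal_sample_2"}
    pooled_reads = [
        ("ATCACG", "GGCTTAACCG"),
        ("CGATGT", "TTGACCGTAA"),
        ("ATCACG", "ACGTACGTAC"),
        ("NNNNNN", "CCCCGGGGAA"),  # unreadable index
    ]
    for sample, inserts in demultiplex(pooled_reads, indexes).items():
        print(sample, len(inserts))
```

After demultiplexing, the reads of each sample can be mapped and counted separately, so several maternal samples can share one flow cell channel.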
如以上所指示的,在不同的实施方案中,这些文库制备方法被合并到确定例如非整倍性等拷贝数变异(CNV)的方法中。因此,在某些实施方案中,用于在固体表面上从未修复的cfDNA制备测序文库的方法被合并到用于分析母体样品以确定存在或不存在胎儿染色体非整倍性的方法中。因此,在一个实施方案中,提供一种用于确定存在或不存在一种或多种胎儿染色体非整倍性的方法,该方法包括:(a)获得包括胎儿与母体无细胞DNA的混合物的母体样品;(b)将胎儿与母体cfDNA的混合物从所述样品中分离;(c)由胎儿与母体cfDNA的混合物制备测序文库;其中制备该文库包括对cfDNA进行dA加尾和适配子连接的连续步骤,其中制备该文库不包括对cfDNA进行末端修复,并且制备是在固体表面上执行;(d)对该测序文库中的至少一部分进行大规模平行测序,以便获得针对样品中胎儿和母体cfDNA的序列信息;(e)至少暂时地将该序列信息存储在一种计算机可读媒质中;(f)使用该存储的序列信息,以计算的方式识别出一个或多个感兴趣的染色体中每一个的序列标签的数目和任一个或多个感兴趣的染色体中每一个的归一化序列的序列标签的数目;(g) 使用一个或多个感兴趣的染色体中每一个的序列标签的数目和这个或这些感兴趣的染色体中每一个的归一化序列的序列标签的数目,针对这个或这些感兴趣的染色体中的每一个以计算的方式计算出染色体剂量;并且(h)将针对这个或这些感兴趣的染色体中每一个染色体剂量与针对这个或这些感兴趣的染色体中每一个的一个相应阈值进行比较,并且由此在样品中确定存在或不存在胎儿染色体非整倍性,其中步骤(e)-(h)的使用一个或多个处理器执行的。样品可以是生物学流体样品,例如血浆、血清、尿以及唾液。在某些实施方案中,样品是母体血样、或其血浆和血清部分。此方法例证于实例4中。As indicated above, in various embodiments, these library preparation methods are incorporated into methods for determining copy number variations (CNVs) such as aneuploidy. Thus, in certain embodiments, methods for preparing sequencing libraries from unrepaired cfDNA on a solid surface are incorporated into methods for analyzing maternal samples to determine the presence or absence of fetal chromosomal aneuploidy. 
Thus, in one embodiment, a method for determining the presence or absence of one or more fetal chromosomal aneuploidies is provided, the method comprising: (a) obtaining a maternal sample comprising a mixture of fetal and maternal cell-free DNA; (b) isolating the mixture of fetal and maternal cfDNA from the sample; (c) preparing a sequencing library from the mixture of fetal and maternal cfDNA, wherein preparing the library comprises the consecutive steps of dA-tailing and adaptor ligation of the cfDNA, wherein preparing the library does not comprise end-repairing the cfDNA, and wherein the preparation is performed on a solid surface; (d) performing massively parallel sequencing on at least a portion of the sequencing library to obtain sequence information for the fetal and maternal cfDNA in the sample; (e) storing the sequence information, at least temporarily, in a computer-readable medium; (f) using the stored sequence information to computationally identify the number of sequence tags for each of one or more chromosomes of interest and the number of sequence tags for a normalizing sequence for each of the one or more chromosomes of interest; (g) using the number of sequence tags for each of the one or more chromosomes of interest and the number of sequence tags for the normalizing sequence for each of the one or more chromosomes of interest to computationally calculate a chromosome dose for each of the one or more chromosomes of interest; and (h) comparing the chromosome dose for each of the one or more chromosomes of interest to a corresponding threshold for each of the one or more chromosomes of interest, thereby determining the presence or absence of a fetal chromosomal aneuploidy in the sample, wherein steps (e)-(h) are performed using one or more processors. The sample can be a biological fluid sample, e.g., plasma, serum, urine, and saliva. In certain embodiments, the sample is a maternal blood sample, or the plasma and serum fractions thereof. 
This method is exemplified in Example 4.
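Steps (f)-(h) above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the chromosome dose is taken to be the ratio of sequence tags on the chromosome of interest to sequence tags on its normalizing sequence, the normalizing sequence is modeled as a tuple of one or more chromosomes, and all tag counts, normalizer choices, and threshold values are hypothetical numbers chosen for the example. Only an upper threshold (over-representation) is shown; the function names are not taken from the specification.

```python
from typing import Dict, Tuple


def chromosome_doses(
    tag_counts: Dict[str, int],
    normalizers: Dict[str, Tuple[str, ...]],
) -> Dict[str, float]:
    """Step (g): dose = tags on chromosome of interest / tags on its normalizing sequence."""
    doses = {}
    for chrom, norm_chroms in normalizers.items():
        # The normalizing sequence may be one chromosome or a group; sum its tags.
        norm_tags = sum(tag_counts[c] for c in norm_chroms)
        if norm_tags == 0:
            raise ValueError(f"no tags on normalizing sequence for {chrom}")
        doses[chrom] = tag_counts[chrom] / norm_tags
    return doses


def exceeds_threshold(
    doses: Dict[str, float], thresholds: Dict[str, float]
) -> Dict[str, bool]:
    """Step (h): compare each dose with the threshold for that chromosome."""
    return {chrom: dose > thresholds[chrom] for chrom, dose in doses.items()}


# Hypothetical tag counts from step (f); not real data.
tag_counts = {"chr21": 15000, "chr9": 100000, "chr18": 30000, "chr8": 120000}
# Hypothetical normalizing sequences for each chromosome of interest.
normalizers = {"chr21": ("chr9",), "chr18": ("chr8",)}
# Hypothetical thresholds; in practice these would come from qualified samples.
thresholds = {"chr21": 0.14, "chr18": 0.26}

doses = chromosome_doses(tag_counts, normalizers)
calls = exceeds_threshold(doses, thresholds)
```

In this sketch a `True` entry in `calls` only means that the dose exceeded the assumed threshold; how thresholds are derived and how a dose is interpreted are outside what the example attempts to show.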
一步-在固相上制备One-step preparation on solid phase
在另一个实施方案中,对未修复的DNA进行dA加尾,但在扩增前不对 dA加尾产物进行纯化,这样使得dA加尾、适配子连接以及扩增的步骤连续或连贯地执行。在测序前连续的dA加尾、适配子连接以及扩增、随后纯化在此称为一步工艺。一步法可在固体表面上执行(参见例如图3)。将第一组适配子附接到固体表面(1)、将未修复和带dA尾的DNA连接到表面结合的适配子 (2)上和将第二组适配子连接到表面结合的DNA(3)上的步骤可以如以上针对两步法所述来执行。然而,在一步法中,可对连接适配子的表面结合的DNA 进行扩增,同时附接到固体表面上(图2中4b)。随后,将在固体表面上产生的连接适配子的DNA的所得文库分离并纯化(图2中5),接着如针对在溶液中产生的连接适配子的DNA所述的进行大规模平行测序。在某些实施方案中,测序是使用借助可逆染料终止子的合成法测序的大规模平行测序。在其他实施方案中,测序是使用连接法测序的大规模平行测序。In another embodiment, the unrepaired DNA is dA-tailed, but the dA-tailed product is not purified before amplification, so that the steps of dA-tailing, adaptor ligation, and amplification are performed continuously or consecutively. The continuous dA-tailing, adaptor ligation, and amplification, followed by purification before sequencing, is referred to herein as a one-step process. The one-step process can be performed on a solid surface (see, for example, FIG3 ). The steps of attaching a first set of adaptors to a solid surface (1), attaching the unrepaired and dA-tailed DNA to surface-bound adaptors (2), and attaching a second set of adaptors to surface-bound DNA (3) can be performed as described above for the two-step process. However, in the one-step process, the adaptor-ligated surface-bound DNA can be amplified while attached to the solid surface ( 4 b in FIG2 ). Subsequently, the resulting library of adaptor-ligated DNA produced on the solid surface is separated and purified ( 5 in FIG2 ), followed by massively parallel sequencing as described for adaptor-ligated DNA produced in solution. In certain embodiments, sequencing is massively parallel sequencing using sequencing-by-synthesis with the aid of reversible dye terminators. In other embodiments, sequencing is massively parallel sequencing using sequencing by ligation.
因此,在某些实施方案中,提供一种用于制备供NGS测序的测序文库的方法,该方法通过执行包括以下各项的步骤进行:从一个样品获得DNA分子;并且对DNA分子进行dA加尾、适配子连接以及扩增的连续步骤,其中适配子连接是在固体表面上执行的。如针对两步法所述,在不同的实施方案中,适配子可包括索引序列,以允许在单一反应容器(例如流动池的一个通道)内对多个样品进行多重测序。Thus, in certain embodiments, a method for preparing a sequencing library for NGS sequencing is provided, the method comprising the steps of obtaining DNA molecules from a sample; and performing the sequential steps of dA tailing, adaptor ligation, and amplification on the DNA molecules, wherein the adaptor ligation is performed on a solid surface. As described for the two-step method, in various embodiments, the adaptors may include index sequences to allow multiplex sequencing of multiple samples within a single reaction vessel (e.g., a channel of a flow cell).
在某些实施方案中,DNA可以是修复的。DNA分子可以是cfDNA分子,其可以是从RNA转录的DNA分子,或DNA分子可以是DNA分子的扩增子。适配子连接是如上所述执行的。过量的未连接的适配子可以从固定的连接适配子的DNA上洗去;将扩增所需的试剂加入固定的连接适配子的DNA中,该DNA经受多轮扩增,例如PCR扩增,如本领域中已知的。在其他实施方案中,不对连接适配子的DNA进行扩增。在不扩增的情况下,连接适配子的DNA可以通过化学或物理手段(例如热、紫外线灯等)从固体表面除去。在不扩增的情况下,连接到DNA的适配子可包括与测序仪的流动池上存在的寡核苷酸杂交的序列(库扎日瓦(Kozarewa)等人,自然方法(Nat Methods)6:291-295[2009])。In certain embodiments, the DNA can be repaired. The DNA molecules can be cfDNA molecules, DNA molecules transcribed from RNA, or amplicons of DNA molecules. Adaptor ligation is performed as described above. Excess unligated adaptors can be washed from the immobilized adaptor-ligated DNA; the reagents required for amplification are then added to the immobilized adaptor-ligated DNA, which is subjected to repeated cycles of amplification, e.g., PCR amplification, as known in the art. In other embodiments, the adaptor-ligated DNA is not amplified. In the absence of amplification, the adaptor-ligated DNA can be removed from the solid surface by chemical or physical means (e.g., heat, UV light, etc.). In the absence of amplification, the adaptors ligated to the DNA can include sequences that hybridize to the oligonucleotides present on the flow cell of the sequencer (Kozarewa et al., Nat Methods 6:291-295 [2009]).
在不同的实施方案中,样品可以是生物学流体样品(例如血液、血浆、血清、尿、脑髓液、羊水、唾液等等)。在某些实施方案中,在一种用于分析母体样品以确定存在或不存在胎儿染色体非整倍性的方法中包括用于在固体表面上从未修复的cfDNA制备测序文库的该方法作为一个步骤。In various embodiments, the sample can be a biological fluid sample (e.g., blood, plasma, serum, urine, cerebrospinal fluid, amniotic fluid, saliva, etc.). In certain embodiments, a method for analyzing a maternal sample to determine the presence or absence of a fetal chromosomal aneuploidy includes as a step a method for preparing a sequencing library from unrepaired cfDNA on a solid surface.
因此,在一个实施方案中,提供一种用于确定存在或不存在一种或多种胎儿染色体非整倍性的方法,该方法包括:(a)获得包括胎儿与母体无细胞DNA的混合物的母体样品;(b)将胎儿与母体cfDNA的混合物从所述样品中分离;(c)由胎儿与母体cfDNA的混合物制备测序文库;其中制备该文库包括对cfDNA进行dA加尾、适配子连接以及扩增的连续步骤,并且其中制备是在固体表面上执行的;(d)对该测序文库中的至少一部分进行大规模平行测序,以便获得针对样品中胎儿和母体cfDNA的序列信息;(e)至少暂时地将该序列信息存储在一种计算机可读媒质中;(f)使用该存储的序列信息,以计算的方式识别出一个或多个感兴趣的染色体中的每一个的序列标签的数目和任一个或多个感兴趣的染色体中的每一个的归一化序列的序列标签的数目;(g)使用这个或这些感兴趣的染色体中的每一个的序列标签的数目和这个或这些感兴趣的染色体中的每一个的归一化序列的序列标签的数目,针对这个或这些感兴趣的染色体中的每一个以计算的方式计算出染色体剂量;并且(h)将针对这个或这些感兴趣的染色体中的每一个染色体剂量与针对这个或这些感兴趣的染色体中的每一个的一个相应阈值进行比较,并且由此在样品中确定存在或不存在胎儿染色体非整倍性,其中步骤(e)-(h)是使用一个或多个处理器执行的。在某些实施方案中,对DNA进行末端修复。在其他实施方案中,制备该文库不包括对cfDNA进行末端修复。此方法例证于实例5和6中。Thus, in one embodiment, a method for determining the presence or absence of one or more fetal chromosomal aneuploidies is provided, the method comprising: (a) obtaining a maternal sample comprising a mixture of fetal and maternal cell-free DNA; (b) isolating the mixture of fetal and maternal cfDNA from the sample; (c) preparing a sequencing library from the mixture of fetal and maternal cfDNA, wherein preparing the library comprises the consecutive steps of dA-tailing, adaptor ligation, and amplification of the cfDNA, and wherein the preparation is performed on a solid surface; (d) performing massively parallel sequencing on at least a portion of the sequencing library to obtain sequence information for the fetal and maternal cfDNA in the sample; (e) storing the sequence information, at least temporarily, in a computer-readable medium; (f) using the stored sequence information to computationally identify the number of sequence tags for each of one or more chromosomes of interest and the number of sequence tags for a normalizing sequence for each of the one or more chromosomes of interest; (g) using the number of sequence tags for each of the one or more chromosomes of interest and the number of sequence tags for the normalizing sequence for each of the one or more chromosomes of interest to computationally calculate a chromosome dose for each of the one or more chromosomes of interest; and (h) comparing the chromosome dose for each of the one or more chromosomes of interest to a corresponding threshold for each of the one or more chromosomes of interest, thereby determining the presence or absence of a fetal chromosomal aneuploidy in the sample, wherein steps (e)-(h) are performed using one or more processors. In certain embodiments, the DNA is end-repaired. In other embodiments, preparing the library does not comprise end-repairing the cfDNA. This method is exemplified in Examples 5 and 6.
如上所述用于制备测序文库的工艺适用于样品分析方法,包括但不限于用于确定拷贝数变异(CNV)的方法,和用于在包含单基因组的样品中和包含被已知或怀疑其一个或多个感兴趣的序列不同的至少两个基因组的混合物的样品中确定存在或不存在任何感兴趣的序列的多态性的方法。The processes for preparing sequencing libraries as described above are suitable for use in sample analysis methods, including but not limited to methods for determining copy number variation (CNV), and methods for determining the presence or absence of polymorphisms of any sequence of interest in samples comprising a single genome and in samples comprising a mixture of at least two genomes that are known or suspected to differ in one or more sequences of interest.
可能需要在固相上或在溶液中制备的连接适配子的产物的扩增,以将与一些NGS平台中存在的流动池或其他表面进行杂交所需的寡核苷酸序列引入连接适配子的模板分子中。扩增反应的内容是本领域的普通技术人员已知的、并且包括适当底物(例如dNTPs)、酶(例如DNA多聚酶)以及扩增反应所需的缓冲组分。任选地,可省去连接适配子的多核苷酸的扩增。总体上,扩增反应需要至少两个扩增引物,例如引物寡核苷酸,这些引物可相同或不同、并且可包括能够在退火步骤期间在待扩增的多核苷酸分子(或如果模板看作单股,那么其补体)中退火成引物结合序列的“适配子特定部分”。Amplification of the adaptor-ligated product prepared on a solid phase or in solution may be required to introduce into the adaptor-ligated template molecules the oligonucleotide sequences needed for hybridization to the flow cell or other surface present in some NGS platforms. The contents of an amplification reaction are known to those of ordinary skill in the art and include appropriate substrates (e.g., dNTPs), enzymes (e.g., a DNA polymerase), and the buffer components required for the amplification reaction. Optionally, amplification of the adaptor-ligated polynucleotides can be omitted. Generally, an amplification reaction requires at least two amplification primers, e.g., primer oligonucleotides, which may be identical or different and which may include an "adaptor-specific portion" capable of annealing, during the annealing step, to a primer-binding sequence in the polynucleotide molecule to be amplified (or to its complement, if the template is viewed as a single strand).
一旦形成,根据以上描述的方法制备的模板的文库可用于某些NGS平台可能需要的固相核酸扩增。如在此所用,术语“固相扩增”是指在固体支撑物上或在与固体支撑物相关联地进行的任何核酸扩增反应,使得所有或一部分的扩增产物在其形成时被固定在固体支撑物上。在具体的实施方案中,该术语涵盖固相聚合酶链式反应(固相PCR)和其固相等温扩增,这些反应是类似于标准溶液相扩增的反应,除了正向和反向扩增引物的一者或两者被固定在固体支撑物上。固相PCR还包括例如以下各项系统:乳液,其中一个引物锚定到珠粒并且另一个引物处于自由溶液中;固相凝胶基质中集落形成,其中一个引物锚定到表面并且一个引物处于自由溶液中。Once formed, the library of templates prepared according to the methods described above can be used for the solid-phase nucleic acid amplification that some NGS platforms may require. As used herein, the term "solid-phase amplification" refers to any nucleic acid amplification reaction carried out on or in association with a solid support such that all or a portion of the amplified products are immobilized on the solid support as they are formed. In particular embodiments, the term encompasses solid-phase polymerase chain reaction (solid-phase PCR) and solid-phase isothermal amplification, which are reactions analogous to standard solution-phase amplification except that one or both of the forward and reverse amplification primers are immobilized on the solid support. Solid-phase PCR also covers systems such as emulsions, in which one primer is anchored to a bead and the other is in free solution, and colony formation in solid-phase gel matrices, in which one primer is anchored to the surface and one is in free solution.
在不同的实施方案中,扩增后,可以通过微流体性毛细管电泳来分析测序文库以确保文库不含适配子二聚体或单股DNA。模板多核苷酸分子的文库尤其适用于固相测序方法中。除提供用于固相测序和固相PCR的模板外,文库模板还提供用于全基因组扩增的模板。In various embodiments, after amplification, the sequencing library can be analyzed by microfluidic capillary electrophoresis to ensure that the library does not contain adapter dimers or single-stranded DNA. Libraries of template polynucleotide molecules are particularly suitable for solid-phase sequencing methods. In addition to providing templates for solid-phase sequencing and solid-phase PCR, library templates also provide templates for whole genome amplification.
用于追踪和验证样品完整性的标记物核酸Marker nucleic acids for tracking and verifying sample integrity
在不同的实施方案中,可通过对样品基因组核酸(例如cfDNA)以及例如在加工前已引入样品中的伴随的标记物核酸的混合物的测序来验证样品的完整性和追踪样品。In various embodiments, sample integrity can be verified and samples can be tracked by sequencing a mixture of sample genomic nucleic acids (e.g., cfDNA) and accompanying marker nucleic acids that have been introduced into the sample, e.g., prior to processing.
标记物核酸可与测试样品(例如生物来源样品)组合并且经受包括例如以下一个或多个步骤的过程:将生物来源样品分级分离,例如从全血样品获得基本无细胞的血浆部分、从进行分级分离的生物来源样品(例如血浆)或未进行分级分离的生物来源样品(例如组织样品)下纯化核酸、以及测序。在某些实施方案中,测序包括制备测序文库。与来源样品组合的标记物分子的序列或序列组合经过选择而对来源样品来说是独特的。在某些实施方案中,样品中的独特标记物分子都具有相同序列。在其他实施方案中,样品中的独特标记物分子是多个序列,例如两个、三个、四个、五个、六个、七个、八个、九个、十个、十五个、二十个或更多个不同序列的组合。Marker nucleic acids can be combined with the test sample (e.g., a biological source sample) and subjected to processes that include, for example, one or more of the following steps: fractionating the biological source sample, e.g., obtaining an essentially cell-free plasma fraction from a whole blood sample; purifying nucleic acids from a fractionated biological source sample (e.g., plasma) or an unfractionated biological source sample (e.g., a tissue sample); and sequencing. In certain embodiments, sequencing comprises preparing a sequencing library. The sequence or combination of sequences of the marker molecules combined with a source sample is chosen to be unique to that source sample. In certain embodiments, the unique marker molecules in a sample all have the same sequence. In other embodiments, the unique marker molecules in a sample are a plurality of sequences, e.g., a combination of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, or more different sequences.
在一个实施方案中,样品的完整性可使用具有相同序列的多个标记物核酸分子进行验证。作为替代方案,样品的身份可使用具有至少两个、至少三个、至少四个、至少五个、至少六个、至少七个、至少八个、至少九个、至少十个、至少11个、至少12个、至少13、至少14个、至少15个、至少16个、至少17个、至少18个、至少19个、至少20个、至少25个、至少30个、至少35个、至少40个、至少50个或更多个不同序列的多个标记物核酸分子进行验证。验证多个生物样品(即两个或更多个生物样品)的完整性需要这两个或更多个样品中的每一个都用具有对所标记的多个测试样品中的每一个来说是独特的序列的标记物核酸进行标记。举例来说,第一个样品可用具有序列A的标记物核酸标记,并且第二个样品可用具有序列B的标记物核酸标记。作为替代方案,第一个样品可用都具有序列A的多个标记物核酸分子标记,并且第二个样品可用序列B与C的混合物标记,其中序列A、B以及C是具有不同序列的标记物分子。In one embodiment, the integrity of a sample can be verified using a plurality of marker nucleic acid molecules having identical sequences. Alternatively, the identity of a sample can be verified using a plurality of marker nucleic acid molecules having at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 25, at least 30, at least 35, at least 40, at least 50, or more different sequences. Verifying the integrity of a plurality of biological samples, i.e., two or more biological samples, requires that each of the two or more samples be marked with marker nucleic acids whose sequences are unique to each of the test samples being marked. For example, a first sample can be marked with a marker nucleic acid having sequence A, and a second sample can be marked with a marker nucleic acid having sequence B. Alternatively, a first sample can be marked with a plurality of marker nucleic acid molecules all having sequence A, and a second sample can be marked with a mixture of sequences B and C, wherein sequences A, B, and C are marker molecules having different sequences.
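The marking scheme described above, in which each sample receives a marker sequence or combination of marker sequences that no other sample receives, can be illustrated with a short Python sketch. The function names, the marker identifiers "A", "B", and "C", the read counts, and the `min_reads` cutoff are all hypothetical; the sketch assumes that marker reads have already been identified and counted after sequencing.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def assign_marker_sets(
    sample_ids: List[str], marker_ids: List[str], set_size: int
) -> Dict[str, FrozenSet[str]]:
    """Give every sample a distinct combination of `set_size` marker sequences."""
    assignment = {}
    # zip stops when either samples or distinct combinations run out.
    for sample_id, combo in zip(sample_ids, combinations(sorted(marker_ids), set_size)):
        assignment[sample_id] = frozenset(combo)
    if len(assignment) < len(sample_ids):
        raise ValueError("not enough distinct marker combinations for all samples")
    return assignment


def verify_sample(
    expected: FrozenSet[str], observed_counts: Dict[str, int], min_reads: int = 1
) -> bool:
    """True only if exactly the expected markers are observed in the sequenced sample."""
    observed = {m for m, n in observed_counts.items() if n >= min_reads}
    return observed == set(expected)


# Hypothetical example: three samples, three marker sequences, two markers per sample.
assignment = assign_marker_sets(["s1", "s2", "s3"], ["A", "B", "C"], set_size=2)

ok = verify_sample(assignment["s1"], {"A": 10, "B": 7})            # expected markers only
swapped = verify_sample(assignment["s1"], {"A": 10, "C": 3})       # markers of another sample
mixed = verify_sample(assignment["s1"], {"A": 10, "B": 7, "C": 2})  # an extra marker present
```

An exact-match rule is used here for simplicity; a practical check would likely need a read-count cutoff tuned to the sequencing error and carry-over rates, which this sketch does not attempt to model.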
标记物核酸可在文库制备(如果要制备文库)和测序前发生的样品制备的任何阶段中被加入样品中。在一个实施方案中,标记物分子可与未加工来源样品组合。举例来说,标记物核酸可被提供在用以收集血样的收集管中。作为替代方案,标记物核酸可在抽血后加入血样中。在一个实施方案中,标记物核酸被加入用以收集生物学流体样品的容器中,例如标记物核酸被加入用以收集血样的血液收集管中。在另一个实施方案中,标记物核酸被加入生物学流体样品的一个部分中。举例来说,标记物核酸被加入血样的血浆和/或血清部分(例如母体血浆样品)中。在又另一个实施方案中,标记物分子被加入经过纯化的样品(例如已经从生物样品纯化的核酸样品)中。举例来说,标记物核酸被加入经过纯化的母体和胎儿cfDNA的样品中。同样,标记物核酸可在加工标本前被加入活组织检查标本中。在某些实施方案中,标记物核酸可与递送标记物分子到生物样品的细胞中的载体组合。细胞递送载体包括pH敏感性脂质体和阳离子型脂质体。Marker nucleic acid can be added to the sample at any stage of sample preparation that occurs before library preparation (if a library is to be prepared) and sequencing. In one embodiment, the marker molecule can be combined with an unprocessed source sample. For example, the marker nucleic acid can be provided in a collection tube for collecting a blood sample. As an alternative, the marker nucleic acid can be added to the blood sample after blood is drawn. In one embodiment, the marker nucleic acid is added to a container for collecting a biological fluid sample, for example, the marker nucleic acid is added to a blood collection tube for collecting a blood sample. In another embodiment, the marker nucleic acid is added to a portion of the biological fluid sample. For example, the marker nucleic acid is added to the plasma and/or serum portion (e.g., maternal plasma sample) of the blood sample. In yet another embodiment, the marker molecule is added to a purified sample (e.g., a nucleic acid sample that has been purified from a biological sample). For example, the marker nucleic acid is added to a sample of purified maternal and fetal cfDNA. Similarly, the marker nucleic acid can be added to a biopsy specimen before processing the specimen. In certain embodiments, the marker nucleic acid can be combined with a vector that delivers the marker molecule into cells of a biological sample. Cell delivery vehicles include pH-sensitive liposomes and cationic liposomes.
在不同的实施方案中,标记物分子具有反基因链序列,这些序列是生物学来源样品的基因组中不存在的序列。在一个例示性实施方案中,用以验证人类生物来源样品的完整性的标记物分子具有在人类基因组中不存在的序列。在一个替代实施方案中,标记物分子具有在来源样品中和任一个或多个已知基因组中不存在的序列。举例来说,用以验证人类生物来源样品的完整性的标记物分子具有在人类基因组中和老鼠基因组中不存在的序列。替代方案允许验证包括两个或更多个基因组的测试样品的完整性。举例来说,从被病原体(例如细菌)侵袭的受试者中获得的人类无细胞DNA样品的完整性可使用具有在人类基因组与侵袭细菌的基因组中都不存在的序列的标记物分子进行验证。许多病原体(例如细菌、病毒、酵母、真菌、原生动物等等)的基因组的序列,公众可在万维网ncbi.nlm.nih.gov/genomes上获得。在另一个实施方案中,标记物分子是具有在任何已知基因组中不存在的序列的核酸。标记物分子的序列可通过算法随机产生。In various embodiments, the marker molecules have antigenomic sequences, i.e., sequences that are absent from the genome of the biological source sample. In an exemplary embodiment, the marker molecules used to verify the integrity of a human biological source sample have sequences that are absent from the human genome. In an alternative embodiment, the marker molecules have sequences that are absent from the source sample and from any one or more other known genomes. For example, the marker molecules used to verify the integrity of a human biological source sample have sequences that are absent from both the human genome and the mouse genome. This alternative allows the integrity of a test sample comprising two or more genomes to be verified. For example, the integrity of a human cell-free DNA sample obtained from a subject affected by a pathogen, e.g., a bacterium, can be verified using marker molecules having sequences that are absent from both the human genome and the genome of the affecting bacterium. The genome sequences of many pathogens (e.g., bacteria, viruses, yeasts, fungi, protozoa, etc.) are publicly available on the World Wide Web at ncbi.nlm.nih.gov/genomes. In another embodiment, the marker molecules are nucleic acids having sequences that are absent from any known genome. The sequences of marker molecules can be randomly generated by an algorithm.
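As one possible illustration of generating a marker sequence algorithmically at random, the Python sketch below draws random candidates and rejects any candidate that shares a k-mer, on either strand, with a reference sequence. This is a toy under stated assumptions: the reference is a short made-up string rather than a genome, exact k-mer sharing stands in for a real alignment step, and the function names, `k`, and `max_tries` are hypothetical choices, not values from the specification.

```python
import random
from typing import Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def generate_marker(
    length: int,
    reference_kmers: Set[str],
    k: int,
    rng: random.Random,
    max_tries: int = 1000,
) -> str:
    """Draw random sequences until one shares no k-mer with the reference on either strand."""
    for _ in range(max_tries):
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        candidate_kmers = kmers(candidate, k) | kmers(reverse_complement(candidate), k)
        if candidate_kmers.isdisjoint(reference_kmers):
            return candidate
    raise RuntimeError("no acceptable marker sequence found")


# Toy stand-in for a reference genome; a real check would use the actual genome(s).
toy_reference = "ACGTTGCAAGGCTTAGCCGATACGGATTCAGCTTAACCGGTA"
reference_kmers = kmers(toy_reference, 12)

marker = generate_marker(60, reference_kmers, k=12, rng=random.Random(0))
```

For real genomes the k-mer set would be far too large to hold this way, so the screening step would more plausibly be delegated to an aligner or an indexed k-mer database; the rejection-sampling loop is the part the sketch is meant to convey.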
在不同的实施方案中,标记物分子可以是天然存在的脱氧核糖核酸(DNA)、核糖核酸或人工核酸类似物(核酸模拟物),这些人工核酸类似物包括肽核酸(PNA)、吗啉代核酸、锁核酸、二醇核酸以及苏糖核酸(其与天然存在的DNA或RNA的不同之处在于分子主链发生变化)或不具有磷酸二酯主链的DNA模拟物。脱氧核糖核酸可以来自于天然存在的基因组或可以通过使用酶或通过固相化学合成在实验室中产生。化学方法也可用以产生天然未发现的DNA模拟物。磷酸二酯键被置换,但脱氧核糖保留的可得DNA衍生物包括但不限于具有通过硫甲缩醛或甲酰胺键形成的主链的DNA模拟物,已经证明这些模拟物是优良的结构DNA模拟物。其他的DNA模拟物包括吗啉代衍生物和包含基于N-(2-氨乙基)甘氨酸的假肽主链的肽核酸(PNA)(生物物理学与生物分子结构年评(Ann Rev Biophys Biomol Struct)24:167-183[1995])。 PNA是非常优良的DNA(或核糖核酸[RNA])结构模拟物,并且PNA寡聚物能够与沃森-克里克(Watson-Crick)互补DNA和RNA(或PNA)寡聚物形成很稳定的双螺旋结构,并且其还可以通过螺旋侵入而结合到双螺旋DNA中的目标上(分子生物技术(MolBiotechnol)26:233-248[2004])。可用作标记物分子的另一个优良的DNA类似物的结构模拟物/类似物是磷硫酰DNA,其中一个非桥接氧被硫置换。此修饰降低了包括5'到3'和3'到5'DNA POL 1外切核酸酶、核酸酶S1和P1、核糖核酸酶、血清核酸酶以及蛇毒磷酸二酯酶在内的内切核酸酶和外切核酸酶的作用。In various embodiments, the marker molecules can be naturally occurring deoxyribonucleic acids (DNA), ribonucleic acids, or artificial nucleic acid analogs (nucleic acid mimics), including peptide nucleic acids (PNA), morpholino nucleic acids, locked nucleic acids, glycol nucleic acids, and threose nucleic acids, which are distinguished from naturally occurring DNA or RNA by changes to the backbone of the molecule, or DNA mimics that do not have a phosphodiester backbone. The deoxyribonucleic acids can be from naturally occurring genomes or can be generated in a laboratory using enzymes or by solid-phase chemical synthesis. Chemical methods can also be used to generate DNA mimics that are not found in nature. Available derivatives of DNA in which the phosphodiester linkage has been replaced but the deoxyribose is retained include, but are not limited to, DNA mimics having backbones formed by thioformacetal or carboxamide linkages, which have been shown to be good structural DNA mimics. Other DNA mimics include morpholino derivatives and peptide nucleic acids (PNA), which contain an N-(2-aminoethyl)glycine-based pseudopeptide backbone (Ann Rev Biophys Biomol Struct 24:167-183 [1995]). 
PNA is a very good structural mimic of DNA (or of ribonucleic acid [RNA]), and PNA oligomers are able to form very stable duplex structures with Watson-Crick complementary DNA and RNA (or PNA) oligomers; they can also bind to targets in duplex DNA by helix invasion (Mol Biotechnol 26:233-248 [2004]). Another good structural mimic/analog of DNA that can be used as a marker molecule is phosphorothioate DNA, in which one of the non-bridging oxygens is replaced by a sulfur. This modification reduces the action of endonucleases and exonucleases, including 5' to 3' and 3' to 5' DNA Pol I exonuclease, nucleases S1 and P1, RNases, serum nucleases, and snake venom phosphodiesterase.
标记物分子的长度可以与样品核酸的长度不同或差不多,即标记物分子的长度可类似于样品基因组分子的长度,或者其可大于或小于样品基因组分子的长度。标记物分子的长度是通过构成标记物分子的核苷酸或核苷酸类似物碱基的数目来测量。可以使用本领域中已知的分离方法将长度不同于样品基因组分子长度的标记物分子与源核酸辨别开。举例来说,标记物与样品核酸分子的长度差异可通过例如毛细管电泳等电泳分离来测定。尺寸区分可能有利于对标记物核酸和样品核酸的质量进行量化和评定。优选地,标记物核酸比基因组核酸短,并且长度足以排除其被映射到样品基因组。举例来说,独特映射到人类基因组需要30碱基的人类序列。因此,在某些实施方案中,用于人类样品的测序生物检验中的标记物分子应为至少30bp长。The length of the marker molecule can be different from or similar to the length of the sample nucleic acid, that is, the length of the marker molecule can be similar to the length of the sample genome molecule, or it can be greater than or less than the length of the sample genome molecule. The length of the marker molecule is measured by the number of nucleotides or nucleotide analog bases that constitute the marker molecule. Separation methods known in the art can be used to distinguish marker molecules with lengths different from the length of the sample genome molecule from source nucleic acid. For example, the length difference between the marker and the sample nucleic acid molecule can be measured by electrophoretic separation such as capillary electrophoresis. Size differentiation may be conducive to quantifying and evaluating the quality of the marker nucleic acid and the sample nucleic acid. Preferably, the marker nucleic acid is shorter than the genomic nucleic acid, and the length is sufficient to exclude it from being mapped to the sample genome. For example, unique mapping to the human genome requires a human sequence of 30 bases. Therefore, in certain embodiments, the marker molecule used in the sequencing bioassay of human samples should be at least 30bp long.
标记物分子长度的选择主要通过用以验证来源样品完整性的测序技术确定。还可以考虑所测序的样品基因组核酸的长度。举例来说,某些测序技术采用多核苷酸的克隆扩增,其可要求待以克隆方式扩增的基因组多核苷酸具有最小长度。举例来说,使用伊路纳GAII序列分析器进行测序包括通过最小长度为110bp的多核苷酸的桥式PCR(亦称成簇扩增)进行离体克隆扩增,适配子连接到这些多核苷酸上,以提供以克隆方式扩增的至少200bp并且小于600bp的核酸并且测序。在某些实施方案中,连接适配子的标记物分子的长度在约200bp与约600bp之间,约250bp与550bp之间,约300bp与500bp之间或约350与450之间。在其他实施方案中,连接适配子的标记物分子的长度是大约200bp。举例来说,当对母体样品中存在的胎儿cfDNA进行测序时,可选择标记物分子的长度是类似于胎儿cfDNA分子的长度的。因此,在一个实施方案中,用在包括对母体样品中cfDNA进行大规模平行测序以确定存在或不存在胎儿染色体非整倍性的检验中的标记物分子的长度可大约150bp、约160bp、170bp、约180bp、约190bp或约200bp;标记物分子优选是大约170bp。例如SOLiD测序、聚合酶克隆测序(Polony Sequencing)以及454测序等其他测序方法使用乳液PCR以克隆方式扩增DNA分子以供测序,并且每一种技术都规定了待扩增分子的最小和最大长度。呈以克隆方式扩增的核酸形式的待测序的标记物分子的长度可达到约600bp。在某些实施方案中,待测序的标记物分子的长度可大于600bp。The choice of marker molecule length is determined primarily by the sequencing technology used to verify the integrity of the source sample. The length of the sample genomic nucleic acids being sequenced can also be considered. For example, some sequencing technologies employ clonal amplification of polynucleotides, which can require that the genomic polynucleotides to be clonally amplified be of a minimum length. For example, sequencing using the Illumina GAII sequence analyzer includes in vitro clonal amplification by bridge PCR (also known as cluster amplification) of polynucleotides having a minimum length of 110 bp, to which adaptors are ligated to provide nucleic acids of at least 200 bp and less than 600 bp that can be clonally amplified and sequenced. In certain embodiments, the length of the adaptor-ligated marker molecule is between about 200 bp and about 600 bp, between about 250 bp and 550 bp, between about 300 bp and 500 bp, or between about 350 bp and 450 bp. In other embodiments, the length of the adaptor-ligated marker molecule is about 200 bp. For example, when sequencing fetal cfDNA present in a maternal sample, the length of the marker molecule can be chosen to be similar to that of fetal cfDNA molecules. 
Thus, in one embodiment, the length of the marker molecule used in an assay that comprises massively parallel sequencing of cfDNA in a maternal sample to determine the presence or absence of a fetal chromosomal aneuploidy can be about 150 bp, about 160 bp, about 170 bp, about 180 bp, about 190 bp, or about 200 bp; preferably, the marker molecule is about 170 bp. Other sequencing approaches, e.g., SOLiD sequencing, polony sequencing, and 454 sequencing, use emulsion PCR to clonally amplify DNA molecules for sequencing, and each technology dictates the minimum and maximum length of the molecules to be amplified. The length of marker molecules to be sequenced as clonally amplified nucleic acids can be up to about 600 bp. In certain embodiments, the length of marker molecules to be sequenced can be greater than 600 bp.
不采用分子克隆扩增并且能够对在极宽模板长度范围内的核酸进行测序的单分子测序技术在大部分情况下都不要求待测序分子具有任何特定长度。然而,每单位质量的序列产率取决于3'端羟基的数目,因此具有相对短的模板用于测序是比具有长的模板更有效的。如果从长于1000nt的核酸开始,那么总体上宜将这些核酸剪切到100到200nt的平均长度,以便从相同质量的核酸可以产生更多的序列信息。因此,标记物分子的长度可在几十碱基到数千碱基范围内。用于单分子测序的标记物分子的长度可达到约25bp、达到约50bp、达到约75bp、达到约100bp、达到约200bp、达到约300bp、达到约400bp、达到约500bp、达到约600bp、达到约700bp、达到约800bp、达到约900bp、达到约1000bp或更多。Single-molecule sequencing technologies, which do not employ clonal amplification of molecules and can sequence nucleic acids over a very broad range of template lengths, in most situations do not require that the molecules to be sequenced be of any specific length. However, the yield of sequences per unit mass depends on the number of 3' end hydroxyl groups, so relatively short templates are more efficient for sequencing than long templates. If starting with nucleic acids longer than 1000 nt, it is generally advisable to shear them to an average length of 100 to 200 nt so that more sequence information can be generated from the same mass of nucleic acids. Thus, the length of the marker molecule can range from tens of bases to thousands of bases. The length of marker molecules used for single-molecule sequencing can be up to about 25 bp, up to about 50 bp, up to about 75 bp, up to about 100 bp, up to about 200 bp, up to about 300 bp, up to about 400 bp, up to about 500 bp, up to about 600 bp, up to about 700 bp, up to about 800 bp, up to about 900 bp, up to about 1000 bp, or more.
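The dependence of sequence yield on the number of 3' ends can be made concrete with a small calculation. The sketch below estimates how many single-stranded molecules (and therefore 3' hydroxyl ends) a given mass of nucleic acid contains at a given length. It assumes an average mass of roughly 330 g/mol per nucleotide, which is a common approximation and not a value from the specification; the function name and the 1 ng input are likewise illustrative.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
AVG_NT_MASS = 330.0        # g/mol per nucleotide; approximate, assumed for illustration


def three_prime_ends(mass_ng: float, length_nt: int) -> float:
    """Approximate number of single-stranded molecules (one 3' end each) in mass_ng."""
    grams = mass_ng * 1e-9
    moles = grams / (length_nt * AVG_NT_MASS)
    return moles * AVOGADRO


# Same mass, two template lengths: unsheared 1000 nt versus sheared to 150 nt.
ends_long = three_prime_ends(1.0, 1000)
ends_short = three_prime_ends(1.0, 150)
fold_gain = ends_short / ends_long  # equals 1000 / 150 regardless of the assumed nt mass
```

Because the count scales as 1/length, shearing 1000 nt molecules to an average of 150 nt yields roughly 6.7 times as many 3' ends from the same mass, which is the reasoning behind the shearing recommendation in the paragraph above.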
选择用于标记物分子的长度还由所测序的基因组核酸的长度决定。举例来说,cfDNA作为细胞基因组DNA的基因组片段在人类血流中循环。在孕妇血浆中发现的胎儿cfDNA分子总体上比母体cfDNA分子短(陈(Chan)等人, 临床化学(Clin Chem)50:8892[2004])。循环胎儿DNA的尺寸分级分离已经证实,循环胎儿DNA片段的平均长度<300bp,而估计母体DNA在约0.5Kb 与1Kb之间(李(Li)等人,临床化学,50:1002-1011[2004])。这些发现与使用NGS确定胎儿cfDNA很少超过340bp的范(Fan)等人(范等人,临床化学 56:1279-1286[2010])的发现一致。用基于硅石的标准方法从尿分离的DNA由两部分组成:来源于脱落细胞的高分子量DNA和经肾DNA(Tr-DNA)的低分子量(150-250碱基对)部分(波特扎图等人,临床化学46:1078-1084,2000;和苏等人,分子诊断学杂志6:101-107,2004)。新近发展的用于从体液中分离无细胞核酸的技术在分离经肾核酸的应用中显示,尿中存在的DNA和RNA片段比150碱基对短的多(美国专利申请公开号20080139801)。在cfDNA为进行测序的基因组核酸的实施方案中,选择的标记物分子可大致达到cfDNA的长度。举例来说,呈单核酸分子形式或呈以克隆方式扩增的核酸形式的、用于待测序的母体cfDNA样品中的标记物分子的长度可在约100bp与600之间。在其他实施方案中,样品基因组核酸是较大分子的片段。举例来说,进行测序的样品基因组核酸是成片段的细胞DNA。在对成片段的细胞DNA进行测序的实施方案中,标记物分子的长度可达到DNA片段的长度。在某些实施方案中,标记物分子的长度至少是将序列读数独特映射到适当参考基因组所需要的最小长度。在其他实施方案中,标记物分子的长度是排除标记物分子被映射到样品参考基因组所需要的最小长度。The length of the marker molecule selected is also determined by the length of the genomic nucleic acid sequenced. For example, cfDNA circulates in the human bloodstream as a genomic fragment of cellular genomic DNA. Fetal cfDNA molecules found in maternal plasma are generally shorter than maternal cfDNA molecules (Chan et al., Clinical Chemistry (Clin Chem) 50: 8892 [2004]). Size fractionation of circulating fetal DNA has confirmed that the average length of circulating fetal DNA fragments is <300bp, while maternal DNA is estimated to be between about 0.5Kb and 1Kb (Li et al., Clinical Chemistry, 50: 1002-1011 [2004]). These findings are consistent with the findings of Fan et al. (Fan et al., Clinical Chemistry 56: 1279-1286 [2010]) who used NGS to determine that fetal cfDNA rarely exceeds 340bp. DNA isolated from urine using standard silica-based methods consists of two parts: high-molecular-weight DNA derived from exfoliated cells and a low-molecular-weight (150-250 base pairs) fraction of transrenal DNA (Tr-DNA) (Portzato et al., Clinical Chemistry 46:1078-1084, 2000; and Su et al., Journal of Mol. Diagnostics 6:101-107, 2004). 
Recently developed techniques for isolating cell-free nucleic acids from body fluids, applied to the isolation of transrenal nucleic acids, have revealed the presence in urine of DNA and RNA fragments much shorter than 150 base pairs (U.S. Patent Application Publication No. 20080139801). In embodiments in which cfDNA is the genomic nucleic acid being sequenced, the marker molecules chosen can be up to about the length of the cfDNA. For example, the length of marker molecules used in maternal cfDNA samples to be sequenced as single nucleic acid molecules or as clonally amplified nucleic acids can be between about 100 bp and 600 bp. In other embodiments, the sample genomic nucleic acids are fragments of larger molecules. For example, the sample genomic nucleic acid being sequenced is fragmented cellular DNA. In embodiments in which fragmented cellular DNA is sequenced, the length of the marker molecules can be up to the length of the DNA fragments. In certain embodiments, the length of the marker molecules is at least the minimum length required for mapping a sequence read uniquely to the appropriate reference genome. In other embodiments, the length of the marker molecule is the minimum length required to exclude the marker molecule from being mapped to the sample reference genome.
In addition, marker molecules can be used to verify samples that are not assayed by nucleic acid sequencing and that can be verified by common biotechniques other than sequencing, for example real-time PCR.
Sample controls (e.g., in-process positive controls for sequencing and/or analysis)
In various embodiments, marker sequences introduced into the samples, e.g., as described above, can function as positive controls to verify the accuracy and efficacy of sequencing and of subsequent processing and analysis.
Accordingly, compositions and methods are provided for an in-process positive control (IPC) for sequencing DNA in a sample. In certain embodiments, positive controls are provided for sequencing cfDNA in a sample comprising a mixture of genomes. An IPC can be used to relate baseline shifts in sequence information obtained from different sets of samples, e.g., samples that are sequenced at different times in different sequencing runs. Thus, for example, an IPC can relate the sequence information obtained for a maternal test sample to the sequence information obtained from a set of qualified samples that were sequenced at a different time.
Similarly, in the case of segment analysis, an IPC can relate the sequence information obtained from a subject for a particular segment to the sequence information obtained from a set of qualified samples (of similar sequences) that were sequenced at a different time. In certain embodiments, an IPC can relate the sequence information obtained from a subject for a particular cancer-related locus to the sequence information obtained from a set of qualified samples (e.g., from known amplifications/deletions, and the like).
In addition, IPCs can be used as markers to track samples through the sequencing process. IPCs can also provide a qualitative positive sequence dose value, e.g., NCV, for one or more aneuploidies of chromosomes of interest (e.g., trisomy 21, trisomy 13, trisomy 18) to support proper interpretation and to ensure the dependability and accuracy of the data. In certain embodiments, IPCs can be created to comprise nucleic acids from male and female genomes to provide doses for chromosomes X and Y in a maternal sample, in order to determine whether the fetus is male.
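By way of illustration only, the following Python sketch shows how an IPC could be checked for a positive sequence dose value. It assumes that a chromosome dose is the ratio of sequence tags on the chromosome of interest to tags on a normalizing chromosome, and that the NCV is that dose standardized against the doses of a set of qualified samples. These definitions, the tag counts, and the threshold of 4.0 are assumptions made for the example; they are not values taken from this passage.

```python
from statistics import mean, stdev


def chromosome_dose(tags_of_interest: int, tags_normalizing: int) -> float:
    """Ratio of tags on the chromosome of interest to tags on a normalizing chromosome."""
    return tags_of_interest / tags_normalizing


def ncv(dose: float, qualified_doses: list[float]) -> float:
    """Standardize a dose against qualified (unaffected) samples (assumed definition)."""
    return (dose - mean(qualified_doses)) / stdev(qualified_doses)


# Hypothetical doses from qualified samples sequenced in earlier runs.
qualified = [0.2500, 0.2504, 0.2496, 0.2502, 0.2498]

# Hypothetical tag counts for a trisomy 21 IPC sequenced in the current run.
ipc_dose = chromosome_dose(2600, 10000)
ipc_ncv = ncv(ipc_dose, qualified)

# Hypothetical decision threshold: the IPC should score clearly positive.
ipc_is_positive = ipc_ncv > 4.0
```

If a control of known aneuploidy fails to score positive in a given run, that run's data would warrant review before any test sample is interpreted.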
The type and number of in-process controls depend on the type or nature of the test needed. For example, for a test that requires sequencing DNA from a sample comprising a mixture of genomes to determine whether a chromosomal aneuploidy exists, the in-process control can comprise DNA obtained from a sample known to comprise the same chromosomal aneuploidy that is being tested. In certain embodiments, the IPC includes DNA from a sample known to comprise an aneuploidy of a chromosome of interest. For example, the IPC for a test to determine the presence or absence of a fetal trisomy, e.g., trisomy 21, in a maternal sample comprises DNA obtained from an individual with trisomy 21. In certain embodiments, the IPC comprises a mixture of DNA obtained from two or more individuals with different aneuploidies. For example, for a test to determine the presence or absence of trisomy 13, trisomy 18, trisomy 21, and monosomy X, the IPC comprises a combination of DNA samples obtained from pregnant women each carrying a fetus with one of the trisomies being tested. In addition to complete chromosomal aneuploidies, IPCs can be created to provide positive controls for tests to determine the presence or absence of partial aneuploidies.
An IPC that serves as the control for detecting a single aneuploidy can be created using a mixture of cellular genomic DNA obtained from two subjects, one of whom is the contributor of the aneuploid genome. For example, an IPC that serves as a control for a test to determine a fetal trisomy, e.g., trisomy 21, can be created by combining genomic DNA from a male or female subject carrying the trisomic chromosome with genomic DNA from a female subject known not to carry the trisomic chromosome. Genomic DNA can be extracted from cells of both subjects and sheared to provide fragments of between about 100 bp and 400 bp, between about 150 bp and 350 bp, or between about 200 bp and 300 bp to simulate the circulating cfDNA fragments in maternal samples. The proportion of fragmented DNA from the subject carrying the aneuploidy, e.g., trisomy 21, is chosen to simulate the proportion of circulating fetal cfDNA found in maternal samples, providing an IPC comprising a mixture of fragmented DNA in which about 5%, about 10%, about 15%, about 20%, about 25%, or about 30% of the DNA is from the subject carrying the aneuploidy. The IPC can comprise DNA from different subjects each carrying a different aneuploidy. For example, the IPC can comprise about 80% unaffected female DNA, and the remaining 20% can be DNA from three different subjects each carrying one of trisomy 21, trisomy 13, and trisomy 18. The mixture of fragmented DNA is prepared for sequencing. Processing of the mixture of fragmented DNA can comprise preparing a sequencing library, which can be sequenced using any massively parallel method in singleplex or multiplex fashion. Stock solutions of the genomic IPC can be stored and used in multiple diagnostic tests.
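The mixing proportions described above can be illustrated with a short Python sketch. It assumes, for the sake of the example only, that the aneuploid portion of the mixture is divided equally among the aneuploid contributors, and it uses the simple approximation that a fraction f of DNA from a trisomic genome (three copies instead of two) raises the relative representation of that chromosome to about 1 + f/2. The function names and the 100 ng total are hypothetical.

```python
def ipc_composition(total_ng: float, unaffected_fraction: float,
                    aneuploid_donors: list[str]) -> dict[str, float]:
    """Mass of DNA (ng) from each contributor to a genomic IPC.

    Assumption for this sketch: the aneuploid portion is split equally
    among the listed donors.
    """
    if not 0.0 <= unaffected_fraction <= 1.0:
        raise ValueError("unaffected_fraction must be between 0 and 1")
    if not aneuploid_donors:
        raise ValueError("at least one aneuploid donor is required")
    share = (1.0 - unaffected_fraction) / len(aneuploid_donors)
    masses = {"unaffected_female": total_ng * unaffected_fraction}
    for donor in aneuploid_donors:
        masses[donor] = total_ng * share
    return masses


def expected_relative_dose(trisomic_fraction: float) -> float:
    """Approximate fold change of a trisomic chromosome: ((1 - f)*2 + f*3) / 2."""
    return 1.0 + trisomic_fraction / 2.0


# 80% unaffected female DNA, remaining 20% from three trisomic subjects.
masses = ipc_composition(100.0, 0.80, ["trisomy_21", "trisomy_13", "trisomy_18"])

# A single-aneuploidy IPC with 10% trisomy 21 DNA: about a 5% excess of chromosome 21.
fold_change = expected_relative_dose(0.10)
```

The small expected excess (about 5% for a 10% mixture) is the reason the proportion is chosen to resemble the fetal fraction of a maternal sample: the control then exercises the analysis at a realistic signal level.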
Alternatively, the IPC can be created using cfDNA obtained from a mother known to carry a fetus with a known chromosomal aneuploidy. For example, cfDNA can be obtained from a pregnant woman carrying a fetus with trisomy 21. The cfDNA is extracted from the maternal sample, cloned into a bacterial vector, and grown in bacteria to provide an ongoing source of the IPC. The DNA can be excised from the bacterial vector using restriction enzymes. Alternatively, the cloned cfDNA can be amplified by, e.g., PCR. The IPC DNA can be processed for sequencing in the same run as the cfDNA from the test samples that are to be analyzed for the presence or absence of chromosomal aneuploidies.
While the creation of IPCs is described above with respect to trisomies, it will be appreciated that IPCs can be created to reflect other partial aneuploidies including, for example, various segment amplifications and/or deletions. Thus, for example, where various cancers are known to be associated with particular amplifications (e.g., breast cancer associated with 20q13), IPCs can be created that incorporate those known amplifications.
Sequencing methods
As indicated above, the prepared samples (e.g., sequencing libraries) are sequenced as part of the procedure for identifying copy number variations. Any of a number of sequencing technologies can be utilized.
Some sequencing technologies are available commercially, such as the sequencing-by-hybridization platform from Affymetrix Inc. (Sunnyvale, CA); the sequencing-by-synthesis platforms from 454 Life Sciences (Branford, CT), Illumina/Solexa (Hayward, CA), and Helicos Biosciences (Cambridge, MA); and the sequencing-by-ligation platform from Applied Biosystems (Foster City, CA), as described below. In addition to the single molecule sequencing performed using the sequencing-by-synthesis method of Helicos Biosciences, other single molecule sequencing technologies include, but are not limited to, the SMRT™ technology of Pacific Biosciences, the ION TORRENT™ technology, and nanopore sequencing developed, for example, by Oxford Nanopore Technologies.
While the automated Sanger method is considered a 'first generation' technology, Sanger sequencing, including automated Sanger sequencing, can also be employed in the methods described herein. Additional suitable sequencing methods include, but are not limited to, nucleic acid imaging technologies, e.g., atomic force microscopy (AFM) or transmission electron microscopy (TEM). Illustrative sequencing technologies are described in greater detail below.
In one illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal sample, or cfDNA or cellular DNA in a subject being screened for cancer, and the like, using the Helicos True Single Molecule Sequencing (tSMS) technology (e.g., as described in Harris T.D. et al., Science 320:106-109 [2008]). In the tSMS technique, a DNA sample is cleaved into strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3' end of each DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites immobilized on the flow cell surface. In certain embodiments, the templates can be at a density of about 100 million templates/cm². The flow cell is then loaded into an instrument, e.g., a HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template. A CCD camera can map the position of the templates on the flow cell surface. The template fluorescent label is then cleaved and washed away. The sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves as a primer. The polymerase incorporates the labeled nucleotides into the primer in a template-directed manner. The polymerase and unincorporated nucleotides are removed. The templates that have directed incorporation of the fluorescently labeled nucleotide are discerned by imaging the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step. Whole genome sequencing by single molecule sequencing technologies excludes or typically obviates PCR-based amplification in the preparation of the sequencing libraries, and the methods allow for direct measurement of the sample rather than measurement of copies of that sample.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for cancer, and the like, using 454 sequencing (Roche) (e.g., as described in Margulies, M. et al., Nature 437:376-380 [2005]). 454 sequencing typically involves two steps. In the first step, DNA is sheared into fragments of approximately 300 to 800 base pairs, and the fragments are blunt-ended. Oligonucleotide adapters are then ligated to the ends of the fragments. The adapters serve as primers for amplification and sequencing of the fragments. The fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads, using, e.g., Adapter B, which contains a 5'-biotin tag. The fragments attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead. In the second step, the beads are captured in wells (e.g., picoliter-sized wells). Pyrosequencing is performed on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in the sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated. Pyrosequencing makes use of pyrophosphate (PPi), which is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is measured and analyzed.
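Because the light signal is proportional to the number of nucleotides incorporated, a series of nucleotide flows can in principle be decoded into a base sequence. The following Python sketch illustrates that idea only; it assumes signals already normalized so that one incorporation gives a value near 1.0, a cyclic flow order of T, A, C, G chosen for the example, and simple rounding in place of the signal processing that a real instrument would apply.

```python
def decode_flowgram(flow_order: str, signals: list[float]) -> str:
    """Decode normalized per-flow light signals into the synthesized sequence.

    flow_order is repeated cyclically; each signal is rounded to the nearest
    whole number of incorporations (a simplification for this sketch).
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        incorporations = max(0, round(signal))
        bases.append(nucleotide * incorporations)
    return "".join(bases)


# Hypothetical normalized signals for eight flows (T, A, C, G, T, A, C, G).
signals = [1.02, 0.05, 2.10, 0.90, 0.00, 1.10, 0.00, 0.00]
sequence = decode_flowgram("TACG", signals)  # "TCCGA"
```

A signal near 2.0 in the third flow is read as two consecutive C bases, which is how a homopolymer stretch appears in this type of data.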
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for cancer, and the like, using SOLiD™ technology (Applied Biosystems). In SOLiD™ sequencing-by-ligation, genomic DNA is sheared into fragments, and adapters are attached to the 5' and 3' ends of the fragments to generate a fragment library. Alternatively, internal adapters can be introduced by ligating adapters to the 5' and 3' ends of the fragments, circularizing the fragments, digesting the circularized fragment to generate an internal adapter, and attaching adapters to the 5' and 3' ends of the resulting fragments to generate a mate-paired library. Next, clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and the beads are enriched to separate the beads carrying amplified templates. Templates on the selected beads are subjected to a 3' modification that permits bonding to a glass slide. The sequence can be determined by sequential hybridization and ligation of partially random oligonucleotides with a central determined base (or pair of bases) that is identified by a specific fluorophore. After a color is recorded, the ligated oligonucleotide is cleaved and removed, and the process is then repeated.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for cancer, and the like, using the single molecule, real-time (SMRT™) sequencing technology of Pacific Biosciences. In SMRT sequencing, the continuous incorporation of dye-labeled nucleotides is imaged during DNA synthesis. Single DNA polymerase molecules are attached to the bottom surface of individual zero-mode waveguide detectors (ZMW detectors) that obtain sequence information while phosphate-linked nucleotides are being incorporated into the growing primer strand. A ZMW detector comprises a confinement structure that enables observation of the incorporation of a single nucleotide by DNA polymerase against a background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (e.g., in microseconds). It typically takes several milliseconds to incorporate a nucleotide into a growing strand. During this time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is cleaved off. Measurement of the corresponding fluorescence of the dye indicates which base was incorporated. The process is repeated to provide a sequence.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for cancer, and the like, using nanopore sequencing (e.g., as described in Soni GV and Meller A., Clin Chem 53:1996-2001 [2007]). Nanopore sequencing DNA analysis techniques have been developed by a number of companies, including, for example, Oxford Nanopore Technologies (Oxford, United Kingdom), Sequenom, NABsys, and the like. Nanopore sequencing is a single molecule sequencing technology in which a single molecule of DNA is sequenced directly as it passes through a nanopore. A nanopore is a small hole, typically on the order of 1 nanometer in diameter. Immersing a nanopore in a conducting fluid and applying a potential (voltage) across it results in a slight electrical current due to the conduction of ions through the nanopore. The amount of current that flows is sensitive to the size and shape of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree, changing the magnitude of the current through the nanopore to a different degree. Thus, this change in the current as the DNA molecule passes through the nanopore provides a read of the DNA sequence.
In another illustrative, but non-limiting, embodiment, the methods described herein comprise obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal test sample, or cfDNA or cellular DNA in a subject being screened for cancer, and the like, using a chemical-sensitive field effect transistor (chemFET) array (e.g., as described in U.S. Patent Application Publication No. 2009/0026082). In one example of this technique, DNA molecules can be placed into reaction chambers, and the template molecules can be hybridized to a sequencing primer bound to a polymerase. Incorporation of one or more triphosphates into a new nucleic acid strand at the 3' end of the sequencing primer can be discerned by a chemFET as a change in current. An array can have multiple chemFET sensors. In another example, single nucleic acids can be attached to beads, the nucleic acids can be amplified on the beads, and the individual beads can be transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET sensor, where the nucleic acids can be sequenced.
In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal test sample, using Halcyon Molecular's technology, which uses transmission electron microscopy (TEM). The method, termed Individual Molecule Placement Rapid Nano Transfer (IMPRNT), comprises utilizing single atom resolution transmission electron microscope imaging of high molecular weight (150 kb or greater) DNA selectively labeled with heavy atom markers, and arranging these molecules on ultra-thin films in ultra-dense (3 nm strand-to-strand) parallel arrays with consistent base-to-base spacing. The electron microscope is used to image the molecules on the films to determine the position of the heavy atom markers and to extract base sequence information from the DNA. The method is further described in PCT patent publication WO 2009/046445. The method allows for sequencing complete human genomes in less than ten minutes.
In another embodiment, the DNA sequencing technology is the Ion Torrent single molecule sequencing, which pairs semiconductor technology with a simple sequencing chemistry to translate chemically encoded information (A, C, G, T) directly into digital information (0, 1) on a semiconductor chip. In essence, when a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as a byproduct. Ion Torrent uses a high-density array of micro-machined wells to perform this biochemical process in a massively parallel way. Each well holds a different DNA molecule. Beneath the wells is an ion-sensitive layer, and beneath that an ion sensor. When a nucleotide, for example a C, is added to a DNA template and is then incorporated into a strand of DNA, a hydrogen ion will be released. The charge from that ion will change the pH of the solution, which can be detected by Ion Torrent's ion sensor. The sequencer, essentially the world's smallest solid-state pH meter, calls the base, going directly from chemical information to digital information. The Ion Personal Genome Machine (PGM™) sequencer then sequentially floods the chip with one nucleotide after another. If the next nucleotide that floods the chip is not a match, no voltage change will be recorded and no base will be called. If there are two identical bases on the DNA strand, the voltage will be double, and the chip will record two identical bases called. Direct detection allows nucleotide incorporation to be recorded in seconds.
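The flow behavior described above (no signal when the flooded nucleotide does not match, a doubled signal for two identical bases) can be illustrated with a small simulation. The Python sketch below is an idealized model under stated assumptions: a cyclic flow order of T, A, C, G chosen for the example, noise-free signals counted in whole incorporations, and a hypothetical function name. It is not a description of the instrument's actual signal processing.

```python
def simulate_flows(synthesized: str, flow_order: str = "TACG",
                   max_flows: int = 400) -> list[tuple[str, int]]:
    """Idealized per-flow incorporation counts for the strand being synthesized.

    Each flow floods one nucleotide; the count is 0 when it does not match the
    next base, 1 for a single match, 2 for two identical bases, and so on.
    """
    if any(base not in flow_order for base in synthesized):
        raise ValueError("sequence contains a base that is never flowed")
    flows = []
    position = 0
    while position < len(synthesized) and len(flows) < max_flows:
        nucleotide = flow_order[len(flows) % len(flow_order)]
        run_length = 0
        while position < len(synthesized) and synthesized[position] == nucleotide:
            run_length += 1
            position += 1
        flows.append((nucleotide, run_length))
    return flows


flows = simulate_flows("TCCGA")
# [('T', 1), ('A', 0), ('C', 2), ('G', 1), ('T', 0), ('A', 1)]
```

In this model the third flow reports a count of 2, corresponding to the doubled voltage recorded for two identical bases.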
In another embodiment, the present method comprises obtaining sequence information for the nucleic acids in a test sample, e.g., cfDNA in a maternal test sample, using sequencing by hybridization. Sequencing by hybridization comprises contacting a plurality of polynucleotide sequences with a plurality of polynucleotide probes, wherein each of the plurality of polynucleotide probes can optionally be tethered to a substrate. The substrate can be a flat surface comprising an array of known nucleotide sequences. The pattern of hybridization to the array can be used to determine the polynucleotide sequences present in the sample. In other embodiments, each probe is tethered to a bead, e.g., a magnetic bead or the like. Hybridization to the beads can be determined and used to identify the plurality of polynucleotide sequences within the sample.
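The use of a hybridization pattern to identify sequences can be illustrated with a toy Python sketch. It assumes perfect-match hybridization only (a probe hybridizes when its reverse complement occurs in the target), a very small probe set, and a hypothetical panel of candidate sequences; real arrays use far more probes and must handle partial matches and noise.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridization_pattern(target: str, probes: list[str]) -> set[str]:
    """Probes that can base-pair perfectly with some stretch of the target."""
    return {probe for probe in probes if reverse_complement(probe) in target}


def identify(pattern: set[str], candidates: dict[str, str],
             probes: list[str]) -> list[str]:
    """Names of candidate sequences whose expected pattern equals the observed one."""
    return [name for name, seq in candidates.items()
            if hybridization_pattern(seq, probes) == pattern]


# Hypothetical probe array and candidate sequences.
probes = ["TTGC", "GGAT", "CCCC"]
candidates = {"seq_a": "AGCAATCC", "seq_b": "TTTTGGGG"}

observed = hybridization_pattern("AGCAATCC", probes)  # {"TTGC", "GGAT"}
matches = identify(observed, candidates, probes)      # ["seq_a"]
```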
在另一个实施方案中,本发明方法包括使用伊鲁米纳(Illumina)合成测序法以及基于可逆终止子的测序化学技术(例如,本特利(Bentley)等人,自然(Nature)6:53-59[2009]中所述),通过对数百万DNA片段进行大规模平行测序来获得测试样品中的核酸的序列信息,例如母体测试样品中的cfDNA。模板DNA可以为基因组DNA,例如cfDNA。在某些实施方案中,所分离细胞的基因组DNA用作模板,并且将其片段化成为几百个碱基对的长度。在其他实施方案中,cfDNA用作模板,并且因为cfDNA作为短片段存在,所以不要求片段化。举例来说,胎儿cfDNA作为长度大致170个碱基对(bp)的片段在血流中循环(范(Fan)等人,临床化学(Clin Chem)56:1279-1286[2010]),并且在测序之前,不要求将DNA片段化。伊鲁米纳测序技术依赖于成片段的基因组DNA附接到寡核苷酸锚所结合的光学透明平坦表面上。模板DNA末端经修复而产生5'-磷酸化钝端,并且克列诺片段(Klenow fragment)的聚合酶活性用来使单A碱基添加到钝端磷酸化DNA片段的3'端。这个添加制备了用于连接到寡核苷酸适配子上的DNA片段,这些片段在其3'端具有单T碱基突出端以提高连接效率。适配子寡核苷酸与流动池锚互补。在限制性稀释条件下,将经适配子修饰的单股模板DNA添加到流动池中并且通过杂交固定到锚上。延伸并且桥式扩增所附接的DNA片段以建立具有亿万丛的超高密度测序流动池,每个丛含有约1,000个拷贝的相同模板。在一个实施方案中,随机成片段的基因组DNA(例如cfDNA)在经受成簇扩增之前使用PCR加以扩增。作为替代方案,使用无扩增的基因组文库制剂,并且单独使用成簇扩增法(高纳娃 (Kozarewa)等人,自然方法(Nature Methods)6:291-295[2009])富集随机成片段的基因组DNA,例如cfDNA。利用使用了具有可去除荧光染料的可逆终止子的可靠四色DNA合成测序技术对模板测序。使用激光激发和全内反射光学装置获得高灵敏度荧光检测。将约20bp到40bp(例如36bp)的短序列读数对照经重复片段遮蔽的参考基因组进行比对,并且使用专门开发的数据分析管道软件来鉴别短序列读数对参考基因组的唯一映射。还可以使用非重复片段遮蔽的参考基因组。无论使用重复片段遮蔽的参考基因组,还是非重复片段遮蔽的参考基因组,只对唯一映射到参考基因组的读数计数。第一次读取完成之后,可以将模板原位再生以便从片段的相反端能够进行第二次读取。因此,可以使用DNA片段的单端或配对端测序。对存在于样品中的DNA片段进行部分测序,并且对包含预定长度(例如36bp)的读数、映射到已知参考基因组的序列标签进行计数。在一个实施方案中,参考基因组序列为NCBI36/hg18序列,其可在万维网genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105获得。作为替代方案,参考基因组序列为GRCh37/hg19,其可在万维网 genome.ucsc.edu/cgi-bin/hgGateway获得。其他公用序列信息来源包括GenBank、 dbEST、dbSTS、EMBL(欧洲分子生物学实验室(European Molecular Biology Laboratory))以及DDBJ(日本DNA数据库)。有多种计算机算法可供比对序列使用,包括但不限于BLAST(奥茨秋(Altschul)等人,1990)、BLITZ(MPsrch) (斯特罗科和柯林斯(Sturrock&Collins),1993)、FASTA(普尔逊和李普曼(Person&Lipman),1988)、BOWTIE(郎格米(Langmead)等人,基因组生物学(GenomeBiology)10:R25.1-R25.10[2009])、或ELAND(伊鲁米纳公司,圣地亚哥,CA,USA(Illumina,Inc.,San Diego,CA,USA))。在一个实施方案中,对血浆cfDNA分子的以克隆方式扩增的拷贝的一端进行测序并且通过伊鲁米纳基因组分析仪(Illumina Genome Analyzer)的生物信息学比对分析加以处理,伊鲁米纳基因组分析仪使用大规模高效比对的核苷酸数据库(ELAND) 软件。In another embodiment, the method of the present invention comprises using Illumina synthesis sequencing and reversible terminator-based sequencing chemistry (e.g., as 
described in Bentley et al., Nature 6:53-59 [2009]) to obtain sequence information of nucleic acids in a test sample, such as cfDNA in a maternal test sample, by massively parallel sequencing of millions of DNA fragments. The template DNA can be genomic DNA, such as cfDNA. In certain embodiments, genomic DNA from isolated cells is used as a template and is fragmented into lengths of several hundred base pairs. In other embodiments, cfDNA is used as a template, and because cfDNA exists as short fragments, fragmentation is not required. For example, fetal cfDNA circulates in the bloodstream as fragments of approximately 170 base pairs (bp) in length (Fan et al., Clin Chem 56:1279-1286 [2010]), and prior to sequencing, the DNA does not need to be fragmented. Illumina sequencing technology relies on the attachment of fragmented genomic DNA to an optically transparent flat surface to which oligonucleotide anchors are bound. The template DNA ends are repaired to produce 5'-phosphorylated blunt ends, and the polymerase activity of the Klenow fragment is used to add single A bases to the 3' ends of the blunt-end phosphorylated DNA fragments. This addition prepares DNA fragments for connection to oligonucleotide adapters, which have single T base overhangs at their 3' ends to improve connection efficiency. The adapter oligonucleotides are complementary to the flow cell anchors. Under limiting dilution conditions, single-stranded template DNA modified with adapters is added to the flow cell and fixed to the anchor by hybridization. The attached DNA fragments are extended and bridge-amplified to establish an ultra-high-density sequencing flow cell with hundreds of millions of clusters, each containing about 1,000 copies of the same template. In one embodiment, randomly fragmented genomic DNA (e.g., cfDNA) is amplified using PCR before undergoing cluster amplification. 
As an alternative, a genomic library preparation without amplification is used, and cluster amplification method (Kozarewa et al., Nature Methods 6:291-295 [2009]) is used alone to enrich for random fragmented genomic DNA, such as cfDNA. Template sequencing is performed using a reliable four-color DNA synthesis sequencing technology with a reversible terminator that can remove fluorescent dyes. High-sensitivity fluorescence detection is obtained using laser excitation and total internal reflection optical devices. Short sequence reads of about 20bp to 40bp (e.g., 36bp) are compared against a reference genome shielded by repeat fragments, and a specially developed data analysis pipeline software is used to identify the unique mapping of short sequence reads to the reference genome. A reference genome shielded by non-repeat fragments can also be used. Whether a reference genome shielded by repeat fragments or a reference genome shielded by non-repeat fragments is used, only the readings uniquely mapped to the reference genome are counted. After the first reading is completed, the template can be regenerated in situ so that a second reading can be performed from the opposite end of the fragment. Therefore, single-end or paired-end sequencing of DNA fragments can be used. The DNA fragments present in the sample are partially sequenced, and the reads comprising a predetermined length (e.g., 36 bp) mapped to the sequence tags of the known reference genome are counted. In one embodiment, the reference genome sequence is the NCBI36/hg18 sequence, which is available on the world wide web at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105. Alternatively, the reference genome sequence is GRCh37/hg19, which is available on the world wide web at genome.ucsc.edu/cgi-bin/hgGateway. Other public sequence information sources include GenBank, dbEST, dbSTS, EMBL (European Molecular Biology Laboratory), and DDBJ (DNA Database of Japan). 
A variety of computer algorithms are available for aligning sequences, including but not limited to BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), and ELAND (Illumina, Inc., San Diego, CA, USA). In one embodiment, one end of the clonally expanded copies of the plasma cfDNA molecules is sequenced and processed by bioinformatic alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software.
In certain embodiments of the methods described herein, the mapped sequence tags comprise sequence reads of about 20 bp, about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, or about 500 bp. It is expected that technological advances will enable single-end reads of greater than 500 bp, and reads of greater than about 1000 bp when paired-end reads are generated. In one embodiment, the mapped sequence tags comprise sequence reads that are 36 bp. Mapping of the sequence tags is achieved by comparing the sequence of the tag with the sequence of the reference to determine the chromosomal origin of the sequenced nucleic acid (e.g., cfDNA) molecule; specific genetic sequence information is not needed. A small degree of mismatch (0 to 2 mismatches per sequence tag) may be allowed to account for minor polymorphisms that may exist between the reference genome and the genomes in the mixed sample.
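The mapping rule stated above (a tag is counted only when it aligns to a single location with at most two mismatches) can be illustrated with a minimal Python sketch. The function names, the toy reference, and the brute-force scan are assumptions made for illustration only; aligners such as ELAND or BOWTIE use indexed search rather than the exhaustive comparison shown here.

```python
from collections import Counter


def count_mismatches(read, window):
    """Number of positions at which two equal-length sequences differ."""
    if len(read) != len(window):
        raise ValueError("read and window must have the same length")
    return sum(a != b for a, b in zip(read, window))


def map_uniquely(read, reference, max_mismatches=2):
    """Return (chromosome, position) if the read aligns to exactly one
    location with at most max_mismatches mismatches, otherwise None.

    reference is a hypothetical dict of chromosome name -> sequence.
    """
    hits = []
    n = len(read)
    for chrom, seq in reference.items():
        for pos in range(len(seq) - n + 1):
            if count_mismatches(read, seq[pos:pos + n]) <= max_mismatches:
                hits.append((chrom, pos))
    # Reads with no hit or with several hits are not counted as tags.
    return hits[0] if len(hits) == 1 else None


def count_tags(reads, reference, max_mismatches=2):
    """Count uniquely mapped sequence tags per chromosome."""
    counts = Counter()
    for read in reads:
        hit = map_uniquely(read, reference, max_mismatches)
        if hit is not None:
            counts[hit[0]] += 1
    return counts
```

Only the chromosome of origin is retained for counting, which reflects the statement that no specific genetic sequence information is needed beyond the chromosomal assignment of each tag.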
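A plurality of sequence tags is typically obtained per sample. In certain embodiments, at least about 3×10^6 sequence tags, at least about 5×10^6 sequence tags, at least about 8×10^6 sequence tags, at least about 10×10^6 sequence tags, at least about 15×10^6 sequence tags, at least about 20×10^6 sequence tags, at least about 30×10^6 sequence tags, at least about 40×10^6 sequence tags, or at least about 50×10^6 sequence tags comprising reads of between 20 bp and 40 bp (e.g., 36 bp) are obtained per sample by mapping the reads to the reference genome. In one embodiment, all the sequence reads are mapped to all regions of the reference genome. In one embodiment, the tags that have been mapped to all regions (e.g., all chromosomes) of the reference genome are counted, and the CNV (i.e., the over- or under-representation) of a sequence of interest (e.g., a chromosome or portion thereof) in the mixed DNA sample is determined. The method does not require differentiation between the two genomes.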
The accuracy required for correctly determining whether a CNV (e.g., aneuploidy) is present or absent in a sample depends on the variation in the number of sequence tags that map to the reference genome among samples within a sequencing run (inter-chromosomal variability), and on the variation in the number of sequence tags that map to the reference genome in different sequencing runs (inter-sequencing variability). For example, the variation can be particularly pronounced for tags that map to GC-rich or GC-poor reference sequences. Other variation can result from the use of different protocols for nucleic acid extraction and purification, from the preparation of the sequencing libraries, and from the use of different sequencing platforms. The present method uses sequence doses (chromosome doses or segment doses) based on knowledge of normalizing sequences (normalizing chromosome sequences or normalizing segment sequences) to intrinsically account for the accrued variability stemming from inter-chromosomal (intra-run), inter-sequencing (inter-run), and platform-dependent variability. Chromosome doses are based on knowledge of a normalizing chromosome sequence, which can be composed of a single chromosome, or of two or more chromosomes selected from chromosomes 1 to 22, X, and Y. Alternatively, a normalizing chromosome sequence can be composed of a single chromosome segment, or of two or more segments of one chromosome or of two or more chromosomes. Segment doses are based on knowledge of a normalizing segment sequence, which can be composed of a single segment of any one chromosome, or of two or more segments of any two or more of chromosomes 1 to 22, X, and Y.
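As a hedged illustration of how a sequence dose might be computed, the sketch below assumes that a dose is the ratio of the number of tags mapped to the sequence of interest to the number of tags mapped to the normalizing sequence, where the normalizing sequence may be one chromosome or a group of chromosomes. This ratio form, the function name, and the tag counts are assumptions introduced for the example and are not stated in this passage.

```python
def sequence_dose(tag_counts, of_interest, normalizing):
    """Ratio of tags for the sequence of interest to tags for the
    normalizing sequence (assumed definition, for illustration).

    tag_counts  -- dict mapping a chromosome or segment name to its tag count
    of_interest -- name of the chromosome or segment of interest
    normalizing -- iterable of names forming the normalizing sequence
    """
    denominator = sum(tag_counts[name] for name in normalizing)
    if denominator == 0:
        raise ValueError("normalizing sequence has no mapped tags")
    return tag_counts[of_interest] / denominator


# Hypothetical tag counts from a single sequencing run.
counts = {"chr21": 150, "chr9": 1000, "chr1": 2000}
single_normalizer = sequence_dose(counts, "chr21", ["chr9"])
group_normalizer = sequence_dose(counts, "chr21", ["chr9", "chr1"])
```

Because the numerator and denominator come from the same sample and the same run, a ratio of this kind would be expected to cancel much of the run-to-run and platform-dependent variation described above.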
Singleplex sequencing
FIG. 4 shows a flowchart of an embodiment of the method in which marker nucleic acids are combined with the source sample nucleic acids of a single sample to assay for a genetic abnormality while determining the integrity of the biological source sample. In step 410, a biological source sample comprising genomic nucleic acids is obtained. In step 420, marker nucleic acids are combined with the biological source sample to provide a marked sample. In step 430, a sequencing library of a mixture of clonally amplified source sample genomic nucleic acids and marker nucleic acids is prepared, and in step 440 the library is sequenced in a massively parallel fashion to provide sequencing information pertaining to the source genomic nucleic acids and the marker nucleic acids of the sample. Massively parallel sequencing methods provide sequencing information as sequence reads, which are mapped to one or more reference genomes to generate sequence tags that can be analyzed. In step 450, all the sequencing information is analyzed, and in step 460 the integrity of the source sample is verified based on the sequencing information pertaining to the marker molecules. Verification of the integrity of the source sample is accomplished by determining the concordance between the sequencing information obtained for the marker molecule in step 450 and the known sequence of the marker molecule that was added to the original source sample in step 420. The same process can be applied to multiple samples that are sequenced separately, with each sample comprising molecules having sequences unique to that sample, i.e., each sample is marked with a unique marker molecule and is sequenced separately from the other samples in the flow cell or slide of the sequencer. If the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids of the sample can be analyzed to provide information, e.g., about the status of the subject from which the source sample was obtained. For example, if the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids is analyzed to determine the presence or absence of a chromosomal abnormality. If the integrity of the sample is not verified, the sequencing information is disregarded.
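The verification of steps 450 and 460 can be sketched as a concordance check between the marker reads recovered from the sequencer and the marker sequence known to have been added in step 420. The following is a minimal Python sketch under stated assumptions: the function names, the exact-substring test, and the 0.95 concordance threshold are all hypothetical choices made for illustration, and the passage itself does not prescribe how concordance is scored.

```python
def marker_concordance(marker_reads, expected_marker):
    """Fraction of marker reads that occur exactly within the known
    marker sequence. Returns 0.0 when no marker reads were recovered."""
    if not marker_reads:
        return 0.0
    matching = sum(1 for read in marker_reads if read in expected_marker)
    return matching / len(marker_reads)


def sample_integrity_verified(marker_reads, expected_marker,
                              min_concordance=0.95):
    """True when enough marker reads agree with the known marker.
    The 0.95 threshold is an assumed, illustrative value."""
    return marker_concordance(marker_reads, expected_marker) >= min_concordance


def analyzable_genomic_reads(genomic_reads, marker_reads, expected_marker):
    """Return the genomic reads only if sample integrity is verified;
    otherwise return None so that the sequencing information is disregarded."""
    if sample_integrity_verified(marker_reads, expected_marker):
        return genomic_reads
    return None
```

Returning None for an unverified sample mirrors the rule that the sequencing information is not considered when integrity cannot be confirmed.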
The method depicted in FIG. 4 is also applicable to bioassays that comprise singleplex sequencing of single molecules, e.g., tSMS by Helicos, SMRT by Pacific Biosciences, BASE by Oxford Nanopore, and other technologies, such as that proposed by IBM, which do not require preparation of libraries.
Multiplex sequencing
The large number of sequence reads that can be obtained per sequencing run permits the analysis of pooled samples, i.e., multiplexing, which maximizes sequencing capacity and reduces workflow. For example, the massively parallel sequencing of eight libraries performed using the eight-lane flow cell of the Illumina Genome Analyzer can be multiplexed to sequence two or more samples in each lane, such that 16, 24, 32, or more samples can be sequenced in a single run. Sequencing multiple samples in parallel, i.e., multiplex sequencing, requires the incorporation of sample-specific index sequences, also known as barcodes, during the preparation of the sequencing libraries. Sequencing indexes are distinct base sequences of about 5, about 10, about 15, about 20, about 25, or more bases that are added at the 3' end of the genomic and marker nucleic acids. The multiplexing system enables the sequencing of hundreds of biological samples within a single sequencing run. Indexed sequencing libraries for the sequencing of clonally amplified sequences can be prepared by incorporating the index sequence into one of the PCR primers used for cluster amplification. Alternatively, the index sequence can be incorporated into the adapter, which is ligated to the cfDNA prior to PCR amplification. Indexed libraries for single molecule sequencing can be created by incorporating the index sequence at the 3' end of the marker and genomic molecules, or 5' to a sequence added for hybridization to the flow cell anchors, e.g., the polyA tail added for single molecule sequencing using tSMS. Sequencing of the uniquely marked, indexed nucleic acids provides index sequence information that identifies the samples in the pooled sample libraries, and the sequence information of the marker molecules correlates the sequencing information of the genomic nucleic acids with the sample source. In embodiments in which the multiple samples are sequenced individually, i.e., singleplex sequencing, the marker and genomic nucleic acid molecules of each sample need only be modified to contain the adapter sequences required by the sequencing platform, and the index sequences are omitted.
FIG. 5 provides a flowchart of an embodiment 500 of the method for verifying the integrity of samples that are subjected to a multistep multiplex sequencing bioassay, i.e., nucleic acids from individual samples are combined and sequenced as a complex mixture. In step 510, a plurality of biological source samples, each comprising genomic nucleic acids, is obtained. In step 520, unique marker nucleic acids are combined with each of the biological source samples to provide a plurality of uniquely marked samples. In step 530, a sequencing library of sample genomic nucleic acids and marker nucleic acids is prepared for each of the uniquely marked samples. Library preparation of samples destined for multiplex sequencing comprises incorporating distinct indexing tags into the sample and marker nucleic acids of each of the uniquely marked samples, to provide samples whose source nucleic acid sequences can be correlated with the corresponding marker nucleic acid sequences and identified in complex solutions. In embodiments of the method comprising marker molecules that can be enzymatically modified, e.g., DNA, indexing molecules can be incorporated at the 3' end of the sample and marker molecules by ligating sequenceable adapter sequences comprising the indexing sequences. In embodiments of the method comprising marker molecules that cannot be enzymatically modified, e.g., DNA analogs that do not have a phosphate backbone, indexing sequences are incorporated at the 3' end of the analog marker molecules during synthesis. The sequencing libraries of two or more samples are pooled and loaded on the flow cell of the sequencer, where they are sequenced in a massively parallel fashion in step 540. In step 550, all the sequencing information is analyzed, and in step 560 the integrity of the source samples is verified based on the sequencing information pertaining to the marker molecules. Verification of the integrity of each of the plurality of source samples is accomplished by first grouping the sequence tags associated with identical index sequences, so that the genomic and marker sequences belonging to each of the libraries made from the genomic molecules of the plurality of samples are associated with their identifying index sequence. The grouped marker and genomic sequences are then analyzed to verify that the sequence obtained for the marker molecules corresponds to the known unique sequence added to the corresponding source sample. If the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids of the sample can be analyzed to provide genetic information about the subject from which the source sample was obtained. For example, if the integrity of the sample is verified, the sequencing information pertaining to the genomic nucleic acids is analyzed to determine the presence or absence of a chromosomal abnormality. A lack of concordance between the sequencing information of the marker molecule and the known sequence is indicative of a sample mix-up, and the accompanying sequencing information pertaining to the genomic cfDNA molecules is disregarded.
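The grouping and verification described for the multiplexed case can be sketched as follows. This is an illustrative Python sketch only: the record layout (index sequence, read kind, read sequence), the sample sheet structure, and the exact-substring comparison are assumptions, and the classification of each read as "marker" or "genomic" is assumed to have been made upstream by mapping to the marker and genome references.

```python
from collections import defaultdict


def verify_multiplexed_samples(reads, sample_sheet):
    """Group reads by index sequence and verify each sample's marker.

    reads        -- iterable of (index_seq, kind, sequence), where kind is
                    "marker" or "genomic" (assumed upstream classification)
    sample_sheet -- dict: index_seq -> (sample_id, expected_marker_sequence)

    Returns (verified, failed): verified maps sample_id to its genomic
    reads; failed is the set of sample ids whose genomic reads are
    disregarded because the marker is missing or discordant.
    """
    grouped = defaultdict(lambda: {"marker": [], "genomic": []})
    for index_seq, kind, sequence in reads:
        grouped[index_seq][kind].append(sequence)

    verified, failed = {}, set()
    for index_seq, (sample_id, expected_marker) in sample_sheet.items():
        group = grouped[index_seq]
        markers = group["marker"]
        # Every recovered marker read must agree with the marker that was
        # added to this particular source sample.
        concordant = bool(markers) and all(m in expected_marker for m in markers)
        if concordant:
            verified[sample_id] = group["genomic"]
        else:
            failed.add(sample_id)
    return verified, failed
```

A sample whose index group contains another sample's marker ends up in the failed set, which corresponds to the sample mix-up case in which the accompanying genomic sequencing information is disregarded.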
Determination of CNV for prenatal diagnosis
Cell-free fetal DNA and RNA circulating in maternal blood can be used for the early non-invasive prenatal diagnosis (NIPD) of an increasing number of genetic conditions, both for pregnancy management and to aid reproductive decision-making. The presence of cell-free DNA circulating in the bloodstream has been known for over 50 years. More recently, the presence of small amounts of circulating fetal DNA was discovered in the maternal bloodstream during pregnancy (Lo et al., Lancet 350:485-487 [1997]). Thought to originate from dying placental cells, cell-free fetal DNA (cfDNA) has been shown to consist of short fragments typically fewer than 200 bp in length (Chan et al., Clin Chem 50:88-92 [2004]), which can be discerned as early as 4 weeks of gestation (Illanes et al., Early Human Dev 83:563-566 [2007]), and is known to be cleared from the maternal circulation within hours of delivery (Lo et al., Am J Hum Genet 64:218-224 [1999]). In addition to cfDNA, fragments of cell-free fetal RNA (cfRNA) can also be discerned in the maternal bloodstream; these originate from genes that are transcribed in the fetus or placenta. The extraction and subsequent analysis of these fetal genetic elements from a maternal blood sample offers novel opportunities for NIPD.
The present method is a polymorphism-independent method for use in NIPD that does not require the fetal cfDNA to be distinguished from the maternal cfDNA in order to determine a fetal aneuploidy. In some embodiments, the aneuploidy is a complete chromosomal trisomy or monosomy, or a partial trisomy or monosomy. Partial aneuploidies are caused by the gain or loss of part of a chromosome, and encompass chromosomal imbalances resulting from unbalanced translocations, unbalanced inversions, deletions, and insertions. By far the most common known aneuploidy compatible with life is trisomy 21, i.e., Down syndrome (DS), which is caused by the presence of an extra copy of part or all of chromosome 21. Rarely, DS can be caused by an inherited or sporadic defect whereby an extra copy of all or part of chromosome 21 becomes attached to another chromosome (usually chromosome 14) to form a single aberrant chromosome. DS is associated with intellectual impairment, severe learning difficulties, and excess mortality caused by long-term health problems such as heart disease. Other aneuploidies of known clinical significance include Edwards syndrome (trisomy 18) and Patau syndrome (trisomy 13), which are frequently fatal within the first few months of life. Abnormalities associated with the number of sex chromosomes are also known and include monosomy X, e.g., Turner syndrome (XO), and triple X syndrome (XXX) in female births, and Klinefelter syndrome (XXY) and XYY syndrome in male births, all of which are associated with various phenotypes including sterility and reduced intellectual skills. Monosomy X [45,X] is a common cause of early pregnancy loss, accounting for about 7% of spontaneous abortions. Based on the liveborn frequency of 45,X (also called Turner syndrome) of 1-2/10,000, it is estimated that less than 1% of 45,X conceptuses survive to term. About 30% of Turner syndrome patients are mosaic, with both a 45,X cell line and either a 46,XX cell line or one containing a rearranged X chromosome (Hook and Warburton, 1983). The phenotype in a liveborn infant is relatively mild considering the high embryonic lethality, and it has been hypothesized that possibly all liveborn females with Turner syndrome carry a cell line containing two sex chromosomes. Monosomy X can occur in females as 45,X or as 45,X/46XX, and in males as 45,X/46XY. Autosomal monosomies in humans are generally considered to be incompatible with life; however, a considerable number of cytogenetic reports describe full monosomy of one chromosome 21 in liveborn children (Vosranova et al., Molecular Cytogen. 1:13 [2008]; Joosten et al., Prenatal Diagn. 17:271-5 [1997]). The methods described herein can be used to diagnose these and other chromosomal abnormalities prenatally.
According to some embodiments, the methods disclosed herein can determine the presence or absence of a chromosomal trisomy of any one of chromosomes 1 to 22, X, and Y. Examples of chromosomal trisomies that can be detected according to the present method include, without limitation, trisomy 21 (T21; Down syndrome), trisomy 18 (T18; Edwards syndrome), trisomy 16 (T16), trisomy 20 (T20), trisomy 22 (T22; cat eye syndrome), trisomy 15 (T15; Prader-Willi syndrome), trisomy 13 (T13; Patau syndrome), trisomy 8 (T8; Warkany syndrome), trisomy 9, and the XXY (Klinefelter syndrome), XYY, or XXX trisomies. Complete trisomies of other autosomes are lethal in the non-mosaic state, but can be compatible with life in the mosaic state. It will be appreciated that various complete trisomies, whether existing in a mosaic or non-mosaic state, as well as partial trisomies, can be determined in fetal cfDNA according to the teachings provided herein.
Non-limiting examples of partial trisomies that can be determined by the present method include partial trisomy 1q32-44, trisomy 9p, trisomy 4 mosaicism, trisomy 17p, partial trisomy 4q26-qter, partial 2p trisomy, partial trisomy 1q, and/or partial trisomy 6p/monosomy 6q.
The methods disclosed herein can also be used to determine chromosomal monosomy X, chromosomal monosomy 21, and partial monosomies, such as monosomy 13, monosomy 15, monosomy 16, monosomy 21, and monosomy 22, which are known to be involved in pregnancy loss. Partial monosomy of chromosomes typically involved in complete aneuploidy can also be determined by the methods described herein. Non-limiting examples of deletion syndromes that can be determined according to the present method include syndromes caused by partial deletions of chromosomes. Examples of partial deletions that can be determined according to the methods described herein include, without limitation, partial deletions of chromosomes 1, 4, 5, 7, 11, 18, 15, 13, 17, 22, and 10, which are described below.
1q21.1 deletion syndrome, or 1q21.1 (recurrent) microdeletion, is a rare aberration of chromosome 1. In addition to the deletion syndrome, there is also a 1q21.1 duplication syndrome. Whereas in the deletion syndrome a portion of the DNA is missing at a particular locus, in the duplication syndrome two or three copies of a similar portion of the DNA are present at the same locus. The literature refers to both the deletion and the duplication as 1q21.1 copy number variations (CNVs). The 1q21.1 deletion can be associated with TAR syndrome (thrombocytopenia with absent radius).
Wolf-Hirschhorn syndrome (WHS) (OMIM #194190) is a contiguous gene deletion syndrome associated with a hemizygous deletion of chromosome 4p16.3. Wolf-Hirschhorn syndrome is a congenital malformation syndrome characterized by prenatal and postnatal growth deficiency, developmental disability of variable degree, characteristic craniofacial features ('Greek warrior helmet' appearance of the nose, high forehead, prominent glabella, hypertelorism, high-arched eyebrows, protruding eyes, epicanthal folds, short philtrum, distinct mouth with downturned corners, and micrognathia), and a seizure disorder.
Partial deletion of chromosome 5, also known as 5p- or 5p minus and named Cri du Chat syndrome (OMIM #123450), is caused by a deletion of the short arm (p arm) of chromosome 5 (5p15.3-p15.2). Infants with this condition often have a high-pitched cry that sounds like that of a cat. The disorder is characterized by intellectual disability and delayed development, small head size (microcephaly), low birth weight, weak muscle tone (hypotonia) in infancy, distinctive facial features, and possibly heart defects.
Williams-Beuren syndrome, also known as chromosome 7q11.23 deletion syndrome (OMIM #194050), is a contiguous gene deletion syndrome resulting in a multisystem disorder, caused by a hemizygous deletion of 1.5 Mb to 1.8 Mb on chromosome 7q11.23 that contains approximately 28 genes.
Jacobsen syndrome, also known as 11q deletion disorder, is a rare congenital disorder resulting from a deletion of a terminal region of chromosome 11 that includes band 11q24.1. It can cause intellectual disability, a distinctive facial appearance, and a variety of physical problems, including heart defects and a bleeding disorder.
Partial monosomy of chromosome 18, known as monosomy 18p, is a rare chromosomal disorder in which all or part of the short arm (p) of chromosome 18 is deleted (monosomic). The disorder is typically characterized by short stature, variable degrees of mental retardation, speech delays, malformations of the skull and facial (craniofacial) region, and/or additional physical abnormalities. The associated craniofacial defects can vary greatly in range and severity from case to case.
Conditions caused by changes in the structure or copy number of chromosome 15 include Angelman syndrome and Prader-Willi syndrome, which involve a loss of gene activity in the same part of chromosome 15, the 15q11-q13 region. It will be appreciated that several translocations and microdeletions can be asymptomatic in the carrier parent, yet can cause a major genetic disease in the offspring. For example, a healthy mother who carries the 15q11-q13 microdeletion can give birth to a child with Angelman syndrome, a severe neurodegenerative disorder. Thus, the methods, apparatus, and systems described herein can be used to identify such partial deletions and other deletions in the fetus.
Partial monosomy 13q is a rare chromosomal disorder that results when a piece of the long arm (q) of chromosome 13 is missing (monosomic). Infants born with partial monosomy 13q can exhibit low birth weight, malformations of the head and face (craniofacial region), skeletal abnormalities (especially of the hands and feet), and other physical abnormalities. Mental retardation is characteristic of this condition. The mortality rate during infancy is high among individuals born with this disorder. Almost all cases of partial monosomy 13q occur randomly for no apparent reason (sporadic).
Smith-Magenis syndrome (SMS; OMIM #182290) is caused by a deletion, or loss of genetic material, on one copy of chromosome 17. This well-known syndrome is associated with developmental delay, mental retardation, congenital anomalies such as heart and kidney defects, and neurobehavioral abnormalities such as severe sleep disturbances and self-injurious behavior. Smith-Magenis syndrome is caused in most cases (90%) by a 3.7-Mb interstitial deletion in chromosome 17p11.2.
22q11.2 deletion syndrome, also known as DiGeorge syndrome, is a syndrome caused by the deletion of a small piece of chromosome 22. The deletion (22q11.2) occurs near the middle of the chromosome, on the long arm of one of the pair of chromosomes. The features of this syndrome vary widely, even among members of the same family, and affect many parts of the body. Characteristic signs and symptoms can include birth defects such as congenital heart disease, defects in the palate that are most commonly related to neuromuscular problems with closure (velopharyngeal insufficiency), learning disabilities, mild differences in facial features, and recurrent infections. Microdeletions in chromosomal region 22q11.2 are associated with a 20- to 30-fold increased risk of schizophrenia.
Deletions on the short arm of chromosome 10 are associated with a DiGeorge syndrome-like phenotype. Partial monosomy of chromosome 10p is rare, but has been observed in a portion of patients showing features of DiGeorge syndrome.
In one embodiment, the methods, apparatus, and systems described herein are used to determine partial monosomies, including but not limited to partial monosomy of chromosomes 1, 4, 5, 7, 11, 18, 15, 13, 17, 22, and 10. The methods can also be used to determine, for example, partial monosomy 1q21.11, partial monosomy 4p16.3, partial monosomy 5p15.3-p15.2, partial monosomy 7q11.23, partial monosomy 11q24.1, partial monosomy 18p, partial monosomy of chromosome 15 (15q11-q13), partial monosomy 13q, partial monosomy 17p11.2, partial monosomy of chromosome 22 (22q11.2), and partial monosomy 10p.
Other partial monosomies that can be determined according to the methods described herein include: unbalanced translocation t(8;11)(p23.2;p15.5); 11q23 microdeletion; 17p11.2 deletion; 22q13.3 deletion; Xp22.3 microdeletion; 10p14 deletion; 20p microdeletion [del(22)(q11.2q11.23)], 7q11.23 and 7q36 deletions; 1p36 deletion; 2p microdeletion; neurofibromatosis type 1 (17q11.2 microdeletion), Yq deletion; 4p16.3 microdeletion; 1p36.2 microdeletion; 11q14 deletion; 19q13.2 microdeletion; Rubinstein-Taybi syndrome (16p13.3 microdeletion); 7p21 microdeletion; Miller-Dieker syndrome (17p13.3); and 2q37 microdeletion. Partial deletions can be small deletions of part of a chromosome, or they can be microdeletions of a chromosome in which the deletion of a single gene can occur.
Several duplication syndromes caused by the duplication of part of a chromosome arm have been identified (see OMIM [Online Mendelian Inheritance in Man], viewable online at ncbi.nlm.nih.gov/omim). In one embodiment, the present method can be used to determine the presence or absence of duplications and/or multiplications of segments of any one of chromosomes 1-22, X, and Y. Non-limiting examples of duplication syndromes that can be determined according to the present method include duplications of part of chromosomes 8, 15, 12, and 17, which are described below.
8p23.1 duplication syndrome is a rare genetic disorder caused by a duplication of a region of human chromosome 8. This duplication syndrome has an estimated incidence of 1 in 64,000 births and is the reciprocal of the 8p23.1 deletion syndrome. The 8p23.1 duplication is associated with a variable phenotype that includes one or more of speech delay, developmental delay, mild dysmorphism with a prominent forehead and arched eyebrows, and congenital heart disease (CHD).
Chromosome 15q duplication syndrome (Dup15q) is a clinically identifiable syndrome that results from duplications of chromosome 15q11-13.1. Babies with Dup15q usually have hypotonia (poor muscle tone) and growth retardation; they may be born with a cleft lip and/or palate or with malformations of the heart, kidneys, or other organs; and they show some degree of cognitive delay/disability (mental retardation), speech and language delays, and sensory processing disorders.
Pallister-Killian syndrome is the result of extra #12 chromosome material. There is usually a mixture of cells (mosaicism), some with extra #12 material and some that are normal (46 chromosomes without the extra #12 material). Babies with this syndrome have many problems, including severe mental retardation, poor muscle tone, "coarse" facial features, and a prominent forehead. They tend to have a very thin upper lip with a thicker lower lip and a short nose. Other health problems include seizures, poor feeding, stiff joints, cataracts in adulthood, hearing loss, and heart defects. People with Pallister-Killian syndrome have a shortened lifespan.
Individuals with the genetic condition designated dup(17)(p11.2p11.2) or dup17p carry extra genetic information (known as a duplication) on the short arm of chromosome 17. Duplication of chromosome 17p11.2 underlies Potocki-Lupski syndrome (PTLS), a newly recognized genetic condition with only a few dozen cases reported in the medical literature. Patients who have this duplication often have low muscle tone, poor feeding, and failure to thrive during infancy, and also present with delayed development of motor and verbal milestones. Many individuals with PTLS have difficulty with articulation and language processing. In addition, patients may have behavioral characteristics similar to those seen in persons with autism or autism spectrum disorders. Individuals with PTLS may have heart defects and sleep apnea. A duplication of a larger region in chromosome 17p12 that includes the gene PMP22 is known to cause Charcot-Marie-Tooth disease.
CNVs have been associated with stillbirth. However, owing to the inherent limitations of conventional cytogenetics, the contribution of CNVs to stillbirth is thought to be underrepresented (Harris et al., Prenatal Diagn 31:932-944 [2011]). As shown in the Examples and described elsewhere herein, the present method is capable of determining the presence of partial aneuploidies, e.g., deletions and multiplications of chromosome segments, and can be used to identify and determine the presence or absence of CNVs that are associated with stillbirth.
Determining complete fetal chromosomal aneuploidies
In one embodiment, a method is provided for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. Preferably, the method determines the presence or absence of any four or more different complete fetal chromosomal aneuploidies. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the maternal test sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of the one or more chromosomes of interest. The normalizing chromosome sequence can be a single chromosome, or it can be a group of chromosomes selected from chromosomes 1-22, X, and Y. The method further comprises: (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each single chromosome dose to a threshold value for the corresponding chromosome of interest, thereby determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in the maternal test sample.
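Steps (b) through (d) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the toy tag labels, the choice of chr9 as the normalizing chromosome, the two-sided decision rule, and the threshold values 0.08 and 0.11 are all hypothetical.

```python
from collections import Counter


def chromosome_dose(tag_counts, chrom, normalizing_chroms):
    """Step (c): tags on the chromosome of interest divided by the tags
    on its normalizing chromosome sequence (one chromosome or a group)."""
    denominator = sum(tag_counts[c] for c in normalizing_chroms)
    if denominator == 0:
        raise ValueError("normalizing chromosome sequence has no tags")
    return tag_counts[chrom] / denominator


def call_aneuploidy(dose, lower, upper):
    """Step (d): compare a dose with thresholds.
    The two-sided rule and its labels are assumptions for illustration."""
    if dose > upper:
        return "gain"
    if dose < lower:
        return "loss"
    return "unaffected"


# Step (b): count mapped sequence tags per chromosome (toy labels, not real data).
mapped_tags = ["chr21"] * 12 + ["chr9"] * 100 + ["chr1"] * 250
tag_counts = Counter(mapped_tags)

# Hypothetical normalizer (chr9) and hypothetical thresholds.
dose_21 = chromosome_dose(tag_counts, "chr21", ["chr9"])
call_21 = call_aneuploidy(dose_21, lower=0.08, upper=0.11)
print(f"chr21 dose = {dose_21:.3f}, call = {call_21}")
```

In practice the thresholds would be derived from a set of qualified samples for each chromosome of interest rather than fixed by hand.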
In some embodiments, step (c) comprises calculating a single chromosome dose for each of the chromosomes of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest to the number of sequence tags identified for the normalizing chromosome sequence for each of the chromosomes of interest.
In other embodiments, step (c) comprises calculating a single chromosome dose for each of the chromosomes of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest to the number of sequence tags identified for the normalizing chromosome for each of the chromosomes of interest. In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing chromosome sequence to the length of the normalizing chromosome sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing chromosome sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
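The two ways of computing a chromosome dose described here, a ratio of raw tag counts and a ratio of length-normalized tag densities, can be contrasted with a short Python sketch. The function names and all counts and lengths are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def dose_from_counts(tags_interest, tags_normalizing):
    """Dose as a plain ratio of sequence tag counts."""
    return tags_interest / tags_normalizing


def dose_from_densities(tags_interest, length_interest,
                        tags_normalizing, length_normalizing):
    """Dose as a ratio of sequence tag densities (tags per base pair)."""
    density_interest = tags_interest / length_interest
    density_normalizing = tags_normalizing / length_normalizing
    return density_interest / density_normalizing


# Toy values: the normalizing sequence is three times longer and
# carries three times as many tags as the chromosome of interest.
count_dose = dose_from_counts(500, 1500)
density_dose = dose_from_densities(500, 50_000_000, 1500, 150_000_000)

print(f"count-based dose:   {count_dose:.4f}")
print(f"density-based dose: {density_dose:.4f}")
```

With these toy values the count-based dose is one third, while the density-based dose is one, because dividing by length removes the effect of the size difference between the two sequences.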
In one example of this embodiment, four or more complete fetal chromosomal aneuploidies are determined in a maternal test sample comprising a mixture of fetal and maternal cell-free DNA molecules, the example comprising: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the fetal and maternal cell-free DNA molecules in the test sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome for each of the twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of the twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the twenty or more chromosomes of interest; and (d) comparing each single chromosome dose for each of the twenty or more chromosomes of interest to a threshold value for each of the twenty or more chromosomes of interest, thereby determining the presence or absence of any twenty or more different complete fetal chromosomal aneuploidies in the test sample.
In another embodiment, the method described above for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample uses a normalizing segment sequence for determining the dose of the chromosome of interest. In this instance, the method comprises: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of the one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further comprises: (c) using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and (d) comparing each single chromosome dose to a threshold value for the corresponding chromosome of interest, thereby determining the presence or absence of one or more different complete fetal chromosomal aneuploidies in the sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each of the chromosomes of interest as the ratio of the number of sequence tags identified for each of the chromosomes of interest to the number of sequence tags identified for the normalizing segment sequence for each of the chromosomes of interest.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
A means for comparing chromosome doses across different sample sets is provided by determining a normalized chromosome value (NCV), which relates the chromosome dose in a test sample to the mean of the corresponding chromosome dose in a set of qualified samples. The NCV is calculated as:

$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the j-th chromosome dose in a set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
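The NCV is a z-score-style statistic: the observed chromosome dose minus the mean dose in the qualified samples, divided by the standard deviation of the dose in the qualified samples. A minimal Python sketch follows; the function name, the use of the sample (n-1) standard deviation, and the dose values are assumptions made for illustration.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """NCV for one chromosome: (observed dose - qualified mean) / qualified SD.

    Uses the sample standard deviation (n - 1 denominator); whether the
    source intends the sample or population estimate is an assumption here.
    """
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)
    if sigma == 0:
        raise ValueError("qualified doses have zero variance")
    return (test_dose - mu) / sigma


# Hypothetical doses for one chromosome in five qualified samples.
qualified = [0.10, 0.11, 0.09, 0.10, 0.10]

ncv_elevated = normalized_chromosome_value(0.12, qualified)
ncv_typical = normalized_chromosome_value(0.10, qualified)
print(f"NCV for dose 0.12: {ncv_elevated:.2f}")
print(f"NCV for dose 0.10: {ncv_typical:.2f}")
```

Because the NCV is expressed in standard deviations from the qualified mean, values from different chromosomes or sequencing runs can be compared on a common scale.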
In some embodiments, the presence or absence of at least one complete fetal chromosomal aneuploidy is determined. In other embodiments, the presence or absence of at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least twelve, at least thirteen, at least fourteen, at least fifteen, at least sixteen, at least seventeen, at least eighteen, at least nineteen, at least twenty, at least twenty-one, at least twenty-two, at least twenty-three, or twenty-four complete fetal chromosomal aneuploidies is determined in a sample, wherein twenty-two of the complete fetal chromosomal aneuploidies correspond to complete chromosomal aneuploidies of any one or more of the autosomes, and the twenty-third and twenty-fourth correspond to complete fetal chromosomal aneuploidies of chromosomes X and Y. Because aneuploidies of the sex chromosomes can include tetrasomies, pentasomies, and other polysomies, the number of different complete chromosomal aneuploidies that can be determined according to the present method may be at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30. The number of different complete chromosomal aneuploidies that are determined is therefore related to the number of chromosomes of interest selected for analysis.
In one embodiment, determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample as described above uses a normalizing segment sequence for one chromosome of interest, which is selected from chromosomes 1-22, X, and Y. In other embodiments, two or more chromosomes of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and the presence or absence of at least twenty different complete fetal chromosomal aneuploidies is determined. In other embodiments, the chromosomes of interest are all of chromosomes 1-22, X, and Y, and the presence or absence of complete fetal chromosomal aneuploidies of all of chromosomes 1-22, X, and Y is determined. Complete fetal chromosomal aneuploidies that can be determined include complete chromosomal trisomies, complete chromosomal monosomies, and complete chromosomal polysomies. Examples of complete fetal chromosomal aneuploidies include, but are not limited to: trisomies of any one or more of the autosomes, e.g., trisomy 2, trisomy 8, trisomy 9, trisomy 20, trisomy 21, trisomy 13, trisomy 16, trisomy 18, and trisomy 22; trisomies of the sex chromosomes, e.g., 47,XXY, 47,XXX, and 47,XYY; tetrasomies of the sex chromosomes, e.g., 48,XXYY, 48,XXXY, 48,XXXX, and 48,XYYY; pentasomies of the sex chromosomes, e.g., 49,XXXYY, 49,XXXXY, 49,XXXXX, and 49,XYYYY; and monosomy X. Other complete fetal chromosomal aneuploidies that can be determined according to the present method are described below.
Determining partial fetal chromosomal aneuploidies
In another embodiment, a method is provided for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acid molecules. The steps of the method comprise: (a) obtaining sequence information for the fetal and maternal nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of the one or more segments. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further comprises: (c) using the number of sequence tags identified for each of the one or more segments and the number of sequence tags identified for each normalizing segment sequence to calculate a single segment dose for each of the one or more segments of the one or more chromosomes of interest; and (d) comparing each single segment dose to a threshold value for the corresponding chromosome segment, thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in the sample.
In some embodiments, step (c) comprises calculating a single segment dose for each of any one or more segments of any one or more chromosomes of interest as the ratio of the number of sequence tags identified for each of the segments to the number of sequence tags identified for the normalizing segment sequence for each of the segments.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a segment of interest by relating the number of sequence tags obtained for the segment of interest to the length of the segment of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence, and calculating a segment dose for the segment of interest as the ratio of the sequence tag density of the segment of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of the segments of interest. Steps (a)-(d) can be repeated for test samples from different maternal subjects.
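The density-based segment dose can be sketched in Python for the case where the normalizing segment sequence is a group of segments drawn from different chromosomes. The `Segment` class, the pooling of tags and lengths across the group, and every coordinate and tag count below are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    tags: int   # number of sequence tags mapped to the segment

    @property
    def length(self) -> int:
        return self.end - self.start


def pooled_tag_density(segments):
    """Total tags divided by total length over one or more segments."""
    total_length = sum(s.length for s in segments)
    if total_length <= 0:
        raise ValueError("segments must have positive total length")
    return sum(s.tags for s in segments) / total_length


def segment_dose(segment_of_interest, normalizing_segments):
    """Tag density of the segment of interest relative to the density
    of its normalizing segment sequence (a single segment or a group)."""
    return (pooled_tag_density([segment_of_interest])
            / pooled_tag_density(normalizing_segments))


# Toy coordinates and counts; not taken from the source.
interest = Segment("chr22", 18_000_000, 21_000_000, tags=300)
normalizers = [
    Segment("chr2", 0, 5_000_000, tags=1000),
    Segment("chr10", 0, 5_000_000, tags=1000),
]
dose = segment_dose(interest, normalizers)
print(f"segment dose = {dose:.3f}")
```

Pooling tags and lengths treats the group of normalizing segments as one concatenated sequence; averaging per-segment densities would be an alternative design with different weighting.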
A means for comparing segment doses across different sample sets is provided by determining a normalized segment value (NSV), which relates the segment dose in a test sample to the mean of the corresponding segment dose in a set of qualified samples. The NSV is calculated as:

$$NSV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, for the j-th segment dose in a set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
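When several segments are examined in one test sample, the NSV (observed segment dose minus the qualified mean, divided by the qualified standard deviation) can be computed per segment to give a profile. The sketch below assumes doses are held in dictionaries keyed by segment name; the function name, the sample (n-1) standard deviation, and all dose values are hypothetical.

```python
from statistics import mean, stdev


def nsv_profile(test_doses, qualified_doses):
    """Return {segment: NSV} for one test sample.

    test_doses:      {segment name: observed segment dose}
    qualified_doses: {segment name: doses from qualified samples}
    """
    profile = {}
    for segment, observed in test_doses.items():
        reference = qualified_doses[segment]
        sigma = stdev(reference)  # sample SD; an assumption
        if sigma == 0:
            raise ValueError(f"zero variance for segment {segment}")
        profile[segment] = (observed - mean(reference)) / sigma
    return profile


# Hypothetical segment doses; the segment names come from the text,
# the numbers do not.
qualified = {
    "22q11.2": [0.50, 0.52, 0.48],
    "7q11.23": [0.30, 0.31, 0.29],
}
test_sample = {"22q11.2": 0.44, "7q11.23": 0.30}

profile = nsv_profile(test_sample, qualified)
for segment, value in profile.items():
    print(f"{segment}: NSV = {value:.2f}")
```

A strongly negative NSV for a segment is consistent with reduced representation of that segment (for example a partial deletion), and a strongly positive one with increased representation; the cutoffs used to make a call are not specified here.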
In some embodiments, the presence or absence of one partial fetal chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, or more partial fetal chromosomal aneuploidies is determined in a sample. In one embodiment, one segment of interest is selected from any one of chromosomes 1-22, X, and Y. In another embodiment, two or more segments of interest are selected from any of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the one or more segments of interest comprise at least one, five, ten, 15, 20, 25, or more segments selected from chromosomes 1-22, X, and Y, and the presence or absence of at least one, five, ten, 15, 20, or 25 different partial fetal chromosomal aneuploidies is determined. The different partial fetal chromosomal aneuploidies that can be determined include partial duplications, partial multiplications, partial insertions, and partial deletions. Examples of partial fetal chromosomal aneuploidies include partial monosomies and partial trisomies of the autosomes. Partial monosomies of the autosomes include partial monosomy of chromosome 1, partial monosomy of chromosome 4, partial monosomy of chromosome 5, partial monosomy of chromosome 7, partial monosomy of chromosome 11, partial monosomy of chromosome 15, partial monosomy of chromosome 17, partial monosomy of chromosome 18, and partial monosomy of chromosome 22. Other partial fetal chromosomal aneuploidies that can be determined according to the present method are described below.
In any one of the embodiments described above, the test sample is a maternal sample selected from blood, plasma, serum, urine, and saliva samples. In some embodiments, the maternal test sample is a plasma sample. The nucleic acid molecules of the maternal sample are a mixture of fetal and maternal cell-free DNA molecules. Sequencing of the nucleic acids can be performed using next generation sequencing (NGS) as described elsewhere in this application. In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing-by-ligation. In yet other embodiments, the sequencing is single-molecule sequencing. Optionally, an amplification step is performed prior to sequencing.
Determining CNVs of clinical disorders
In addition to the early determination of birth defects, the methods described herein can be applied to the determination of any abnormality in the representation of genetic sequences within the genome. A number of abnormalities in the representation of genetic sequences within the genome have been associated with various pathologies. Such pathologies include, but are not limited to, cancer, infectious and autoimmune diseases, diseases of the nervous system, metabolic and/or cardiovascular diseases, and the like.
Accordingly, in various embodiments, use of the methods described herein in the diagnosis, and/or monitoring, and/or treatment of such pathologies is contemplated. For example, the methods can be applied to determine the presence or absence of a disease, to monitor the progression of a disease and/or the efficacy of a treatment regimen, to determine the presence or absence of nucleic acids of a pathogen (e.g., a virus), to determine chromosomal abnormalities associated with graft-versus-host disease (GVHD), and to determine the contribution of individuals in forensic analyses.
CNVs in cancer
It has been shown that plasma and serum DNA from cancer patients contains measurable quantities of tumor DNA, which can be recovered and used as a surrogate source of tumor DNA, and that tumors are characterized by aneuploidy, i.e., inappropriate numbers of gene sequences or even entire chromosomes. Determining a difference in the amount of a given sequence (i.e., a sequence of interest) in a sample from an individual can thus be used in the prognosis or diagnosis of a medical condition. In some embodiments, the present method can be used to determine the presence or absence of a chromosomal aneuploidy in a patient suspected or known to be suffering from cancer.
In certain embodiments, the aneuploidy is characteristic of the genome of the subject and results in a generally increased predisposition to cancer. In certain embodiments, the aneuploidy is characteristic of particular cells (e.g., tumor cells, proto-tumor neoplastic cells, etc.) that are, or have an increased predisposition to become, neoplastic. Particular aneuploidies are associated with particular cancers or with predispositions to particular cancers, as described below.
Accordingly, various embodiments of the methods described herein provide a determination of copy number variation of a sequence of interest (e.g., a clinically relevant sequence) in a test sample from a subject, where certain variations in copy number provide an indicator of the presence of cancer and/or a predisposition to cancer. In certain embodiments, the sample comprises a mixture of nucleic acids derived from two or more types of cells. In one embodiment, the mixture of nucleic acids is derived from normal and cancerous cells obtained from a subject suffering from a medical condition, e.g., cancer.
The development of cancer is often accompanied by an alteration in the number of whole chromosomes, i.e., complete chromosomal aneuploidy, and/or an alteration in the number of segments of chromosomes, i.e., partial aneuploidy, caused by a process known as chromosome instability (CIN) (Thoma et al., Swiss Med Weekly 2011:141:w13170). It is believed that many solid tumors, such as breast cancer, progress from initiation to metastasis through the accumulation of several genetic aberrations (Sato et al., Cancer Res., 50:7184-7189 [1990]; Jongsma et al., J Clin Pathol: Mol Path 55:305-309 [2002]). As they accumulate, such genetic aberrations may confer proliferative advantages, genetic instability and the attendant ability to evolve drug resistance rapidly, and enhanced angiogenesis, proteolysis, and metastasis. The genetic aberrations may affect either recessive "tumor suppressor genes" or dominantly acting oncogenes. Deletions and recombination leading to loss of heterozygosity (LOH) are believed to play a major role in tumor progression by uncovering mutated tumor suppressor alleles.
cfDNA已经被发现在诊断患有恶性病的患者的循环系统中,这些恶性病包括但不限于肺癌(帕萨卡(Pathak)等人,临床医学52:1833-1842[2006])、前列腺癌(薛华兹巴奇(Schwartzenbach)等人,临床癌症研究(Clin Cancer Res) 15:1032-8[2009])和乳癌(薛华兹巴奇等人,可在 breast-cancer-research.com/content/11/5/R71在线获得[2009])。鉴别与癌症(能够根据癌症病人的循环cfDNA确定)有关的基因组不稳定性是一种潜在的诊断和预后工具。在一个实施方案中,在此所述的方法被用来测定样品(例如包含核酸混合物的样品,这些核酸来源于怀疑患有或已知患有癌症的受试者,例如癌、肉瘤、淋巴瘤、白血病、生殖细胞瘤以及母细胞瘤)中一个或多个感兴趣的序列的CNV。在一个实施方案中,该样品是外周血液所衍生(经处理) 的血浆样品,该外周血液可能包含来源于正常细胞和癌细胞的cfDNA的混合物。在另一个实施方案中,需要确定是否存在CNV的生物样品是来源于其他生物学组织的细胞,若存在癌症,则该细胞包括癌细胞和非癌细胞的混合物,其他生物学组织包括但不限于生物学流体,诸如血清、汗水、眼泪、痰、尿、痰、耳流出物、淋巴、唾液、脑脊髓液、灌洗液、骨髓悬浮液、阴道流体、经子宫颈灌洗液、脑流体、腹水、乳汁,呼吸道、肠道以及生殖泌尿道的分泌液,以及白细胞清除术样品,或在组织活检、棉签或涂片中。在其他实施方案中,该生物样品是大便(粪便)样品。cfDNA has been found in the circulation of patients diagnosed with malignancies including, but not limited to, lung cancer (Pathak et al., Clin Med 52:1833-1842 [2006]), prostate cancer (Schwartzenbach et al., Clin Cancer Res 15:1032-8 [2009]), and breast cancer (Schwartzenbach et al., available online at breast-cancer-research.com/content/11/5/R71 [2009]). Identifying genomic instability associated with cancer (which can be determined from circulating cfDNA in cancer patients) is a potential diagnostic and prognostic tool. In one embodiment, the methods described herein are used to determine CNVs of one or more sequences of interest in a sample (e.g., a sample comprising a mixture of nucleic acids derived from a subject suspected of or known to have a cancer, such as a carcinoma, sarcoma, lymphoma, leukemia, germ cell tumor, and blastoma). In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, which may contain a mixture of cfDNA derived from normal cells and cancer cells. In another embodiment, the biological sample for determining whether CNV is present is a cell derived from other biological tissues. If cancer is present, the cell includes a mixture of cancer cells and non-cancerous cells. 
Other biological tissues include, but are not limited to, biological fluids such as serum, sweat, tears, sputum, urine, ear discharge, lymph, saliva, cerebrospinal fluid, lavage fluid, bone marrow suspension, vaginal fluid, transcervical lavage fluid, brain fluid, ascites, milk, secretions of the respiratory, intestinal, and genitourinary tracts, and leukapheresis samples, as well as tissue biopsies, swabs, and smears. In other embodiments, the biological sample is a stool (fecal) sample.
The methods described herein are not limited to the analysis of cfDNA. It will be appreciated that similar analyses can be performed on cellular DNA samples.
In various embodiments, the sequences of interest comprise nucleic acid sequences that are known or suspected to play a role in the development and/or progression of cancer. Examples of sequences of interest include nucleic acid sequences, e.g., complete chromosomes and/or segments of chromosomes, that are amplified or deleted in cancerous cells, as described below.
Total CNV number and cancer risk.
Common cancer SNPs, and by analogy common cancer CNVs, may each confer only a minor increase in disease risk. Collectively, however, they may cause a substantially elevated risk of cancer. In this regard, germline gains and losses of large DNA segments have been reported as factors predisposing individuals to neuroblastoma, prostate and colorectal cancer, breast cancer, and BRCA1-associated ovarian cancer (see, e.g., Krepischi et al., Breast Cancer Res., 14:R24 [2012]; Diskin et al., Nature 2009, 459:987-991; Liu et al., Cancer Res 2009, 69:2176-2179; Lucito et al., Cancer Biol Ther 2007, 6:1592-1599; Thean et al., Genes Chromosomes Cancer 2010, 49:99-106; Venkatachalam et al., Int J Cancer 2011, 129:1635-1642; and Yoshihara et al., Genes Chromosomes Cancer 2011, 50:167-177).
CNVs frequently found in the healthy population (common CNVs) are also believed to play a role in cancer etiology (see, e.g., Shlien and Malkin (2009) Genome Medicine, 1(6):62). One study tested the hypothesis that common CNVs are associated with malignancy (Shlien et al., Proc Natl Acad Sci USA 2008, 105:11264-11269) by creating a map of every known CNV whose locus coincides with that of a bona fide cancer-related gene (as catalogued by Higgins et al., Nucleic Acids Res 2007, 35:D721-726). These were termed "cancer CNVs". In the initial analysis (Shlien et al., Proc Natl Acad Sci USA 2008, 105:11264-11269), 770 healthy genomes were evaluated using the Affymetrix 500K array set, which has an average inter-probe distance of 5.8 kb. Because CNVs are generally thought to be depleted in gene regions (Redon et al., Nature 2006, 444:444-454), it was surprising to find 49 cancer genes directly encompassed or overlapped by a CNV in more than one person in a large reference population. For the top ten genes, cancer CNVs were found in four or more people.
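The mapping step described above, flagging any known CNV whose locus coincides with a cancer-related gene, amounts to an interval-overlap test. The sketch below is only an illustration under stated assumptions: the `Locus` type, the function names, and all coordinates and labels are hypothetical and are not taken from the cited study or from any real annotation.

```python
from collections import namedtuple

# Hypothetical record: chromosome name plus a closed interval [start, end] in bp.
Locus = namedtuple("Locus", ["chrom", "start", "end", "name"])


def loci_overlap(a, b):
    """True if two loci lie on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def find_cancer_cnvs(cnvs, cancer_genes):
    """Return the CNVs that encompass or overlap at least one cancer gene."""
    return [cnv for cnv in cnvs
            if any(loci_overlap(cnv, gene) for gene in cancer_genes)]


# Made-up example data (not real genomic coordinates).
cnvs = [
    Locus("chr1", 1_000, 5_000, "cnv_a"),
    Locus("chr1", 9_000, 9_500, "cnv_b"),
    Locus("chr2", 2_000, 8_000, "cnv_c"),
]
cancer_genes = [
    Locus("chr1", 4_500, 6_000, "gene_x"),
    Locus("chr2", 3_000, 3_500, "gene_y"),
]

flagged = find_cancer_cnvs(cnvs, cancer_genes)
print([cnv.name for cnv in flagged])
```

Closed intervals are used here so that a CNV ending exactly where a gene starts still counts as overlapping; a real pipeline would need to follow the coordinate convention of its annotation source.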
It is therefore believed that CNV frequency can be used as a measure of cancer risk (see, e.g., U.S. Patent Publication No. 2010/0261183 A1). The CNV frequency can be determined simply from the constitutive genome of the organism, or it can represent the fraction derived from one or more tumors (neoplastic cells), if such are present.
In certain embodiments, the number of CNVs in a test sample (e.g., a sample comprising constitutive (germline) nucleic acids) or in a mixture of nucleic acids (e.g., germline nucleic acids and nucleic acids derived from neoplastic cells) is determined using the methods described herein for copy number variations. Identifying an increased number of CNVs in the test sample, e.g., in comparison to a reference value, indicates a risk of or predisposition to cancer in the subject. It will be appreciated that the reference value may vary with a given population. It will also be appreciated that the absolute value of the increase in CNV frequency will vary depending on the resolution of the method used to determine CNV frequency and on other parameters. Typically, an increase in CNV frequency to at least about 1.2 times the reference value is taken to indicate a risk of cancer (see, e.g., U.S. Patent Publication No. 2010/0261183 A1); for example, an increase in CNV frequency to at least about 1.5 times the reference value or greater, such as 2 to 4 times the reference value, is an indicator of an increased risk of cancer (e.g., as compared to a normal, healthy reference population).
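The fold-change criterion above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the label strings, and the example counts are hypothetical; only the 1.2-fold and 1.5-fold thresholds come from the text.

```python
def cnv_frequency_fold_change(sample_cnv_count, reference_cnv_count):
    """Ratio of the CNV count in the test sample to the reference value."""
    if reference_cnv_count <= 0:
        raise ValueError("reference CNV count must be positive")
    return sample_cnv_count / reference_cnv_count


def classify_cnv_frequency(fold_change, risk_threshold=1.2, increased_threshold=1.5):
    """Map a fold change onto the two illustrative thresholds named in the text."""
    if fold_change >= increased_threshold:
        return "increased risk indicated"
    if fold_change >= risk_threshold:
        return "risk indicated"
    return "no increase flagged"


# Hypothetical counts: 26 CNVs in the sample against a reference value of 20.
fold = cnv_frequency_fold_change(26, 20)
print(fold, classify_cnv_frequency(fold))
```

Because the reference value depends on the population and on the resolution of the CNV-calling method, both thresholds are left as parameters rather than fixed constants.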
A determination of structural variation in the genome of a mammal, in comparison to a reference value, is also believed to indicate a risk of cancer. In this context, in one embodiment, the term "structural variation" can be defined as the CNV frequency in the mammal multiplied by the average CNV size (in bp) in the mammal. Thus, high structural variation scores result from an increased CNV frequency and/or from the occurrence of large genomic nucleic acid deletions or duplications. Accordingly, in certain embodiments, the number of CNVs in a test sample (e.g., a sample comprising constitutive (germline) nucleic acids) is determined using the methods described herein to determine the size and number of copy number variations. In certain embodiments, a total structural variation score within genomic DNA of greater than about 1 megabase, or greater than about 1.1 megabases, or greater than about 1.2 megabases, or greater than about 1.3 megabases, or greater than about 1.4 megabases, or greater than about 1.5 megabases, or greater than about 1.8 megabases, or greater than about 2 megabases of DNA indicates a risk of cancer.
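The structural variation score defined above (CNV frequency multiplied by average CNV size) can be worked through in a few lines. This sketch assumes that "CNV frequency" means the number of CNVs called in the sample; the function names and the example sizes are hypothetical, and the 1 megabase default is one of the thresholds listed in the text.

```python
def structural_variation_score(cnv_sizes_bp):
    """CNV count multiplied by mean CNV size (bp), per the definition in the text."""
    count = len(cnv_sizes_bp)
    if count == 0:
        return 0.0
    mean_size_bp = sum(cnv_sizes_bp) / count
    return count * mean_size_bp


def exceeds_threshold(score_bp, threshold_megabases=1.0):
    """True if the score is greater than the chosen threshold in megabases."""
    return score_bp > threshold_megabases * 1_000_000


# Hypothetical sample with three CNVs of 500 kb, 300 kb, and 400 kb.
score = structural_variation_score([500_000, 300_000, 400_000])
print(score, exceeds_threshold(score))
```

Under this reading the product reduces to the total number of bases covered by CNVs, which is why either more CNVs or larger CNVs raise the score.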
These methods are believed to provide a measure of the risk of any cancer, including but not limited to acute and chronic leukemias, lymphomas, numerous solid tumors of mesenchymal or epithelial tissue, brain cancer, breast cancer, liver cancer, stomach cancer, colon cancer, B-cell lymphoma, lung cancer, bronchial cancer, colorectal cancer, prostate cancer, pancreatic cancer, ovarian cancer, bladder cancer, central nervous system cancer, peripheral nervous system cancer, esophageal cancer, cervical cancer, melanoma, uterine or endometrial cancer, cancer of the oral cavity or pharynx, kidney cancer, biliary tract cancer, small bowel or appendix cancer, salivary gland cancer, thyroid cancer, adrenal gland cancer, osteosarcoma, chondrosarcoma, liposarcoma, testicular cancer, malignant fibrous histiocytoma, and other cancers.
Complete chromosomal aneuploidy.
As indicated above, there is a high frequency of aneuploidy in cancer. In certain studies examining the prevalence of somatic copy number alterations (SCNAs) in cancer, it has been found that aneuploidy, in the form of whole-arm or whole-chromosome SCNAs, affects one quarter of the genome of a typical cancer cell (see, e.g., Beroukhim et al., Nature 463:899-905 [2010]). Whole-chromosome alterations are recurrently observed in several cancer types. For example, the gain of chromosome 8 is seen in 10% to 20% of cases of acute myeloid leukemia (AML), as well as in some solid tumors, including Ewing's sarcoma and desmoid tumors (see, e.g., Barnard et al., Leukemia 10:5-12 [1996]; Maurici et al., Cancer Genet. Cytogenet. 100:106-110 [1998]; Qi et al., Cancer Genet. Cytogenet. 92:147-149 [1996]; Barnard, D.R. et al., Blood 100:427-434 [2002]; and the like). An illustrative, but non-limiting, list of chromosome gains and losses in human cancers is shown in Table 1.
Table 1: Illustrative specific, recurrent chromosome gains and losses in human cancers (see, e.g., Gordon et al. (2012), Nature Rev. Genetics, 13:189-203).
In various embodiments, the methods described herein can be used to detect and/or quantify whole-chromosome aneuploidies that are associated with cancer generally and/or with particular cancers. Thus, for example, in certain embodiments, the detection and/or quantification of whole-chromosome aneuploidies characterized by the gains or losses shown in Table 1 is contemplated.
Arm-level chromosomal segment copy number variations.
Multiple studies have reported patterns of arm-level copy number variations across large numbers of cancer specimens (Lin et al., Cancer Res 68, 664-673 (2008); George et al., PLoS ONE 2, e255 (2007); Demichelis et al., Genes Chromosomes Cancer 48:366-380 (2009); Beroukhim et al., Nature 463(7283):899-905 [2010]). It has additionally been observed that the frequency of arm-level copy number variations decreases with the length of the chromosome arm. Adjusted for this trend, the majority of chromosome arms show strong evidence of preferential gain or loss, but rarely both, across multiple cancer lineages (see, e.g., Beroukhim et al., Nature 463(7283):899-905 [2010]).
Accordingly, in one embodiment, the methods described herein are used to determine arm-level CNVs (CNVs comprising one chromosome arm or substantially one chromosome arm) in a sample. The CNVs can be determined in a test sample comprising constitutive (germline) nucleic acids, and the arm-level CNVs can be identified in those constitutive nucleic acids. In certain embodiments, arm-level CNVs are identified (if present) in a sample comprising a mixture of nucleic acids (e.g., nucleic acids derived from normal cells and nucleic acids derived from neoplastic cells). In certain embodiments, the sample is derived from a subject suspected of having or known to have cancer, e.g., a carcinoma, sarcoma, lymphoma, leukemia, germ cell tumor, blastoma, or the like. In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, which may comprise a mixture of cfDNA derived from normal and cancerous cells.
In another embodiment, the biological sample in which the presence of a CNV is to be determined is derived from cells that, if a cancer is present, comprise a mixture of cancerous and non-cancerous cells from other biological tissues, including but not limited to biological fluids such as serum, sweat, tears, sputum, urine, ear discharge, lymph, saliva, cerebrospinal fluid, lavage fluids, bone marrow suspension, vaginal fluid, transcervical lavage fluid, brain fluid, ascites, milk, secretions of the respiratory, intestinal, and genitourinary tracts, and leukapheresis samples, or from tissue biopsies, swabs, or smears. In other embodiments, the biological sample is a stool (fecal) sample.
In various embodiments, the CNVs identified as indicative of the presence of a cancer or of an increased risk for a cancer include, but are not limited to, the arm-level CNVs listed in Table 2. As illustrated in Table 2, certain CNVs that comprise a substantial arm-level gain are indicative of the presence of a cancer or of an increased risk for certain cancers. Thus, for example, a gain in 1q is indicative of the presence or increased risk of acute lymphoblastic leukemia (ALL), breast cancer, GIST, HCC, lung NSC, medulloblastoma, melanoma, MPD, ovarian cancer, and/or prostate cancer. A gain in 3q is indicative of the presence or increased risk of esophageal squamous cell carcinoma, lung SC, and/or MPD. A gain in 7q is indicative of the presence or increased risk of colorectal cancer, glioma, HCC, lung NSC, medulloblastoma, melanoma, prostate cancer, and/or renal cancer. A gain in 7p is indicative of the presence or increased risk of breast cancer, colorectal cancer, esophageal adenocarcinoma, glioma, HCC, lung NSC, medulloblastoma, melanoma, and/or renal cancer. A gain in 20q is indicative of the presence or increased risk of breast cancer, colorectal cancer, dedifferentiated liposarcoma, esophageal adenocarcinoma, esophageal squamous cell carcinoma, glioma, HCC, lung NSC, melanoma, ovarian cancer, and/or renal cancer, and so forth.
Similarly, as illustrated in Table 2, certain CNVs that comprise a substantial arm-level loss are indicative of the presence of and/or an increased risk for certain cancers. Thus, for example, a loss in 1p is indicative of the presence or increased risk of gastrointestinal stromal tumor. A loss in 4q is indicative of the presence or increased risk of colorectal cancer, esophageal adenocarcinoma, lung SC, melanoma, ovarian cancer, and/or renal cancer. A loss in 17p is indicative of the presence or increased risk of breast cancer, colorectal cancer, esophageal adenocarcinoma, HCC, lung NSC, lung SC, and/or ovarian cancer, and so forth.
Table 2: Significant arm-level chromosomal segment copy number variations in each of 16 cancer subtypes (breast cancer, colorectal cancer, dedifferentiated liposarcoma, esophageal adenocarcinoma, esophageal squamous cell carcinoma, GIST (gastrointestinal stromal tumor), glioma, HCC (hepatocellular carcinoma), lung NSC, lung SC, medulloblastoma, melanoma, MPD (myeloproliferative disorder), ovarian cancer, prostate cancer, acute lymphoblastic leukemia (ALL), and renal cancer) (see, e.g., Beroukhim et al., Nature (2010) 463(7283):899-905).
The examples of associations between arm-level copy number variations and cancers are intended to be illustrative and not limiting. Other arm-level copy number variations and their cancer associations are known to those of skill in the art.
Smaller (e.g., focal) copy number variations.
As indicated above, in certain embodiments, the methods described herein can be used to determine the presence or absence of a chromosomal amplification. In some embodiments, the chromosomal amplification is the gain of one or more entire chromosomes. In other embodiments, the chromosomal amplification is the gain of one or more segments of a chromosome. In still other embodiments, the chromosomal amplification is the gain of two or more segments of two or more chromosomes. In various embodiments, the chromosomal amplification can involve the gain of one or more oncogenes.
The dominantly acting genes associated with human solid tumors typically exert their effect by overexpression or altered expression. Gene amplification is a common mechanism leading to upregulation of gene expression. Evidence from cytogenetic studies indicates that significant amplification occurs in over 50% of human breast cancers. Most notably, amplification of the proto-oncogene human epidermal growth factor receptor 2 (HER2), located on chromosome 17 (17q21-q22), results in overexpression of HER2 receptors on the cell surface, leading to excessive and dysregulated signaling in breast cancer and other malignancies (Park et al., Clinical Breast Cancer, 8:392-401 [2008]). A variety of oncogenes have been found to be amplified in other human malignancies.
Examples of the amplification of cellular oncogenes in human tumors include amplifications of: c-myc in the promyelocytic leukemia cell line HL60 and in small cell lung cancer; N-myc in primary neuroblastomas (stages III and IV), neuroblastoma cell lines, a retinoblastoma cell line and primary tumors, and small cell lung cancer cell lines and tumors; L-myc in small cell lung cancer cell lines and tumors; c-myb in acute myeloid leukemia and in colon cancer cell lines; c-erbB in epidermoid carcinoma cells and primary gliomas; c-K-ras-2 in primary carcinomas of the lung, colon, bladder, and rectum; and N-ras in a breast cancer cell line (Varmus H., Ann Rev Genetics, 18:553-612 (1984) [cited in Watson et al., Molecular Biology of the Gene (4th ed.; Benjamin/Cummings Publishing Co. 1987)]).
Duplication of oncogenes is a common cause of many types of cancer, as is the case with p70-S6 kinase 1 amplification and breast cancer. In such cases, the genetic duplication occurs in a somatic cell and affects only the genome of the cancer cells themselves, not the entire organism, much less any subsequent offspring. Other examples of oncogenes amplified in human cancers include MYC, ERBB2 (EFGR), CCND1 (cyclin D1), FGFR1, and FGFR2 in breast cancer; MYC and ERBB2 in cervical cancer; HRAS, KRAS, and MYB in colorectal cancer; MYC, CCND1, and MDM2 in esophageal cancer; CCNE, KRAS, and MET in gastric cancer; ERBB1 and CDK4 in glioblastoma; CCND1, ERBB1, and MYC in head and neck cancer; CCND1 in hepatocellular carcinoma; MYCB in neuroblastoma; MYC, ERBB2, and AKT2 in ovarian cancer; MDM2 and CDK4 in sarcoma; and MYC in small cell lung cancer. In one embodiment, the present methods can be used to determine the presence or absence of amplification of an oncogene associated with a cancer. In certain embodiments, the amplified oncogene is associated with breast cancer, cervical cancer, colorectal cancer, esophageal cancer, gastric cancer, glioblastoma, head and neck cancer, hepatocellular carcinoma, neuroblastoma, ovarian cancer, sarcoma, or small cell lung cancer.
In one embodiment, the present methods can be used to determine the presence or absence of a chromosomal deletion. In some embodiments, the chromosomal deletion is the loss of one or more entire chromosomes. In other embodiments, the chromosomal deletion is the loss of one or more segments of a chromosome. In yet other embodiments, the chromosomal deletion is the loss of two or more segments of two or more chromosomes. The chromosomal deletion can involve the loss of one or more tumor suppressor genes.
Chromosomal deletions involving tumor suppressor genes are believed to play an important role in the development and progression of solid tumors. The retinoblastoma tumor suppressor gene (Rb-1), located in chromosome 13q14, is the most extensively characterized tumor suppressor gene. The Rb-1 gene product, a 105 kDa nuclear phosphoprotein, apparently plays an important role in cell cycle regulation (Howe et al., Proc Natl Acad Sci (USA), 87:5883-5887 [1990]). Altered or lost expression of the Rb protein is caused by inactivation of both gene alleles, either through a point mutation or a chromosomal deletion.
Rb-1 gene alterations have been found to be present not only in retinoblastomas but also in other malignancies such as osteosarcomas, small cell lung cancer (Rygaard et al., Cancer Res, 50:5312-5317 [1990]), and breast cancer. Restriction fragment length polymorphism (RFLP) studies have indicated that such tumor types have frequently lost heterozygosity at 13q, suggesting that one of the Rb-1 gene alleles has been lost due to a gross chromosomal deletion (Bowcock et al., Am J Hum Genet, 46:12 [1990]). Chromosome 1 abnormalities, including duplications, deletions, and unbalanced translocations involving chromosome 6 and other partner chromosomes, indicate that regions of chromosome 1, in particular 1q21-1q32 and 1p11-13, might harbor oncogenes or tumor suppressor genes that are pathogenetically relevant to both the chronic and advanced phases of myeloproliferative neoplasms (Caramazza et al., Eur J Hematol, 84:191-200 [2010]). Myeloproliferative neoplasms are also associated with deletions of chromosome 5. Complete loss or interstitial deletions of chromosome 5 are the most common karyotypic abnormality in myelodysplastic syndromes (MDS). Patients with isolated del(5q)/5q- MDS have a more favorable prognosis than those with additional karyotypic defects, who tend to develop myeloproliferative neoplasms (MPNs) and acute myeloid leukemia. The frequency of unbalanced chromosome 5 deletions has led to the idea that 5q harbors one or more tumor suppressor genes that have fundamental roles in the growth control of hematopoietic stem/progenitor cells (HSCs/HPCs). Cytogenetic mapping of commonly deleted regions (CDRs) centered on 5q31 and 5q32 has identified candidate tumor suppressor genes, including the ribosomal subunit RPS14, the transcription factor Egr1/Krox20, and the cytoskeletal remodeling protein alpha-catenin (Eisenmann, Oncogene, 28:3429-3441 [2009]).
Cytogenetic and allelotyping studies of fresh tumors and tumor cell lines have shown that allelic loss from several distinct regions on chromosome 3p, including 3p25, 3p21–22, 3p21.3, 3p12–13, and 3p14, is the earliest and most frequent genomic abnormality involved in a wide spectrum of major epithelial cancers of the lung, breast, kidney, head and neck, ovary, cervix, colon, pancreas, esophagus, bladder, and other organs. Several tumor suppressor genes have been mapped to the chromosome 3p region, and interstitial deletions or promoter hypermethylation are thought to precede the loss of 3p or of the entire chromosome 3 in the development of carcinomas (Angeloni D., Briefings Functional Genomics, 6:19-39 [2007]).
Newborns and children with Down syndrome (DS) often present with congenital transient leukemia and have an increased risk of acute myeloid leukemia and acute lymphoblastic leukemia. Chromosome 21, harboring about 300 genes, may be involved in numerous structural aberrations, e.g., translocations, deletions, and amplifications, in leukemias, lymphomas, and solid tumors. Moreover, genes located on chromosome 21 have been identified that play an important role in tumorigenesis. Numerical as well as structural aberrations of chromosome 21 are associated with leukemias, and specific genes located in 21q, including RUNX1, TMPRSS2, and TFF, play a role in tumorigenesis (Fonatsch C, Gene Chromosomes Cancer, 49:497-508 [2010]).
In view of the foregoing, in various embodiments the methods described herein can be used to determine segment CNVs that are known to comprise one or more oncogenes or tumor suppressor genes, and/or that are known to be associated with a cancer or an increased risk of cancer. In certain embodiments, the CNVs can be determined in a test sample comprising constitutive (germline) nucleic acids, and the segment CNVs can be identified in those constitutive nucleic acids. In certain embodiments, segment CNVs are identified (if present) in a sample comprising a mixture of nucleic acids (e.g., nucleic acids derived from normal cells and nucleic acids derived from neoplastic cells). In certain embodiments, the sample is derived from a subject suspected of having or known to have cancer, e.g., a carcinoma, sarcoma, lymphoma, leukemia, germ cell tumor, blastoma, and the like. In one embodiment, the sample is a plasma sample derived (processed) from peripheral blood, which may comprise a mixture of cfDNA derived from normal and cancerous cells.
In another embodiment, the biological sample in which the presence of a CNV is to be determined is derived from cells that, if a cancer is present, comprise a mixture of cancerous and non-cancerous cells from other biological tissues, including but not limited to biological fluids such as serum, sweat, tears, sputum, urine, ear discharge, lymph, saliva, cerebrospinal fluid, lavage fluids, bone marrow suspension, vaginal fluid, transcervical lavage fluid, brain fluid, ascites, milk, secretions of the respiratory, intestinal, and genitourinary tracts, and leukapheresis samples, or from tissue biopsies, swabs, or smears. In other embodiments, the biological sample is a stool (fecal) sample.
The CNVs used to determine the presence of a cancer and/or an increased risk of cancer can comprise amplifications or deletions.
In various embodiments, the CNVs identified as indicative of the presence of a cancer or of an increased risk of cancer include one or more of the amplifications shown in Table 3.
Table 3: Illustrative, but non-limiting, chromosomal segments characterized by amplifications that are associated with cancers. The cancer types listed are those identified in Beroukhim et al., Nature 18:463:899-905.
In certain embodiments, in combination with the amplifications described above (herein), or separately, the CNVs identified as indicative of the presence of a cancer or of an increased risk of cancer include one or more of the deletions shown in Table 4.
Table 4: Illustrative but non-limiting chromosomal segments characterized by deletions associated with cancers. The cancer types listed are those identified in Beroukhim et al., Nature 18:463:899-905.
The aneuploidies identified as characteristic of various cancers (e.g., the aneuploidies identified in Tables 3 and 4) may contain genes known to be implicated in cancer etiology (e.g., tumor suppressors, oncogenes, etc.). These aneuploidies can also be probed to identify relevant but previously unknown genes.
For example, Beroukhim et al., supra, assessed potential cancer-causing genes in the copy number alterations using GRAIL (Gene Relationships Among Implicated Loci), an algorithm that searches for functional relationships among genomic regions. GRAIL scores each gene in a collection of genomic regions for its 'relatedness' to genes in other regions, based on textual similarity between the published abstracts of all papers citing the genes, on the notion that some target genes will function in common pathways. These methods permit identification/characterization of genes not previously associated with the particular cancers at issue. Table 5 illustrates target genes known to be within the identified amplified segments and predicted genes, and Table 6 illustrates target genes known to be within the identified deleted segments and predicted genes.
Table 5: Illustrative, but non-limiting, chromosomal segments and genes known or predicted to be present in regions characterized by amplification in various cancers (see, e.g., Beroukhim et al., supra).
Table 6: Illustrative, but non-limiting, chromosomal segments and genes known or predicted to be present in regions characterized by deletion in various cancers (see, e.g., Beroukhim et al., supra).
In various embodiments, it is contemplated to use the methods described herein to identify CNVs of segments comprising the amplified regions or genes identified in Table 5, and/or to identify CNVs of segments comprising the deleted regions or genes identified in Table 6.
In one embodiment, the methods described herein provide a means to assess the association between gene amplification and the extent of tumor evolution. Correlation between amplification and/or deletion and the stage or grade of a cancer may be prognostically important, because such information may contribute to the definition of a genetically based tumor grade that would better predict the future course of disease, with more advanced tumors having the worst prognosis. In addition, information about early amplification and/or deletion events may be useful in associating those events as predictors of subsequent disease progression.
Gene amplifications and deletions identified by the present method can be correlated with other known parameters such as tumor grade, medical history, Brd/Urd labeling index, hormonal status, lymph node metastasis, tumor size, survival time, and other tumor properties available from epidemiological and biostatistical studies. For example, tumor DNA to be tested by the present method can include atypical hyperplasia, ductal carcinoma in situ, stage I-III cancer, and metastatic lymph nodes, to permit identification of associations between amplifications and deletions and stage. The associations made may make effective therapeutic intervention possible. For example, consistently amplified regions may contain an overexpressed gene whose product may be amenable to therapeutic attack (for example, the growth factor receptor tyrosine kinase p185HER2).
In various embodiments, the methods described herein can be used to identify amplification and/or deletion events associated with drug resistance by determining the copy number variation of nucleic acid sequences from primary cancers and from cells that have metastasized to other sites. If gene amplification and/or deletion is a manifestation of karyotypic instability that allows rapid development of drug resistance, more amplification and/or deletion would be expected in primary tumors from chemoresistant patients than in tumors from chemosensitive patients. For example, if amplification of specific genes is responsible for the development of drug resistance, the regions surrounding those genes would be expected to be consistently amplified in tumor cells from chemoresistant patients but not in the primary tumors. Discovery of associations between gene amplification and/or deletion and the development of drug resistance may allow identification of patients who will or will not benefit from adjuvant therapy.
In a manner similar to that described for determining the presence or absence of complete and/or partial fetal chromosomal aneuploidies in a maternal sample, the methods, apparatus, and systems described herein can be used to determine the presence or absence of complete and/or partial chromosomal aneuploidies in any patient sample comprising nucleic acids (e.g., DNA or cfDNA), including patient samples that are not maternal samples. The patient sample can be any biological sample type described elsewhere in this application. Preferably, the sample is obtained by a non-invasive procedure. For example, the sample can be a blood sample, or the serum or plasma fraction thereof. Alternatively, the sample can be a urine sample or a fecal sample. In yet other embodiments, the sample is a tissue biopsy sample. In all cases, the sample comprises nucleic acids, e.g., cfDNA or genomic DNA, which are purified and sequenced using any of the NGS sequencing methods described above.
Both complete and partial chromosomal aneuploidies associated with the formation and progression of cancer can be determined according to the present method.
In various embodiments, when the methods described herein are used to determine the presence and/or an increased risk of cancer, the data can be normalized with respect to the chromosome or chromosomes for which the CNV is determined. In certain embodiments, the data can be normalized with respect to the chromosome arm or arms for which the CNV is determined. In certain embodiments, the data can be normalized with respect to the particular segment or segments for which the CNV is determined.
In addition to their role in cancer, CNVs have been associated with a growing number of common complex diseases, including human immunodeficiency virus (HIV), autoimmune diseases, and a spectrum of neuropsychiatric disorders.
CNVs in infectious and autoimmune diseases
To date, numerous studies have reported associations between CNVs in genes involved in inflammation and the immune response and HIV, asthma, Crohn's disease, and other autoimmune disorders (Fanciulli et al., Clin Genet 77:201-213 [2010]). For example, CNVs in CCL3L1 have been implicated in HIV/AIDS susceptibility (CCL3L1, 17q11.2 deletion), rheumatoid arthritis (CCL3L1, 17q11.2 deletion), and Kawasaki disease (CCL3L1, 17q11.2 duplication); CNVs in HBD-2 have been reported to predispose to colonic Crohn's disease (HBD-2, 8p23.1 deletion) and psoriasis (HBD-2, 8p23.1 deletion); and CNVs in FCGR3B have been shown to predispose to glomerulonephritis in systemic lupus erythematosus (FCGR3B, 1q23 deletion, 1q23 duplication), to anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (FCGR3B, 1q23 deletion), and to an increased risk of rheumatoid arthritis. At least two inflammatory or autoimmune diseases have been shown to be associated with CNVs at different gene loci. For example, Crohn's disease is associated not only with low copy number at HBD-2 but also with a common deletion polymorphism upstream of the IRGM gene, which encodes a member of the p47 immunity-related GTPase family. In addition to the association with FCGR3B copy number, susceptibility to systemic lupus erythematosus (SLE) has been reported to be significantly increased in subjects with a lower copy number of complement component C4.
Associations between genomic deletions at the GSTM1 (GSTM1, 1q23 deletion) and GSTT1 (GSTT1, 22q11.2 deletion) loci and an increased risk of atopic asthma have been reported in a number of independent studies. In some embodiments, the methods described herein can be used to determine the presence or absence of CNVs associated with inflammatory and/or autoimmune diseases. For example, the methods can be used to determine the presence of a CNV in a patient suspected of having HIV, asthma, or Crohn's disease. Examples of CNVs associated with such diseases include, but are not limited to, deletions at 17q11.2, 8p23.1, 1q23, and 22q11.2, and duplications at 17q11.2 and 1q23. In some embodiments, the present method can be used to determine the presence of CNVs in genes including, but not limited to, CCL3L1, HBD-2, FCGR3B, GSTM1, GSTT1, C4, and IRGM.
CNVs and diseases of the nervous system
Associations between de novo and inherited CNVs and several common neurological and psychiatric diseases have been reported in autism, schizophrenia, and epilepsy, and in some cases of neurodegenerative diseases such as Parkinson's disease, amyotrophic lateral sclerosis (ALS), and autosomal dominant Alzheimer's disease (Fanciulli et al., Clin Genet 77:201-213 [2010]). Cytogenetic abnormalities with duplications at 15q11-q13 have been observed in patients with autism and autism spectrum disorders (ASD). According to the Autism Genome Project Consortium, 154 CNVs, including several recurrent CNVs, are located either on chromosome 15q11-q13 or at new genomic locations, including chromosomes 2p16, 1q21, and 17p12 in a region associated with Smith-Magenis syndrome that overlaps with ASD. Recurrent microdeletions or microduplications on chromosome 16p11.2 have highlighted the observation that de novo CNVs are detected at the loci of genes known to regulate synaptic differentiation and glutamatergic neurotransmitter release, such as SHANK3 (22q13.3 deletion), neurexin 1 (NRXN1, 2p16.3 deletion), and neuroligin (NLGN4, Xp22.33 deletion). Schizophrenia has also been associated with multiple de novo CNVs. Microdeletions and microduplications associated with schizophrenia contain an overrepresentation of genes belonging to neurodevelopmental and glutamatergic pathways, suggesting that multiple CNVs affecting these genes may contribute directly to the pathogenesis of schizophrenia, e.g., ERBB4, 2q34 deletion; SLC1A3, 5p13.3 deletion; RAPGEF4, 2q31.1 deletion; CIT, 12.24 deletion; and multiple genes with de novo CNVs. CNVs have also been associated with other neurological disorders, including epilepsy (CHRNA7, 15q13.3 deletion), Parkinson's disease (SNCA, 4q22 duplication), and ALS (SMN1, 5q12.2-q13.3 deletion; and SMN2 deletion). In some embodiments, the methods described herein can be used to determine the presence or absence of CNVs associated with diseases of the nervous system. For example, the methods can be used to determine the presence of a CNV in a patient suspected of having autism, schizophrenia, epilepsy, a neurodegenerative disease such as Parkinson's disease, amyotrophic lateral sclerosis (ALS), or autosomal dominant Alzheimer's disease. The methods can be used to determine CNVs of genes associated with diseases of the nervous system, including but not limited to any of autism spectrum disorders (ASD), schizophrenia, and epilepsy, and CNVs of genes associated with neurodegenerative disorders such as Parkinson's disease. Examples of CNVs associated with such diseases include, but are not limited to, duplications at 15q11-q13, 2p16, 1q21, 17p12, 16p11.2, and 4q22, and deletions at 22q13.3, 2p16.3, Xp22.33, 2q34, 5p13.3, 2q31.1, 12.24, 15q13.3, and 5q12.2. In some embodiments, the methods can be used to determine the presence of CNVs in genes including, but not limited to, SHANK3, NLGN4, NRXN1, ERBB4, SLC1A3, RAPGEF4, CIT, CHRNA7, SNCA, SMN1, and SMN2.
CNVs and metabolic or cardiovascular diseases
Associations between metabolic and cardiovascular traits, such as familial hypercholesterolemia (FH), atherosclerosis, and coronary artery disease, and CNVs have been reported in a number of studies (Fanciulli et al., Clin Genet 77:201-213 [2010]). For example, germline rearrangements, mainly deletions, have been observed at the LDLR gene (LDLR, 19p13.2 deletion/duplication) in some FH patients who carry no other LDLR mutations. Another example is the LPA gene, which encodes apolipoprotein(a) (apo(a)), whose plasma concentration is associated with the risk of coronary artery disease, myocardial infarction (MI), and stroke. Plasma concentrations of the apo(a)-containing lipoprotein Lp(a) vary over 1000-fold between individuals, and 90% of this variability is genetically determined at the LPA locus, with plasma concentration and Lp(a) isoform size being proportional to a highly variable number of 'kringle 4' repeat sequences (range 5 to 50). These data indicate that CNVs in at least two genes can be associated with cardiovascular risk. The methods described herein can be used in large studies to search specifically for associations between CNVs and cardiovascular disorders. In some embodiments, the present method can be used to determine the presence or absence of CNVs associated with metabolic or cardiovascular disease. For example, the present method can be used to determine the presence of a CNV in a patient suspected of having familial hypercholesterolemia. The methods described herein can be used to determine CNVs of genes associated with metabolic or cardiovascular disease, e.g., hypercholesterolemia. Examples of CNVs associated with such diseases include, but are not limited to, the 19p13.2 deletion/duplication of the LDLR gene and amplifications in the LPA gene.
Determination of complete chromosomal aneuploidies in patient samples
In one embodiment, methods are provided for determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample comprising nucleic acid molecules. The steps of the method comprise: (a) obtaining sequence information for the patient nucleic acids in the patient test sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome sequence for each of said any one or more chromosomes of interest. The normalizing chromosome sequence can be a single chromosome, or it can be a group of chromosomes selected from chromosomes 1-22, X, and Y. The method further comprises: (c) using the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for each said normalizing chromosome sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said any one or more chromosomes of interest, thereby determining the presence or absence of any one or more different complete patient chromosomal aneuploidies in the patient test sample.
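Steps (b) through (d) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`count_tags`, `chromosome_dose`, `exceeds_threshold`), the toy tag list, and the threshold value are hypothetical and do not come from the specification, and the single "greater than" comparison stands in for whatever threshold rule a given embodiment actually uses.

```python
from collections import Counter


def count_tags(tag_chromosomes):
    """Step (b): count mapped sequence tags per chromosome."""
    return Counter(tag_chromosomes)


def chromosome_dose(counts, chromosome, normalizing_chromosomes):
    """Step (c): tags on the chromosome of interest divided by the tags on
    its normalizing chromosome sequence (one chromosome or a group)."""
    denominator = sum(counts[c] for c in normalizing_chromosomes)
    if denominator == 0:
        raise ValueError("no tags mapped to the normalizing chromosome sequence")
    return counts[chromosome] / denominator


def exceeds_threshold(dose, threshold):
    """Step (d): compare a dose with a per-chromosome threshold (hypothetical rule)."""
    return dose > threshold


# Toy data (hypothetical): each entry names the chromosome a tag mapped to.
tags = ["chr21"] * 30 + ["chr9"] * 100 + ["chr1"] * 200
counts = count_tags(tags)

# Normalizing chromosome sequence as a single chromosome, then as a group.
dose_single = chromosome_dose(counts, "chr21", ["chr9"])
dose_group = chromosome_dose(counts, "chr21", ["chr9", "chr1"])
call = exceeds_threshold(dose_single, threshold=0.25)
```

Passing the normalizing chromosome sequence as a list lets the same function cover both the single-chromosome and the group-of-chromosomes cases described in step (b).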
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome sequence for that chromosome of interest.
In other embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing chromosome for that chromosome of interest. In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing chromosome sequence to the length of the normalizing chromosome sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing chromosome sequence. The calculation is repeated for each of all the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different patients.
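The length-normalized variant of step (c) can be illustrated as follows. This is a minimal sketch, assuming that "sequence tag density" means tag count divided by sequence length in base pairs; the function names and the counts and lengths in the example are hypothetical, not values from the specification.

```python
def tag_density(tag_count, length_bp):
    """Sequence tag density: number of tags per base pair of sequence."""
    if length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return tag_count / length_bp


def density_dose(count_interest, length_interest, count_normalizing, length_normalizing):
    """Chromosome dose as the ratio of the tag density of the chromosome of
    interest to the tag density of its normalizing chromosome sequence."""
    normalizing_density = tag_density(count_normalizing, length_normalizing)
    if normalizing_density == 0:
        raise ValueError("normalizing chromosome sequence has no tags")
    return tag_density(count_interest, length_interest) / normalizing_density


# Hypothetical counts and lengths: both sequences carry one tag per 100 kb,
# so the length-normalized dose is 1 even though the raw counts differ.
balanced = density_dose(480, 48_000_000, 1_400, 140_000_000)

# Half again as many tags on the chromosome of interest raises the dose by the same factor.
elevated = density_dose(720, 48_000_000, 1_400, 140_000_000)
```

Dividing by length makes doses comparable between sequences of very different sizes, which a raw count ratio does not.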
In one example of this embodiment, one or more complete chromosomal aneuploidies are determined in a cancer patient test sample comprising cell-free DNA molecules by: (a) sequencing at least a portion of the cell-free DNA molecules to obtain sequence information for the patient cell-free DNA molecules in the test sample; (b) using the sequence information to identify a number of sequence tags for each of any twenty or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing chromosome for each of said twenty or more chromosomes of interest; (c) using the number of sequence tags identified for each of said twenty or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome to calculate a single chromosome dose for each of the twenty or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said twenty or more chromosomes of interest to a threshold value for each of the twenty or more chromosomes of interest, thereby determining the presence or absence of any twenty or more different complete chromosomal aneuploidies in the patient test sample.
In another embodiment, the method described above for determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample uses a normalizing segment sequence to determine the dose of the chromosome of interest. In this case, the method comprises: (a) obtaining sequence information for the nucleic acids in said sample; and (b) using said sequence information to identify a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of said any one or more chromosomes of interest. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further comprises: (c) using the number of sequence tags identified for each of said any one or more chromosomes of interest and the number of sequence tags identified for said normalizing segment sequence to calculate a single chromosome dose for each of said any one or more chromosomes of interest; and (d) comparing each single chromosome dose for each of said any one or more chromosomes of interest to a threshold value for each of said one or more chromosomes of interest, thereby determining the presence or absence of one or more different complete chromosomal aneuploidies in the patient sample.
In some embodiments, step (c) comprises calculating a single chromosome dose for each chromosome of interest as the ratio of the number of sequence tags identified for that chromosome of interest to the number of sequence tags identified for the normalizing segment sequence for that chromosome of interest.
In other embodiments, step (c) comprises calculating a sequence tag ratio for a chromosome of interest by relating the number of sequence tags obtained for the chromosome of interest to the length of the chromosome of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence, and calculating a chromosome dose for the chromosome of interest as the ratio of the sequence tag density of the chromosome of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each of all the chromosomes of interest. Steps (a)-(d) can be repeated for test samples from different patients.
A normalized chromosome value (NCV) provides a means of comparing chromosome doses across different sample sets; it relates the chromosome dose in a test sample to the mean of the corresponding chromosome dose in a set of qualified samples. The NCV is calculated as:
$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th chromosome dose in the set of qualified samples, and $x_{ij}$ is the observed j-th chromosome dose for test sample i.
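The NCV can be computed as a standardized score: the test sample's chromosome dose minus the mean of the qualified-sample doses, divided by their standard deviation. The sketch below is illustrative only. The function name and the dose values are hypothetical, and the use of the sample standard deviation (`statistics.stdev`, with n - 1 in the denominator) is an assumption, since the text says only that the mean and standard deviation are estimated.

```python
from statistics import mean, stdev


def normalized_chromosome_value(test_dose, qualified_doses):
    """NCV for one chromosome: (test dose - qualified mean) / qualified std dev."""
    if len(qualified_doses) < 2:
        raise ValueError("at least two qualified samples are needed")
    mu_hat = mean(qualified_doses)
    sigma_hat = stdev(qualified_doses)  # sample standard deviation (assumption)
    if sigma_hat == 0:
        raise ValueError("qualified doses have zero variance")
    return (test_dose - mu_hat) / sigma_hat


# Hypothetical chromosome doses from a set of qualified samples.
qualified = [0.30, 0.31, 0.29, 0.30, 0.30]

# A test dose well above the qualified mean gives a large positive NCV;
# a test dose equal to the mean gives an NCV of zero.
ncv_high = normalized_chromosome_value(0.33, qualified)
ncv_mean = normalized_chromosome_value(0.30, qualified)
```

Because the NCV is expressed in units of the qualified-set standard deviation, values computed for different chromosomes or different sample sets can be compared on one scale.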
In some embodiments, the presence or absence of one complete chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, twenty-two, twenty-three, or twenty-four complete chromosomal aneuploidies is determined in a sample, wherein twenty-two of the complete chromosomal aneuploidies correspond to complete chromosomal aneuploidies of any one or more of the autosomes, and the twenty-third and twenty-fourth correspond to complete chromosomal aneuploidies of chromosomes X and Y. Because aneuploidies can comprise trisomies, tetrasomies, pentasomies, and other polysomies, and because the number of complete chromosomal aneuploidies varies among different diseases and among different stages of the same disease, the number of complete chromosomal aneuploidies determined according to the present method is at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, or more chromosomal aneuploidies. Systematic karyotyping of tumors has revealed that the chromosome number in cancer cells is highly variable, ranging from hypodiploidy (considerably fewer than 46 chromosomes) to tetraploidy and hypertetraploidy (up to 200 chromosomes) (Storchova and Kuffer, J Cell Sci 121:3859-3866 [2008]). In some embodiments, the method comprises determining the presence or absence of up to 200 or more chromosomal aneuploidies in a sample from a patient suspected or known to have cancer, e.g., colon cancer. The chromosomal aneuploidies include losses of one or more complete chromosomes (hypodiploidies) and gains of complete chromosomes, including trisomies, tetrasomies, pentasomies, and other polysomies. Gains and/or losses of chromosome segments can also be determined, as described elsewhere in this application. The method is applicable to determining the presence or absence of different aneuploidies in samples from patients suspected or known to have cancer, as described elsewhere in this application.
In some embodiments, any one of chromosomes 1-22, X, and Y can be the chromosome of interest in determining the presence or absence of any one or more different complete chromosomal aneuploidies in a patient test sample as described above. In other embodiments, two or more chromosomes of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, and Y. In one embodiment, the any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and the presence or absence of at least twenty different complete chromosomal aneuploidies is determined. In other embodiments, the any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y are all of chromosomes 1-22, X, and Y, and the presence or absence of complete chromosomal aneuploidies of all of chromosomes 1-22, X, and Y is determined. The different complete chromosomal aneuploidies that can be determined include complete chromosomal monosomies of any one or more of chromosomes 1-22, X, and Y; complete chromosomal trisomies of any one or more of chromosomes 1-22, X, and Y; complete chromosomal tetrasomies of any one or more of chromosomes 1-22, X, and Y; complete chromosomal pentasomies of any one or more of chromosomes 1-22, X, and Y; and other complete chromosomal polysomies of any one or more of chromosomes 1-22, X, and Y.
Determination of partial chromosomal aneuploidies in patient samples
In another embodiment, methods are provided for determining the presence or absence of any one or more different partial chromosomal aneuploidies in a patient test sample comprising nucleic acid molecules. The steps of the method include: (a) obtaining sequence information for the patient nucleic acids in the sample; and (b) using the sequence information to identify a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for a normalizing segment sequence for each of those segments. The normalizing segment sequence can be a single segment of a chromosome, or it can be a group of segments from one or more different chromosomes. The method further (c) uses the number of sequence tags identified for each segment of interest and the number of sequence tags identified for each normalizing segment sequence to calculate a single segment dose for each segment of interest; and (d) compares each single segment dose to a threshold value for the corresponding chromosome segment, and thereby determines the presence or absence of one or more different partial chromosomal aneuploidies in the sample.
In some embodiments, step (c) comprises calculating a single segment dose for each of any one or more segments of any one or more chromosomes of interest as the ratio of the number of sequence tags identified for that segment to the number of sequence tags identified for the normalizing segment sequence for that segment.
In other embodiments, step (c) comprises calculating a sequence tag density ratio for a segment of interest by relating the number of sequence tags obtained for the segment of interest to the length of the segment of interest, and relating the number of tags for the corresponding normalizing segment sequence to the length of the normalizing segment sequence; the segment dose for the segment of interest is then calculated as the ratio of the sequence tag density of the segment of interest to the sequence tag density of the normalizing segment sequence. The calculation is repeated for each segment of interest. Steps (a)-(d) can be repeated for test samples from different patients.
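The two ways of computing a segment dose in step (c) can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical and chosen only for illustration; the tag counts and lengths are assumed to have been obtained already from steps (a) and (b).

```python
def segment_dose(segment_tags, normalizing_tags):
    """Segment dose as a ratio of tag counts.

    segment_tags: tags identified for the segment of interest.
    normalizing_tags: tags identified for its normalizing segment sequence.
    Both names are illustrative assumptions, not terms from the source.
    """
    if normalizing_tags <= 0:
        raise ValueError("normalizing segment must have at least one tag")
    return segment_tags / normalizing_tags


def segment_dose_by_density(segment_tags, segment_length,
                            normalizing_tags, normalizing_length):
    """Segment dose as a ratio of sequence tag densities (tags per base)."""
    if segment_length <= 0 or normalizing_length <= 0:
        raise ValueError("lengths must be positive")
    if normalizing_tags <= 0:
        raise ValueError("normalizing segment must have at least one tag")
    segment_density = segment_tags / segment_length
    normalizing_density = normalizing_tags / normalizing_length
    return segment_density / normalizing_density
```

The density form differs from the count form only by the factor normalizing_length / segment_length, so the two agree when the segment of interest and the normalizing segment sequence have the same length.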
A means for comparing segment doses across different sample sets is provided by determining a normalized segment value (NSV), which relates the segment dose in a test sample to the mean of the corresponding segment dose in a set of qualified samples. The NSV is calculated as:
$$NSV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the j-th segment dose in the set of qualified samples, and $x_{ij}$ is the observed j-th segment dose for test sample i.
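As a minimal sketch of the NSV calculation, the following Python function standardizes a test-sample segment dose against the doses observed for the same segment in a set of qualified samples. The function name and arguments are hypothetical, and the use of the sample (n-1) standard deviation as the estimator is an assumption; the text says only that the mean and standard deviation are estimated.

```python
from statistics import mean, stdev


def normalized_segment_value(test_dose, qualified_doses):
    """Return (x_ij - mu_j) / sigma_j for one segment j.

    test_dose: observed dose of segment j in test sample i.
    qualified_doses: doses of segment j in the qualified samples
    (at least two values are needed to estimate a standard deviation).
    """
    mu = mean(qualified_doses)
    sigma = stdev(qualified_doses)  # assumption: sample (n-1) estimator
    if sigma == 0:
        raise ValueError("qualified doses have zero variance")
    return (test_dose - mu) / sigma
```

Because the NSV is expressed in units of standard deviations of the qualified set, values computed for different segments or different sample sets can be compared on a common scale.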
In some embodiments, the presence or absence of one partial chromosomal aneuploidy is determined. In other embodiments, the presence or absence of two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, or more partial chromosomal aneuploidies is determined in a sample. In one embodiment, one segment of interest is selected from any one of chromosomes 1-22, X, and Y. In other embodiments, two or more segments of interest are selected from any two or more of chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. In one embodiment, the segments of interest comprise at least one, five, ten, 15, 20, 25, 50, 75, 100, or more segments selected from chromosomes 1-22, X, and Y, and the presence or absence of at least one, five, ten, 15, 20, 25, 50, 75, 100, or more different partial chromosomal aneuploidies is determined. The different partial chromosomal aneuploidies that can be determined include partial duplications, partial multiplications, partial insertions, and partial deletions.
The sample used to determine the presence or absence of a chromosomal aneuploidy (partial or complete) in a patient can be any of the biological samples described elsewhere in this application. The type of sample used will depend on the type of disease the patient is known or suspected to have. For example, a stool sample can be chosen as the source of DNA for determining the presence or absence of aneuploidies associated with colorectal cancer. The method is also applicable to the tissue samples described herein. Preferably, the sample is a biological sample obtained by non-invasive means, e.g., a plasma sample. The nucleic acids in the patient sample can be sequenced using next generation sequencing (NGS) as described elsewhere in this application. In some embodiments, the sequencing is massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencing is sequencing-by-ligation. In yet other embodiments, the sequencing is single molecule sequencing. Optionally, an amplification step is performed prior to sequencing.
In some embodiments, the presence or absence of an aneuploidy is determined in a patient suspected of suffering from a cancer as described elsewhere in this application, e.g., lung cancer, breast cancer, kidney cancer, head and neck cancer, ovarian cancer, cervical cancer, colon cancer, pancreatic cancer, esophageal cancer, bladder cancer, cancers of other organs, and blood cancers. Blood cancers include cancers of the bone marrow, blood, and lymphatic system; the lymphatic system includes the lymph nodes, lymphatic vessels, tonsils, thymus, spleen, and digestive tract lymphoid tissue. Leukemia and myeloma, which start in the bone marrow, and lymphoma, which starts in the lymphatic system, are the most common types of blood cancer.
The determination of the presence or absence of one or more chromosomal aneuploidies in a patient sample can be made, without limitation, to determine the predisposition of the patient to a particular cancer, to determine the presence or absence of a cancer of interest as part of routine screening in patients known or not known to be predisposed to that cancer, to provide a prognosis for the disease, to assess the need for adjuvant therapy, and to determine the progression or regression of the disease.
Genetic counseling
Fetal chromosomal abnormalities are a major contributor to miscarriage, congenital anomalies, and perinatal death (Wellesley et al., Europ. J. Human Genet., 20:521-526 [2012]; Nagaoka et al., Nature Rev. Genetics 13:493-504 [2012]). Since the introduction of amniocentesis, followed by chorionic villus sampling (CVS), pregnant women have had access to information about fetal chromosomal status (ACOG Practice Bulletin No. 77: Obstet Gynecol 109:217-227 [2007]). When sufficient tissue is obtained, cytogenetic karyotyping of the fetal cells or chorionic villi obtained from these procedures gives very high diagnostic sensitivity and specificity (approximately 99%) in the vast majority of cases (Hahnemann and Vejerslev, Prenat Diagn., 17:801-820 [1997]; NICHD National Registry for Amniocentesis Study Group, JAMA 236:1471-1476 [1976]). However, these procedures also pose risks to the fetus and the pregnant woman (Odibo et al., Obstet Gynecol 112:813-819 [2008]; Odibo et al., Obstet Gynecol 111:589-595 [2008]).
To mitigate these risks, a series of prenatal screening algorithms has been developed to stratify women by their likelihood of carrying a fetus with one of the most common fetal trisomies: trisomy 21 (T21, Down syndrome), trisomy 18 (T18, Edwards syndrome), and, to a lesser extent, trisomy 13 (T13, Patau syndrome). Screening typically involves measuring multiple biochemical analytes in maternal serum at different time points, combined with ultrasonographic measurement of the fetal nuchal translucency (NT) and the incorporation of other maternal factors (e.g., age), to generate a risk score. As screening has developed and been refined over the years, and depending on when it is administered (first trimester only, second trimester only, sequential, or fully integrated) and how it is administered (serum only, or serum in combination with NT), a menu of options has emerged with varying detection rates (65% to 90%) and high screen-positive rates (5%) (ACOG Practice Bulletin No. 77: Obstet Gynecol 109:217-227 [2007]).
For patients, the information or "risk score" that results from this multi-step process can be confusing and anxiety-provoking, particularly in the absence of comprehensive counseling. Ultimately, the woman must weigh the results against the risk of miscarriage from an invasive procedure when making her decision. A better non-invasive means of obtaining more definitive information on fetal chromosomal status would aid decision-making in this context. Such an improved non-invasive means of obtaining more definitive information on fetal chromosomal status is believed to be provided by the methods described herein.
In various embodiments, genetic counseling is contemplated as a component of the use of the assays described herein, particularly in a clinical setting. Conversely, the aneuploidy detection methods described herein can constitute one option offered in the context of prenatal care and associated genetic counseling.
Accordingly, in various embodiments, the methods described herein can be offered as a primary screen (e.g., for women with an a priori risk for the pregnancy) or as a secondary screen for those women who are positive on "conventional" screening. In certain embodiments, it is contemplated that the non-invasive prenatal testing (NIPT) methods described herein additionally include a genetic counseling component, and/or that genetic counseling and pregnancy "management" are optionally or expressly incorporated into the NIPT methods described herein.
For example, in certain embodiments, the woman presents with one or more a priori risks for the pregnancy. Such risks include, but are not limited to, one or more of the following:
1) Maternal age over 35, although it is noted that approximately 80% of children born with Down syndrome are born to women under 35.
2) A previous fetus/child with an autosomal trisomy. The recurrence rate is believed to be about 1.6 to about 8.2 times the maternal age risk, depending on the type of trisomy, whether the previous pregnancy ended in spontaneous abortion, and the maternal age at the initial occurrence and at the subsequent prenatal diagnosis.
3) A previous fetus/child with a sex chromosome abnormality. Not all sex chromosome abnormalities are of maternal origin, and not all carry a risk of recurrence. When they do occur, the recurrence rate is about 1.6 to about 1.5 times the maternal age risk.
4) Parental carrier of a chromosomal translocation.
5) Parental carrier of a chromosomal inversion.
6) Parental aneuploidy or mosaicism.
7) Use of certain assisted reproductive technologies.
In such circumstances, and subject to the various considerations described above, the mother, e.g., in consultation with a physician, genetic counselor, or the like, can be offered the use of the methods described herein for non-invasive determination of the presence or absence of a fetal aneuploidy (e.g., trisomy 21, trisomy 18, trisomy 13, monosomy X, etc.). In this regard, it is noted that the methods described herein are believed to be effective even in the first trimester. Accordingly, in certain embodiments, use of the NIPT methods described herein is contemplated as early as 8 weeks, and in various embodiments at about 10 weeks or later.
In certain embodiments, the methods described herein can be offered as a secondary screen to those women who are positive on "conventional" screening. For example, in certain embodiments, the pregnancy may present with a structural abnormality, such as fetal cystic hygroma or increased nuchal translucency, e.g., as detected by ultrasonography. Ultrasound examination for structural defects is typically performed at 18 to 22 weeks and, particularly when irregularities are observed, can be coupled with a fetal echocardiogram. It is contemplated that when an abnormality is observed (e.g., a "conventional" screen is positive), the mother, e.g., in consultation with a physician, genetic counselor, or the like, can be offered the use of the methods described herein for non-invasive determination of the presence or absence of a fetal aneuploidy (e.g., trisomy 21, trisomy 18, trisomy 13, monosomy X, etc.).
Accordingly, in various embodiments, genetic counseling is contemplated in which the NIPT assays described herein are offered as a component of prenatal care, pregnancy management, and/or the development/design of a birth plan. By offering NIPT as a secondary screen to women who are positive on conventional screening (or who have other a priori risk), the number of unnecessary amniocentesis and CVS procedures is expected to decrease. However, because informed consent is an important component of NIPT, the need for genetic counseling increases.
Because a positive NIPT result (using the methods described herein) is more similar to a positive result from amniocentesis or CVS, women should be given the opportunity, in genetic counseling before the test, to decide whether they want this level of information. Pre-test genetic counseling for NIPT should also include discussion of, and recommendations for, confirming abnormal test results via CVS, amniocentesis, cordocentesis, etc. (depending on gestational age), so that the expected timing of results can be given appropriate consideration in post-test planning. According to the statements of the National Society of Genetic Counselors (NSGC, USA) on this topic (see, e.g., Devers et al., Noninvasive Prenatal Testing/Noninvasive Prenatal Diagnosis: the position of the National Society of Genetic Counselors (by NSGC Public Policy Committee), NSGC Position Statements 2012; Benn et al., Prenat Diagn, 31:519-522 [2011]), NIPT does not currently screen for all chromosomal or genetic conditions, and so it may not replace standard risk assessment and prenatal diagnosis. It is contemplated that patients who have other factors suggestive of a chromosomal abnormality (e.g., certain abnormal ultrasound findings) should receive genetic counseling in which they are offered the option of conventional confirmatory diagnostic testing, regardless of NIPT results. Women should also be made aware in genetic counseling that NIPT results may not be informative for some patients.
NIPT using the methods described herein may be more similar to CVS than to amniocentesis, in that a detected aneuploidy typically represents the chromosomal constitution of the fetus but in some cases may represent confined placental aneuploidy or confined placental mosaicism (CPM). CPM is present in about 1% to 2% of CVS results today, and some women undergo amniocentesis at a later gestational age after CVS in order to distinguish isolated placental aneuploidy from fetal aneuploidy. As NIPT is implemented more widely, cases of CPM are therefore expected to produce a certain number of positive NIPT results that may not subsequently be confirmed by invasive procedures, particularly amniocentesis. Again, in various embodiments, it is contemplated that this information is presented to the patient in the context of genetic counseling (e.g., by a physician, genetic counselor, or the like).
It will be appreciated that, in various embodiments, one component of genetic counseling can be to recommend confirmatory approaches, to inform the patient about risk levels and about the timing of the various confirmatory approaches, and to provide input on the value of the information afforded by such confirmatory methods, particularly in the context of the timing of the pregnancy. In various embodiments, genetic counseling can also establish a plan for monitoring the pregnancy (e.g., follow-up ultrasounds, additional physician visits, etc.) and for setting up a series of decision points where appropriate. In addition, genetic counseling can suggest and help develop a birth plan, which can include, for example, decisions regarding the delivery location (e.g., home, hospital, specialized facility, etc.), the personnel involved at the delivery location, the third-party care available to the infant, and the like.
While the foregoing discussion has focused on the methods described herein as a component of prenatal diagnosis (and perhaps as a second-tier tool), as clinical experience accumulates, and if the results of studies comparing them with conventional screening are successful, the NIPT methods described herein may replace existing screening protocols and may be used as a primary tool.
It is also contemplated that the methods described herein will find use in pregnancies with multiple gestations.
Typically, it is contemplated that genetic counseling (e.g., as described above) can be provided by a physician (e.g., a primary care physician, an obstetrician, etc.) and/or by a genetic counselor or other qualified medical professional. In certain embodiments the counseling is provided face-to-face; however, it will be appreciated that in certain circumstances the counseling can be provided through remote access (e.g., via text, cell phone, cell phone app, tablet app, Internet, and the like).
It will also be appreciated that, in certain embodiments, genetic counseling or a component thereof can be delivered by a computer system. For example, a "smart advice" system can be provided that delivers genetic counseling information (e.g., as described above) in response to test results, to instructions from a healthcare provider, and/or to a query (e.g., a query from a patient). In certain embodiments, the information will be specific to clinical information provided by the physician, the healthcare system, and/or the patient. In certain embodiments, the information can be provided in an iterative manner. Thus, for example, the patient can pose "what if" queries and the system can return information such as diagnostic options, risk factors, timing, and the implications of various outcomes.
In certain embodiments, the information can be provided in a transitory manner (e.g., presented on a computer screen). In certain embodiments, the information can be provided in a non-transitory manner. Thus, for example, the information can be printed out (e.g., as a menu of options and/or recommendations, optionally with associated timing, etc.) and/or stored on computer-readable media (e.g., magnetic media such as a local hard drive or a server; optical media; flash memory; and the like).
It will be appreciated that such systems are typically configured to provide adequate security so that patient privacy is maintained, e.g., in accordance with prevailing standards in the industry.
The foregoing discussion of genetic counseling is intended to be illustrative and not limiting. Genetic counseling is a well-established branch of medical science, and the incorporation of a counseling component with respect to the assays described herein is within the skill of the practitioner. Moreover, it is recognized that as the field progresses, the nature of genetic counseling and of the associated information and recommendations is likely to change.
Determining fetal fraction
Methods for determining fetal fraction are disclosed in U.S. Patent Application Publication No. 2010-0010085 (117.201), U.S. Patent Application Publication No. 2011-0201507 (120.201), U.S. Patent Application No. 13/365,240 (filed February 2, 2012), and U.S. Patent Application No. 13/445,778 (filed April 12, 2012). A full discussion of techniques for determining fetal fraction can be found in these documents.
The methods described herein enable determination of the fetal fraction in a sample comprising a mixture of fetal and maternal nucleic acids or, more generally, a mixture of nucleic acids derived from two different genomes. For the purposes of this discussion, maternal and fetal nucleic acids are described, but it should be understood that any two genomes can be substituted. In some embodiments, the fetal fraction is determined concurrently with determining the presence or absence of a copy number variation (e.g., an aneuploidy). As described more fully below, a single set of tags from a test sample can be used to determine both the fetal fraction and the copy number variation.
Methods for quantifying fetal fraction rely on differences between the fetal genome and the maternal genome. In certain embodiments described herein, determining the fetal fraction of the sample DNA relies on multiple DNA sequence reads at sequence sites known to harbor one or more polymorphisms. In some embodiments, the polymorphic sites or target nucleic acid sequences are discovered while the sequence tags are being aligned to one another and/or to a reference sequence. In certain embodiments, the fetal fraction of the sample DNA is determined from copy number information for a particular chromosome or chromosome segment for which a copy number difference exists between the maternal and fetal chromosomes. In such embodiments, the fetal fraction is determined by considering the relative amounts of maternal and fetal sample DNA for a chromosome or segment that is determined or known to have a copy number variation, and it can be calculated using the copy number variation between the maternal and fetal chromosomes. For this purpose, the method and apparatus can calculate a normalized chromosome value (NCV), as described below, or a similar metric.
Certain methods are limited by fetal sex; for example, some methods for quantifying fetal fraction rely on the presence of sequences specific to the Y chromosome or on the chromosome dose of the X chromosome in male fetuses. In certain embodiments, fetal DNA is quantified using fetal targets that have no maternal counterpart, e.g., Y chromosome sequences (Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008] and U.S. Patent Application Publication No. 2010/0112590, filed November 6, 2009, Lo et al.) or the RHD1 gene in pregnancies in which the mother is RhD negative, or using targets that differ from the maternal background by multiple DNA base pairs. Other methods are independent of fetal sex and rely on polymorphic differences between the fetal and maternal genomes.
Allelic imbalance in polymorphisms can be detected and quantified by a variety of techniques. In some embodiments, digital PCR is used to determine allelic imbalance in polymorphisms, e.g., SNPs on mRNA. Alternatively, capillary gel electrophoresis is used to detect differences in the size of polymorphic regions, e.g., in the case of STRs.
In some embodiments, epigenetic differences, e.g., differential methylation of promoter regions, can be detected and used alone or in combination with digital PCR to determine differences between the fetal and maternal genomes and to quantify fetal fraction (Tong et al., Clin Chem 56:90-98 [2010]). Variations of epigenetic methods, e.g., methylation-based DNA discrimination (Ehrich et al., AJOG 204:205.e1-205.e11 [2011]), are also included. In some embodiments, fetal fraction is estimated by sequencing one or more preselected panels of polymorphic sequences, as described elsewhere in this application.
In addition to methods that sequence panels of preselected polymorphic sequences as described elsewhere in this application, methods for quantifying fetal DNA in maternal plasma include, without limitation, real-time qPCR, mass spectrometry, digital PCR (including microfluidic digital PCR), and capillary gel electrophoresis.
本节论述开始考虑胎儿分数,如从不(或经确定不)具有拷贝数变异的染色体或染色体区段的一种或多种多态性或其他信息所进行确定。通过此类技术确定的胎儿分数在此将称为非CNV胎儿分数或“NCNFF”。在本节后面的部分,描述了多种技术,用于从经确定拥有拷贝数变异的染色体或染色体区段计算胎儿分数。从此类技术确定的胎儿分数在此将称为CNV胎儿分数或“CNFF”。This section discusses the fetal fraction as determined from one or more polymorphisms or other information for chromosomes or chromosome segments that do not (or are determined not to) have a copy number variation. The fetal fraction determined by such techniques will be referred to herein as the non-CNV fetal fraction or "NCNFF." Later in this section, various techniques are described for calculating the fetal fraction from chromosomes or chromosome segments that are determined to have a copy number variation. The fetal fraction determined from such techniques will be referred to herein as the CNV fetal fraction or "CNFF."
在一些实施方案中,通过确定来源于胎儿基因组的多态性等位基因的相对贡献和来源于母体基因组的相应多态性等位基因的贡献来评估胎儿分数。在一些实施方案中,通过确定来源于胎儿基因组的多态性等位基因的相对贡献对比来源于胎儿基因组与母体基因组的相应多态性等位基因的总贡献来评估胎儿分数。In some embodiments, the fetal fraction is assessed by determining the relative contribution of a polymorphic allele derived from the fetal genome and the contribution of the corresponding polymorphic allele derived from the maternal genome. In some embodiments, the fetal fraction is assessed by determining the relative contribution of a polymorphic allele derived from the fetal genome compared to the total contribution of the corresponding polymorphic alleles derived from the fetal genome and the maternal genome.
多态性可以是指示性的,信息性的(informative),或两者。指示性多态性表明母体样品中存在胎儿无细胞DNA(“cfDNA”)。信息性多态性(例如信息性SNP)产生关于胎儿的信息,例如,疾病的存在或不存在、遗传异常、或任何其他生物信息,例如妊娠阶段或性别。在这种情况下,信息性多态性是识别母亲与胎儿的序列之间差异的那些,并且用于在此披露的方法中。换言之,信息性多态性是拥有不同序列的核酸样品(即,它们具有不同的等位基因)中的多态性,且这些序列以不同的量存在。在此的一些方法中,使用不同数量的序列/等位基因确定胎儿分数,特别是NCNFF。Polymorphisms can be indicative, informative, or both. Indicative polymorphisms indicate the presence of fetal cell-free DNA ("cfDNA") in a maternal sample. Informative polymorphisms (e.g., informative SNPs) generate information about the fetus, e.g., the presence or absence of a disease, a genetic abnormality, or any other biological information, such as gestational stage or gender. In this case, informative polymorphisms are those that identify the difference between the sequences of the mother and the fetus, and are used in the methods disclosed herein. In other words, informative polymorphisms are polymorphisms in nucleic acid samples that have different sequences (i.e., they have different alleles), and these sequences are present in different amounts. In some methods herein, different numbers of sequences/alleles are used to determine fetal fraction, particularly NCNFF.
多态位点包括但不限于单核苷酸多态性(SNP)、串联SNP、小规模多碱基缺失或插入(IN-DELS或缺失插入多态性(DIP))、多核苷酸多态性(MNP)、短串联重复片段(STR)、限制性片断长度多态性(RFLP),或染色体中拥有任何其他等位基因序列变异的任何多态性。在一些实施方案中,每个目标核酸包含两个串联SNP。串联SNP作为单一单元(例如,作为短单体型)加以分析,且在此作为具有两个SNP的多个集合而提供。Polymorphic sites include, but are not limited to, single nucleotide polymorphisms (SNPs), tandem SNPs, small-scale multi-base deletions or insertions (IN-DELS or deletion-insertion polymorphisms (DIPs)), multiple nucleotide polymorphisms (MNPs), short tandem repeats (STRs), restriction fragment length polymorphisms (RFLPs), or any polymorphism with any other allelic sequence variation in a chromosome. In some embodiments, each target nucleic acid comprises two tandem SNPs. Tandem SNPs are analyzed as a single unit (e.g., as a short haplotype) and are provided herein as multiple sets having two SNPs.
在一些实施方案中,胎儿分数是通过统计学和近似技术来确定,这些技术通过使用用来确定相对贡献的多态位点来评估胎儿和母体基因组的配型的相对贡献。还可以通过电泳法确定胎儿分数,其中将某些类型的多态位点以电泳方式分离并且用于识别来自胎儿基因组的多态性等位基因的相对贡献和来自母体基因组的相应多态性等位基因的相对贡献。In some embodiments, fetal fraction is determined by statistical and approximate techniques that assess the relative contributions of the fetal and maternal genomes using polymorphic sites to determine relative contributions. Fetal fraction can also be determined by electrophoresis, in which certain types of polymorphic sites are electrophoretically separated and used to identify the relative contribution of a polymorphic allele from the fetal genome and the relative contribution of the corresponding polymorphic allele from the maternal genome.
In one embodiment, shown in the process flow diagram of FIG. 6, fetal fraction is determined by method 600, which includes obtaining a test sample comprising a mixture of fetal and maternal nucleic acids in operation 610, enriching the nucleic acid mixture for polymorphic target nucleic acids in operation 620, sequencing the enriched nucleic acid mixture in operation 630, and simultaneously determining the fetal fraction and aneuploidy in the sample in operation 640.
图7显示用于一些实施方案的工艺流程图。通过以下确定胎儿分数:(i) 在操作710中获得母体血浆样品,(ii)在操作720中纯化样品中的cfDNA,(iii) 在操作730中扩增多态核酸,(iv)在操作740中使用大规模平行测序方法对混合物测序,和(v)在操作760中计算胎儿分数。在另一个实施方案中,通过以下确定胎儿分数:(i)在操作710中获得母体血浆样品,(ii)在操作720中纯化样品中的cfDNA,(iii)在操作730中扩增多态核酸,(iv)在操作750中使用电泳法按照尺寸分离核酸,和(v)在操作770中计算胎儿分数。Figure 7 shows a process flow diagram for some embodiments. The fetal fraction is determined by (i) obtaining a maternal plasma sample in operation 710, (ii) purifying cfDNA in the sample in operation 720, (iii) amplifying polymorphic nucleic acids in operation 730, (iv) sequencing the mixture using a massively parallel sequencing method in operation 740, and (v) calculating the fetal fraction in operation 760. In another embodiment, the fetal fraction is determined by (i) obtaining a maternal plasma sample in operation 710, (ii) purifying cfDNA in the sample in operation 720, (iii) amplifying polymorphic nucleic acids in operation 730, (iv) separating nucleic acids by size using electrophoresis in operation 750, and (v) calculating the fetal fraction in operation 770.
In one embodiment, shown in the process flow diagram of FIG. 8, fetal fraction is determined by (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids in operation 810, (ii) amplifying the sample in operation 820, (iii) enriching the sample in operation 830 by combining the amplified sample with an unamplified sample of the initial mixture, (iv) purifying the sample in operation 840, and (v) sequencing the sample in operation 850 to determine fetal fraction, with the fetal fraction and the presence or absence of aneuploidy being determined simultaneously, using different methods, in operation 860.
In another embodiment, shown in the process flow diagram of FIG. 9, fetal fraction is determined by (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids in operation 910, (ii) purifying the sample in operation 920, (iii) amplifying a portion of the sample in operation 930, (iv) enriching the sample in operation 940 by combining the amplified sample with a purified but unamplified portion of the initial sample, and (v) sequencing the sample in operation 950 to determine fetal fraction, with the fetal fraction and the presence or absence of aneuploidy being determined simultaneously, using different methods, in operation 960.
在图10工艺流程图所示的另一个实施方案中,通过以下确定胎儿分数:(i) 在操作1010中获得包含胎儿和母体核酸的混合物的样品,(ii)在操作1020中纯化样品,(iii)在操作1040中扩增样品的第一部分,(iv)在操作1050中制备样品的经扩增部分的测序文库,(v)在操作1030中制备样品的第二个经纯化但未扩增部分的测序文库,(vi)在操作1060中通过将两个测序文库组合来富集混合物,和(vii)在操作1070中对混合物测序,在1080操作中使用不同方法同时确定胎儿分数和非整倍性的存在或不存在。In another embodiment shown in the process flow diagram of Figure 10, fetal fraction is determined by (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids in operation 1010, (ii) purifying the sample in operation 1020, (iii) amplifying a first portion of the sample in operation 1040, (iv) preparing a sequencing library of the amplified portion of the sample in operation 1050, (v) preparing a sequencing library of a second purified but unamplified portion of the sample in operation 1030, (vi) enriching the mixture by combining the two sequencing libraries in operation 1060, and (vii) sequencing the mixture in operation 1070, and simultaneously determining the fetal fraction and the presence or absence of aneuploidy using different methods in operation 1080.
在另一个实施方案中,通过以下确定胎儿分数:(i)获得包含胎儿和母体核酸的混合物的样品,(ii)纯化样品,(iii)使用经标记的引物扩增样品,和 (iv)使用电泳法对样品测序,以使用不同方法确定胎儿分数。In another embodiment, fetal fraction is determined by (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acid, (ii) purifying the sample, (iii) amplifying the sample using labeled primers, and (iv) sequencing the sample using electrophoresis to determine fetal fraction using a different method.
在另一个实施方案中,通过以下确定胎儿分数:(i)获得包含胎儿和母体核酸的混合物的样品,(ii)纯化样品,(iii)通过扩增样品的一部分来任选地富集样品,和(iv)对样品测序,以使用不同方法确定胎儿分数。In another embodiment, the fetal fraction is determined by (i) obtaining a sample comprising a mixture of fetal and maternal nucleic acids, (ii) purifying the sample, (iii) optionally enriching the sample by amplifying a portion of the sample, and (iv) sequencing the sample to determine the fetal fraction using a different method.
纯化最初获得的样品、经扩增的样品、或经扩增和富集的样品、或与在此披露的方法有关的其他核酸样品(例如在操作720、840、920和1020中),可以通过任何常规技术完成。为从细胞中分离cfDNA,可以使用分级分离、离心 (例如密度梯度离心)、DNA特异性沉淀、或高通量细胞分选、和/或分离方法。任选地,所得样品可以在纯化或扩增之前片段化。如果所用样品包含cfDNA,那么可不要求片段化,因为cfDNA在性质上片段化,其中片段尺寸时常为约 150bp到200bp。Purification of the initially obtained sample, the amplified sample, or the amplified and enriched sample, or other nucleic acid samples used in connection with the methods disclosed herein (e.g., in operations 720, 840, 920, and 1020) can be accomplished by any conventional technique. To isolate cfDNA from cells, fractionation, centrifugation (e.g., density gradient centrifugation), DNA-specific precipitation, or high-throughput cell sorting and/or separation methods can be used. Optionally, the resulting sample can be fragmented prior to purification or amplification. If the sample used comprises cfDNA, fragmentation may not be required, as cfDNA is fragmented in nature, with fragment sizes often ranging from approximately 150 bp to 200 bp.
在上述一些程序中,使用选择性扩增和富集提高来自多态性所处的区域中的核酸的相对数量。类似结果可以通过对基因组的所选区域(特别是多态性所处的区域)进行深入测序来获得。In some of the above procedures, selective amplification and enrichment are used to increase the relative amount of nucleic acid from the region where the polymorphism is located. Similar results can be obtained by deep sequencing of selected regions of the genome, particularly where the polymorphism is located.
扩增Amplification
获得样品并且纯化样品之后,使用胎儿和母体核酸(例如cfDNA)的纯化混合物的一部分扩增多个多态目标核酸,每个核酸包含多态位点。扩增胎儿和母体核酸混合物中的目标核酸,在某些实现方式中,是通过使用PCR(聚合酶链式反应)或该方法的变异的任何方法(包括但不限于不对称PCR、解螺旋酶依赖性扩增、热启动PCR、qPCR、固相PCR、和降落PCR)实现。在一些实施方案中,样品可以部分地扩增以协助确定胎儿分数。在一些实施方案中,不进行扩增。在操作730、820、930和1040中可使用所披露的扩增方法及其他扩增技术。After obtaining and purifying the sample, a portion of the purified mixture of fetal and maternal nucleic acids (e.g., cfDNA) is used to amplify multiple polymorphic target nucleic acids, each nucleic acid comprising a polymorphic site. Amplifying the target nucleic acid in the mixture of fetal and maternal nucleic acids, in certain implementations, is achieved by using PCR (polymerase chain reaction) or any method of variation of the method (including but not limited to asymmetric PCR, helicase-dependent amplification, hot start PCR, qPCR, solid phase PCR, and touchdown PCR). In some embodiments, the sample can be partially amplified to assist in determining the fetal fraction. In some embodiments, amplification is not performed. The disclosed amplification methods and other amplification techniques can be used in operations 730, 820, 930, and 1040.
Amplification of SNPs
有大量的核酸引物可供用来扩增包含SNP的DNA片段,并且可以获得其序列,例如来自本领域普通技术人员所知的数据库。还可以设计另外的引物,例如使用以下文献所公开的类似方法:维克斯E.F.(Vieux,E.F.),郭P-Y(Kwok, P-Y)和米勒R.D.(Miller,R.D.),生物技术(BioTechniques)(2002年6月),第32卷,增刊:“SNP:标记物疾病的发现(SNPs:Discovery of Marker Disease)”,第28页到第32页。Numerous nucleic acid primers are available for amplifying DNA fragments containing SNPs, and their sequences are available, for example, from databases known to those skilled in the art. Additional primers can also be designed, for example, using similar methods disclosed in Vieux, E.F., Kwok, P-Y, and Miller, R.D., BioTechniques (June 2002), Vol. 32, Supplement: "SNPs: Discovery of Marker Disease," pp. 28-32.
选择序列特异性引物以扩增目标核酸。在一个实施方案中,如扩增子扩增包含多态位点的目标核酸。在另一个实施方案中,如扩增子扩增包含两个或更多个多态位点(例如两个串联SNP)的目标核酸。至少约100bp的经扩增的目标核酸扩增子包含单个或串联SNP。用于扩增包含串联SNP的目标序列的引物经设计可涵盖两个SNP位点。Sequence-specific primers are selected to amplify target nucleic acid. In one embodiment, the target nucleic acid comprising a polymorphic site is amplified as an amplicon. In another embodiment, the target nucleic acid comprising two or more polymorphic sites (e.g., two tandem SNPs) is amplified as an amplicon. The amplified target nucleic acid amplicon of at least about 100bp comprises a single or tandem SNP. The primers for amplifying a target sequence comprising a tandem SNP can be designed to encompass two SNP sites.
Amplification of STRs
一些核酸引物可供用来扩增包含STR的DNA片段,并且此类序列可以从本领域一个技术人员已知的数据库获得。Nucleic acid primers are available for amplifying DNA fragments containing STRs, and such sequences can be obtained from databases known to one skilled in the art.
在一些实施方案中,使用胎儿和母体核酸混合物的一部分作为用于扩增具有至少一个STR的目标核酸的模板。关于STR、所公开的PCR引物、常见多重系统和相关种群数据的参考文献、论据和序列信息的综合性目录汇编于 STRBase中,该STRBase可经由因特网在cstl.nist.gov/strbase处进行访问。来自在ncbi.nlm.nih.gov/genbank的的、针对常用STR基因座的序列信息通过STRBase也是可访问的。In some embodiments, a portion of the fetal and maternal nucleic acid mixture is used as a template for amplifying a target nucleic acid having at least one STR. A comprehensive directory of references, literature, and sequence information on STRs, published PCR primers, common multiplex systems, and related population data is compiled in STRBase, which is accessible via the Internet at cstl.nist.gov/strbase. Sequence information for commonly used STR loci from ncbi.nlm.nih.gov/genbank is also accessible through STRBase.
STR多重系统允许在单一反应中同时扩增多个不重叠的基因座,从而实质上提高通量。因为STR的多态性高,所以大部分个体是杂合型。STR可用于如下文进一步描述的电泳分析中。STR multiplexing systems allow for the simultaneous amplification of multiple, non-overlapping loci in a single reaction, substantially increasing throughput. Because STRs are highly polymorphic, most individuals are heterozygous. STRs can be used in electrophoretic analysis as described further below.
还可以使用miniSTRs进行扩增以产生尺寸减小的扩增子,从而辨别在长度上更短的STR等位基因。所披露实施方案的方法涵盖确定已富集目标核酸的母体样品中的胎儿核酸分数,目标核酸各自包含一个miniSTR,该方法包括量化位于一个多态性miniSTR的至少一个胎儿和一个母体等位基因,其可以扩增以产生长度约为循环胎儿DNA片段的尺寸的扩增子。任一对miniSTR引物或两对或更多对miniSTR引物的组合可用于扩增至少一个miniSTR。MiniSTRs can also be used for amplification to produce amplicons of reduced size, thereby discerning STR alleles that are shorter in length. Methods of the disclosed embodiments encompass determining the fraction of fetal nucleic acid in a maternal sample that has been enriched for target nucleic acids, each of which comprises a miniSTR, the method comprising quantifying at least one fetal and one maternal allele located at a polymorphic miniSTR that can be amplified to produce amplicons approximately the size of circulating fetal DNA fragments. Any pair of miniSTR primers or a combination of two or more pairs of miniSTR primers can be used to amplify at least one miniSTR.
富集Enrichment
加以富集的样品可包括:血液样品的血浆分离部分;从血浆中提取出的经纯化cfDNA的样品;从胎儿和母体核酸的经纯化混合物制备的测序文库样品;等等。The samples that are enriched may include: the plasma fraction of a blood sample; a sample of purified cfDNA extracted from plasma; a sequencing library sample prepared from a purified mixture of fetal and maternal nucleic acids; and the like.
In certain embodiments, the sample comprising the mixture of DNA molecules is non-specifically enriched for the whole genome before whole genome sequencing, i.e., whole genome amplification is performed before sequencing. Non-specific enrichment of the nucleic acid mixture refers to whole genome amplification of the genomic DNA fragments of the DNA sample, which can be used to increase the level of sample DNA before polymorphisms are identified by sequencing. Non-specific enrichment can be the selective enrichment of one of the two genomes (fetal and maternal) present in the sample.
在其他实施方案中,样品中的cfDNA经特异性富集。特异性富集是指基因组样品针对特定序列(例如多态性目标序列)的富集,其通过包括特异性扩增目标核酸序列的方法完成,目标核酸序列包含多态位点。In other embodiments, the cfDNA in the sample is specifically enriched. Specific enrichment refers to the enrichment of a genomic sample for a specific sequence (e.g., a polymorphic target sequence) by a method that includes specifically amplifying a target nucleic acid sequence that contains a polymorphic site.
在其他实施方案中,存在于样品中的核酸混合物是针对各自包含多态位点的多态目标核酸加以富集。在操作620中可使用此类富集。富集胎儿和母体核酸的混合物包括,从最初母体样品所包含的核酸的一部分中扩增目标序列,并且将部分或整个扩增产物与最初母体样品的剩余部分组合,例如在操作830和 940中。In other embodiments, the mixture of nucleic acids present in the sample is enriched for polymorphic target nucleic acids, each of which contains a polymorphic site. Such enrichment can be used in operation 620. Enriching the mixture of fetal and maternal nucleic acids includes amplifying the target sequence from a portion of the nucleic acids contained in the original maternal sample and combining part or all of the amplified product with the remainder of the original maternal sample, for example, in operations 830 and 940.
在又一个实施方案中,加以富集的样品是由胎儿和母体核酸的纯化混合物制备的测序文库样品。选择用于富集初始样品的扩增产物的量以获得足以用于确定胎儿分数的序列信息。从测序获得的序列标签的总数中至少约3%、至少约5%、至少约7%、至少约10%、至少约15%、至少约20%、至少约25%、至少约30%或更多被映射以确定胎儿分数。In yet another embodiment, the enriched sample is a sequencing library sample prepared from a purified mixture of fetal and maternal nucleic acids. The amount of amplification product used to enrich the initial sample is selected to obtain sufficient sequence information for determining fetal fraction. At least about 3%, at least about 5%, at least about 7%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, or more of the total number of sequence tags obtained from sequencing are mapped to determine fetal fraction.
In one embodiment, in FIG. 10, enrichment includes amplifying, in operation 1040, the target nucleic acids contained in a portion of an initial sample of a purified mixture of fetal and maternal nucleic acids (e.g., cfDNA purified from a maternal plasma sample). Similarly, in operation 1050, a primary sequencing library is prepared using a portion of the purified but unamplified cfDNA. In operation 1060, a portion of the target library is combined with the primary library generated from the unamplified nucleic acid mixture, and in operation 1070 the mixture of fetal and maternal nucleic acids contained in the two libraries is sequenced. The enriched library may include at least about 5%, at least about 10%, at least about 15%, at least about 20%, or at least about 25% of the target library. In operation 1080, the data from the sequencing run are analyzed and, as described for operation 640 of the embodiment depicted in FIG. 6, the fetal fraction and the presence or absence of aneuploidy are determined simultaneously.
测序技术Sequencing technology
The enriched mixture of fetal and maternal nucleic acids is sequenced. The sequence information needed to determine fetal fraction can be obtained using any known DNA sequencing method, many of which are described elsewhere in this application. Such sequencing methods include next-generation sequencing (NGS), Sanger sequencing, Helicos True Single Molecule Sequencing (tSMS™), 454 sequencing (Roche), SOLiD technology (Applied Biosystems), Single Molecule Real-Time (SMRT™) sequencing technology (Pacific Biosciences), nanopore sequencing, chemical-sensitive field-effect transistor (chemFET) arrays, Halcyon Molecular's method using transmission electron microscopy (TEM), Ion Torrent single-molecule sequencing, sequencing by hybridization, and the like. In certain embodiments, massively parallel sequencing is used. In one embodiment, Illumina's sequencing-by-synthesis and reversible terminator-based sequencing chemistry are used. In certain embodiments, partial sequencing is used.
所测序的DNA映射到参考基因组。参考基因组可为人工基因组或可为人类参照序列基因组。此类参考基因组包括:包含多态目标核酸序列的人工目标序列基因组;人工SNP参考基因组;人工STR参考基因组;人工串联STR参考基因组;人类参照序列基因组NCBI36/hg18序列,其在因特网 genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105可获得;以及包括目标多态序列的人类参照序列基因组NCBI36/hg18序列和人工目标序列基因组,例如SNP基因组。在映射过程中允许存在某些错配。The sequenced DNA is mapped to a reference genome. The reference genome can be an artificial genome or a human reference sequence genome. Such reference genomes include: artificial target sequence genomes containing polymorphic target nucleic acid sequences; artificial SNP reference genomes; artificial STR reference genomes; artificial tandem STR reference genomes; the human reference sequence genome NCBI36/hg18 sequence, available online at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105; and the human reference sequence genome NCBI36/hg18 sequence containing target polymorphic sequences and artificial target sequence genomes, such as SNP genomes. Some mismatches are tolerated during the mapping process.
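The idea of mapping reads to an artificial SNP reference while tolerating some mismatches can be illustrated with a minimal Python sketch. The reference names, sequences, and the limit `max_mismatches` are hypothetical and chosen only for illustration; they are not values from this application, and a real aligner such as Bowtie or ELAND works very differently.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: Dict[str, str],
             max_mismatches: int = 2) -> Optional[str]:
    """Return the name of the best-matching reference entry, or None.

    Assumption: every reference entry starts at the amplicon start, so a
    read is compared against the first len(read) bases of each entry.
    Reads that exceed max_mismatches, or that match two entries equally
    well, are left unmapped.
    """
    best_name, best_dist, tie = None, max_mismatches + 1, False
    for name, seq in reference.items():
        if len(seq) < len(read):
            continue
        dist = hamming(read, seq[:len(read)])
        if dist < best_dist:
            best_name, best_dist, tie = name, dist, False
        elif dist == best_dist and dist <= max_mismatches:
            tie = True
    return None if tie else best_name
```

In this sketch the two alleles of a SNP would be stored as two separate reference entries, so that a read carrying a sequencing error elsewhere in the amplicon is still assigned to the allele it matches best.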
在一个实施方案中,对在操作630中获得的测序信息进行分析并且同时作出确定,确定胎儿分数和确定非整倍性的存在或不存在。In one embodiment, the sequencing information obtained in operation 630 is analyzed and a determination is made simultaneously to determine the fetal fraction and to determine the presence or absence of aneuploidy.
As described above, a plurality of sequence tags is obtained for each sample. In certain embodiments, at least about 3 × 10^6 sequence tags, at least about 5 × 10^6 sequence tags, at least about 8 × 10^6 sequence tags, at least about 10 × 10^6 sequence tags, at least about 15 × 10^6 sequence tags, at least about 20 × 10^6 sequence tags, at least about 30 × 10^6 sequence tags, at least about 40 × 10^6 sequence tags, or at least about 50 × 10^6 sequence tags, comprising reads of between 20 bp and 40 bp, are obtained per sample by mapping the reads to the reference genome. In one embodiment, all sequence reads are mapped to all regions of the reference genome. In one embodiment, tags comprising reads that have been mapped to all regions (e.g., all chromosomes) of the human reference sequence genome are counted, and fetal aneuploidy, i.e., the over- or under-representation of a sequence of interest (e.g., a chromosome or a portion thereof), is determined in the mixed DNA sample; tags comprising reads that map to the artificial target sequence genome are counted to determine the fetal fraction. The method does not require distinguishing between the maternal and fetal genomes.
在一个实施方案中,对来自测序轮次的数据进行分析并且同时确定胎儿分数,以及存在或不存在非整倍性。In one embodiment, data from sequencing rounds are analyzed and the fetal fraction, and the presence or absence of aneuploidy are determined simultaneously.
测序文库Sequencing library
In some embodiments, a portion or all of the amplified polymorphic sequences is used to prepare a sequencing library for sequencing in a parallel fashion. In one embodiment, the library is prepared for sequencing-by-synthesis using Illumina's reversible terminator-based sequencing chemistry. The library can be prepared from purified cfDNA and includes at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, or at least about 50% amplification product.
对通过图11所描绘的任一种方法产生的文库进行测序,提供了来源于扩增的目标核酸的序列标签和来源于最初未扩增的母体样品的标签。胎儿分数是从映射到人工参考基因组的标签数目来计算。Sequencing the libraries generated by any of the methods depicted in Figure 11 provides sequence tags derived from the amplified target nucleic acid and tags derived from the original unamplified maternal sample. Fetal fraction is calculated from the number of tags that map to the artificial reference genome.
计算胎儿分数Calculating fetal fraction
As explained, after the DNA of interest has been sequenced, computational methods can be used to map or align the sequences to particular genes, chromosomes, alleles, or other structures. A number of computer algorithms exist for aligning sequences, including but not limited to BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), and ELAND (Illumina, Inc., San Diego, CA, USA). In some embodiments, the sequences of the bins are found in nucleic acid databases known to those skilled in the art, including GenBank, dbEST, dbSTS, EMBL (European Molecular Biology Laboratory), and DDBJ (DNA Data Bank of Japan). BLAST or similar tools can be used to search the identified sequences against the sequence databases, and the search hits can be used to sort the identified sequences into the appropriate bins. Alternatively, a Bloom filter or similar set membership tester can be used to align reads to the reference genome. See U.S. Patent Application No. 61/552,374, filed October 27, 2011, which is incorporated herein by reference in its entirety.
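To make the set-membership idea concrete, the following Python sketch builds a small Bloom filter over the k-mers of a reference sequence and tests whether a read's k-mer is present. This is only an illustration of the general data structure; the class, the parameters `n_bits` and `n_hashes`, and the use of SHA-256 are assumptions made here and are not taken from the referenced application. A Bloom filter can report false positives but never false negatives.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative parameters)."""

    def __init__(self, n_bits: int = 1 << 16, n_hashes: int = 4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive n_hashes bit positions by salting one hash function.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def index_reference(sequence: str, k: int, bloom: BloomFilter) -> None:
    """Add every k-mer of the reference sequence to the filter."""
    for i in range(len(sequence) - k + 1):
        bloom.add(sequence[i:i + k])
```

A read whose k-mers are all reported as present is a candidate match to the indexed reference; because of possible false positives, a confirming comparison against the actual sequence would normally follow.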
As mentioned, according to some embodiments (particularly the NCNFF techniques), fetal fraction is determined from the total number of tags mapped to a first allele and the total number of tags mapped to a second allele at an informative polymorphic site (e.g., a SNP) contained in the reference genome. Informative polymorphic sites are identified by the difference in the allele sequences and by the amount of each possible allele. Fetal cfDNA is often present at a concentration of less than 10% of the maternal cfDNA. Therefore, an allele that makes a minor contribution to the mixture of fetal and maternal nucleic acids, relative to the major contribution of the maternal allele, can be assigned to the fetus. Alleles derived from the maternal genome are referred to herein as major alleles, and alleles derived from the fetal genome are referred to herein as minor alleles. Alleles represented by similar levels of mapped sequence tags represent maternal alleles. The results of an exemplary multiplex amplification of target nucleic acids containing SNPs derived from a maternal plasma sample are shown in Figure 12.
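The distinction drawn above between informative sites (a major maternal allele plus a minor fetal allele) and sites where both alleles appear at similar levels (maternal alleles) can be sketched as a simple classifier on tag counts. The thresholds `min_minor_tags` and `max_minor_ratio` are hypothetical values chosen for illustration; this application does not specify them.

```python
from typing import NamedTuple


class SiteCall(NamedTuple):
    major: int         # tag count of the more abundant allele
    minor: int         # tag count of the less abundant allele
    informative: bool  # True if the minor allele can be assigned to the fetus


def classify_site(count_a: int, count_b: int,
                  min_minor_tags: int = 10,
                  max_minor_ratio: float = 0.5) -> SiteCall:
    """Classify one biallelic site from its two allele tag counts.

    A site is treated as informative when the minor allele is clearly
    present (at least min_minor_tags tags) yet clearly less abundant than
    the major allele (minor/major <= max_minor_ratio). Sites with similar
    counts for both alleles are treated as maternal and not informative.
    """
    major, minor = max(count_a, count_b), min(count_a, count_b)
    informative = (minor >= min_minor_tags and major > 0
                   and minor / major <= max_minor_ratio)
    return SiteCall(major, minor, informative)
```

For example, counts of 1000 and 50 tags would be called informative under these assumed thresholds, whereas counts of 500 and 480 would not.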
在这里,术语“染色体性非整倍性”和“完整染色体性非整倍性”在此是指由损失或获得整个染色体而引起的遗传物质的不平衡,并且包括种系非整倍性和嵌合性非整倍性。术语“部分非整倍性”和“部分染色体性非整倍性”在此是指由损失或获得染色体的一部分(例如,部分单体性和部分三体性)而引起的遗传物质的不平衡,并且涵盖由易位、缺失和插入引起的不平衡。Herein, the terms "chromosomal aneuploidy" and "complete chromosomal aneuploidy" refer to an imbalance of genetic material caused by the loss or gain of an entire chromosome, and include germline aneuploidy and mosaic aneuploidy. The terms "partial aneuploidy" and "partial chromosomal aneuploidy" refer to an imbalance of genetic material caused by the loss or gain of a portion of a chromosome (e.g., partial monosomy and partial trisomy), and encompass imbalances caused by translocations, deletions, and insertions.
使用等位基因比率估计胎儿分数Estimating fetal fraction using allele ratios
For each of the two alleles at a predetermined polymorphic site, the relative abundance of fetal cfDNA in the maternal sample can be determined as a parameter of the total number of unique sequence tags mapped to the target nucleic acid sequence on the reference genome. In one embodiment, the fraction of fetal nucleic acid in the mixture of fetal and maternal nucleic acids is calculated for each informative allele (allele x) as follows:

$$\%\ \text{fetal fraction}_{\text{allele } x} = \frac{\sum \text{fetal sequence tags for allele } x}{\sum \text{maternal sequence tags for allele } x} \times 100$$

and the fetal fraction for the sample is calculated as the average of the fetal fractions of all informative alleles. Optionally, the fraction of fetal nucleic acid in the mixture of fetal and maternal nucleic acids is calculated for each informative allele (allele x) as follows:

$$\%\ \text{fetal fraction}_{\text{allele } x} = \frac{2 \times \sum \text{fetal sequence tags for allele } x}{\sum \text{maternal sequence tags for allele } x} \times 100$$

to compensate for the presence of two fetal alleles, one of which is masked by the maternal background.
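The per-allele calculation and averaging described above can be sketched in Python. The sketch assumes that the per-allele fetal fraction is the ratio of fetal (minor-allele) tags to maternal (major-allele) tags expressed as a percentage, optionally doubled to account for the fetal allele masked by the maternal background; the function name and the `double_fetal` flag are illustrative and not part of this application.

```python
from statistics import mean
from typing import Iterable, Tuple


def fetal_fraction_percent(sites: Iterable[Tuple[int, int]],
                           double_fetal: bool = False) -> float:
    """Average percent fetal fraction over informative alleles.

    sites: (fetal_tags, maternal_tags) for each informative allele, i.e.
           the minor-allele and major-allele tag counts at that site.
    double_fetal: if True, multiply the fetal tag count by 2 to compensate
           for the second fetal allele hidden in the maternal background.
    """
    factor = 2 if double_fetal else 1
    per_site = [100.0 * factor * fetal / maternal
                for fetal, maternal in sites if maternal > 0]
    if not per_site:
        raise ValueError("no informative sites with maternal tags")
    return mean(per_site)
```

With two informative sites carrying 50 and 30 fetal tags against 1000 maternal tags each, the per-site values are 5% and 3%, giving a sample estimate of 4% (or 8% with the doubling option).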
通过对预先确定的多态序列进行测序来确定胎儿分数Determine fetal fraction by sequencing predetermined polymorphic sequences
关于通过对预先确定的多态序列进行测序来确定胎儿分数的更多细节提供如下。Further details regarding determining fetal fraction by sequencing predetermined polymorphic sequences are provided below.
Referring to FIG. 7, operations 720, 730, 740, and 760 show a process flow for determining the fraction of fetal nucleic acid in a maternal biological sample by massively parallel sequencing of PCR-amplified polymorphic target nucleic acids. In step 710, a maternal sample comprising a mixture of fetal and maternal nucleic acids is obtained from a subject. The sample is a maternal sample obtained from a pregnant female, e.g., a pregnant woman. Other maternal samples can come from mammals, such as cows, horses, dogs, or cats. If the subject is a human, the sample can be taken in the first or second trimester of pregnancy. Any maternal biological sample can be used as a source of fetal and maternal nucleic acids, whether contained in cells or cell-free. In certain embodiments, it is advantageous to obtain a maternal sample that comprises cell-free nucleic acids (cfDNA). Preferably, the maternal biological sample is a biological fluid sample. Preferably, the maternal sample is a sample from a pregnant woman selected from blood, plasma, serum, urine, and saliva. In certain embodiments, the maternal sample is a plasma sample.
在步骤720中,胎儿和母体核酸的混合物从例如血浆等样品部分进一步处理,以获得包含胎儿和母体核酸(例如cfDNA)的纯化混合物的样品。用于处理母体样品的方法在本文其他地方描述。In step 720, the mixture of fetal and maternal nucleic acids is further processed from a sample portion, such as plasma, to obtain a sample comprising a purified mixture of fetal and maternal nucleic acids (e.g., cfDNA). Methods for processing maternal samples are described elsewhere herein.
In step 730, a portion of the purified mixture of fetal and maternal cfDNA is used to amplify a plurality of polymorphic target nucleic acids, each comprising a polymorphic site. In certain embodiments, the target nucleic acids each comprise a SNP. In other embodiments, the target nucleic acids each comprise a pair of tandem SNPs. In yet other embodiments, each target nucleic acid comprises an STR. Polymorphic sites contained in the target nucleic acids include, without limitation, single nucleotide polymorphisms (SNPs), tandem SNPs, small-scale multi-base deletions or insertions (called IN-DELS, also known as deletion insertion polymorphisms or DIPs), multi-nucleotide polymorphisms (MNPs), short tandem repeats (STRs), restriction fragment length polymorphisms (RFLPs), or a polymorphism comprising any other change of sequence in a chromosome. In certain embodiments, the polymorphic sites encompassed by the method are located on autosomes, which allows the fetal fraction to be determined independently of fetal sex. Polymorphisms associated with chromosomes other than chromosomes 13, 18, 21, and Y can also be used in the methods described herein.
Polymorphisms can be indicative, informative, or both. Indicative polymorphisms indicate the presence of fetal cell-free DNA in a maternal sample. For example, the more there is of a particular genetic sequence (e.g., a SNP), the more readily a method converts its presence into a particular color intensity, color density, or some other property that is detectable and measurable and that indicates the presence, absence, and quantity of a particular DNA fragment and/or a particular polymorphism (e.g., a SNP of the embryo). With respect to the present invention, the methods are not performed using all possible SNPs in a genome, but using preselected polymorphisms that are likely to identify sequence differences between mother and fetus (i.e., informative polymorphisms). Informative polymorphic sites are identified by the difference in sequence of the alleles and by the amount of each of the possible alleles. Any polymorphic site that is encompassed by the reads generated by the sequencing methods described herein can be used to determine the fetal fraction.
A portion of the mixture of fetal and maternal nucleic acids (e.g., cfDNA) in the sample is used as a template for amplifying target nucleic acids comprising at least one SNP. In certain embodiments, each target nucleic acid comprises a single (i.e., one) SNP. Target nucleic acid sequences comprising SNPs are available from publicly accessible databases including, but not limited to, the Human SNP Database at World Wide Web address wi.mit.edu, the NCBI dbSNP home page at World Wide Web address ncbi.nlm.nih.gov, World Wide Web address lifesciences.perkinelmer.com, Applied Biosystems by Life Technologies™ (Carlsbad, CA) at World Wide Web address appliedbiosystems.com, the Celera Human SNP database at World Wide Web address celera.com, and the SNP Database of the Genome Analysis Group (GAN) at World Wide Web address gan.iarc.fr. In one embodiment, the SNPs chosen for enriching the fetal and maternal cfDNA are selected from the group of 92 individual identification SNPs (IISNPs) described by Pakstis et al. (Pakstis et al., Hum Genet 127:315-324 [2010]), which have been shown to vary very little in frequency across populations (Fst<0.06) and to be highly informative around the world, with an average heterozygosity ≥0.4. SNPs encompassed by the method of the invention include linked and unlinked SNPs. Other useful SNPs applicable or adaptable to the methods described herein are disclosed in U.S. Patent Application Nos. 20080070792, 20090280492, 20080113358, 20080026390, 20080050739, 20080220422, and 20080138809, which are incorporated herein by reference in their entirety.
Each target nucleic acid comprises at least one polymorphic site, e.g., a single SNP, that differs from the polymorphic site present on any other target nucleic acid. Together the target nucleic acids provide a panel of polymorphic sites, e.g., SNPs, large enough that at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 25, at least 30, at least 40 or more of them are informative. For example, a panel of SNPs can be configured to comprise at least one informative SNP.
In one embodiment, the SNPs targeted for amplification are selected from rs560681, rs1109037, rs9866013, rs13182883, rs13218440, rs7041158, rs740598, rs10773760, rs4530059, rs7205345, rs8078417, rs576261, rs2567608, rs430046, rs9951171, rs338882, rs10776839, rs9905977, rs1277284, rs258684, rs1347696, rs508485, rs9788670, rs8137254, rs3143, rs2182957, rs3739005, and rs530022. In one embodiment, the panel of SNPs comprises at least 3, at least 5, at least 10, at least 13, at least 15, at least 20, at least 25, at least 30 or more SNPs. In one embodiment, the panel of SNPs comprises rs560681, rs1109037, rs9866013, rs13182883, rs13218440, rs7041158, rs740598, rs10773760, rs4530059, rs7205345, rs8078417, rs576261, and rs2567608. Polymorphic nucleic acids comprising SNPs can be amplified using the exemplary primer pairs provided in Example 24 and disclosed as SEQ ID NOs: 63-118.
In other embodiments, each target nucleic acid comprises two or more SNPs, i.e., each target nucleic acid comprises tandem SNPs. Preferably, each target nucleic acid comprises two tandem SNPs. Tandem SNPs are analyzed as a single unit (e.g., as a short haplotype) and are provided herein as sets of two SNPs. To identify suitable tandem SNP sequences, the International HapMap Consortium database can be searched (The International HapMap Project, Nature 426:789-796 [2003]). The database is available on the World Wide Web at hapmap.org. In one embodiment, the tandem SNPs targeted for amplification are selected from the following set of tandem SNP pairs: rs7277033-rs2110153; rs2822654-rs1882882; rs368657-rs376635; rs2822731-rs2822732; rs1475881-rs7275487; rs1735976-rs2827016; rs447340-rs2824097; rs418989-rs13047336; rs987980-rs987981; rs4143392-rs4143391; rs1691324-rs13050434; rs11909758-rs9980111; rs2826842-rs232414; rs1980969-rs1980970; rs9978999-rs9979175; rs1034346-rs12481852; rs7509629-rs2828358; rs4817013-rs7277036; rs9981121-rs2829696; rs455921-rs2898102; rs2898102-rs458848; rs961301-rs2830208; rs2174536-rs458076; rs11088023-rs11088024; rs1011734-rs1011733; rs2831244-rs9789838; rs8132769-rs2831440; rs8134080-rs2831524; rs4817219-rs4817220; rs2250911-rs2250997; rs2831899-rs2831900; rs2831902-rs2831903; rs11088086-rs2251447; rs2832040-rs11088088; rs2832141-rs2246777; rs2832959-rs9980934; rs2833734-rs2833735; rs933121-rs933122; rs2834140-rs12626953; rs2834485-rs3453; rs9974986-rs2834703; rs2776266-rs2835001; rs1984014-rs1984015; rs7281674-rs2835316; rs13047304-rs13047322; rs2835545-rs4816551; rs2835735-rs2835736; rs13047608-rs2835826; rs2836550-rs2212596; rs2836660-rs2836661; rs465612-rs8131220; rs9980072-rs8130031; rs418359-rs2836926; rs7278447-rs7278858; rs385787-rs367001; rs367001-rs386095; rs2837296-rs2837297; and rs2837381-rs4816672.
In one embodiment, a portion of the mixture of fetal and maternal nucleic acids (e.g., cfDNA) in the sample is used as a template for amplifying target nucleic acids comprising at least one STR. In certain embodiments, each target nucleic acid comprises a single (i.e., one) SNP. STR loci are found on almost every chromosome in the genome and can be amplified using a variety of polymerase chain reaction (PCR) primers. Tetranucleotide repeats are preferred among forensic scientists because of their fidelity in PCR amplification, although certain trinucleotide and pentanucleotide repeats are also in use. A comprehensive listing of references, facts, and sequence information on STRs, published PCR primers, common multiplex systems, and related population data is compiled in STRBase, which can be accessed on the World Wide Web at ibm4.carb.nist.gov:8800/dna/home.htm. Sequence information from GenBank (http://www2.ncbi.nlm.nih.gov/cgi-bin/genbank) for commonly used STR loci is also accessible through STRBase. Commercial kits for analyzing STR loci generally provide all the necessary reaction components and the controls required for amplification. STR multiplex systems allow multiple non-overlapping loci to be amplified simultaneously in a single reaction, which substantially increases throughput. With multicolor fluorescence detection, even overlapping loci can be multiplexed. The polymorphism of tandemly repeated DNA sequences, which are widespread throughout the human genome, makes these sequences important genetic markers for gene mapping studies, linkage analysis, and human identity testing. Because STRs are highly polymorphic, most individuals will be heterozygous, i.e., most people possess two alleles (versions), one inherited from each parent, each with a different number of repeats.
PCR products comprising STRs can be separated and detected using manual, semi-automated, or automated methods. Semi-automated systems are gel-based and combine electrophoresis, detection, and analysis into one unit. On a semi-automated system, gel assembly and sample loading are still manual processes; however, once the samples are loaded onto the gel, electrophoresis, detection, and analysis proceed automatically. Data collection occurs in "real time" as fluorescently labeled fragments migrate past a detector at a fixed point, and the fragments can be viewed as they are collected. As the name implies, capillary electrophoresis is carried out in a microcapillary tube rather than between glass plates. Once the samples, gel polymer, and buffer are loaded onto the instrument, the capillary is filled with gel polymer and the sample is loaded automatically.
Because of this high polymorphism, non-maternally inherited fetal STR sequences will differ from the maternal sequences in the number of repeats. Amplification of these STR sequences can result in one or two major amplification products corresponding to the maternal alleles (and the maternally inherited fetal allele) and one minor product corresponding to the non-maternally inherited fetal allele. This technique was first reported in 2000 (Pertl et al., Human Genetics 106:45-49 [2002]) and has subsequently been developed using simultaneous identification of multiple different STR regions by real-time PCR (Liu et al., Acta Obstet Gynecol Scand 86:535-541 [2007]). PCR amplicons of various sizes have been used to discern the respective size distributions of circulating fetal and maternal DNA species, and it has been shown that fetal DNA molecules in the plasma of pregnant women are generally shorter than maternal DNA molecules (Chan et al., Clin Chem 50:88-92 [2004]). Size fractionation of circulating fetal DNA has confirmed that the average length of circulating fetal DNA fragments is <300 bp, whereas maternal DNA has been estimated to be between about 0.5 and 1 kb (Li et al., Clin Chem 50:1002-1011 [2004]).
The present invention provides a method for determining the fraction of fetal nucleic acids in a maternal sample, the method comprising determining the amount of copies of at least one fetal and one maternal allele at a polymorphic miniSTR site, which can be amplified to generate amplicons whose length is about the size of circulating fetal DNA fragments (e.g., less than about 250 base pairs). In one embodiment, the fetal fraction can be determined by a method that comprises sequencing at least a portion of amplified polymorphic target nucleic acids, each comprising a miniSTR. Fetal and maternal alleles at an informative STR site are distinguished by their different lengths, i.e., number of repeats, and the fetal fraction can be calculated as a percent ratio of the amount of fetal to maternal alleles at that site. The method can use one informative miniSTR, or a combination of any number of informative miniSTRs, to determine the fraction of fetal nucleic acids. In one embodiment, the method comprises determining the number of copies of at least one fetal and at least one maternal allele at at least one polymorphic miniSTR that is amplified to generate amplicons of less than about 300 bp, less than about 250 bp, less than about 200 bp, less than about 150 bp, less than about 100 bp, or less than about 50 bp. In another embodiment, the amplicons generated by amplifying the miniSTRs are less than about 300 bp. In another embodiment, the amplicons generated by amplifying the miniSTRs are less than about 250 bp. In another embodiment, the amplicons generated by amplifying the miniSTRs are less than about 200 bp. Amplification of the informative alleles includes using miniSTR primers, which allow amplification of reduced-size amplicons to detect STR alleles of less than about 500 bp, less than about 450 bp, less than about 400 bp, less than about 350 bp, less than about 300 bp, less than about 250 bp, less than about 200 bp, less than about 150 bp, less than about 100 bp, or less than about 50 bp. The reduced-size amplicons generated using the miniSTR primers are known as miniSTRs, and they are identified by the marker name corresponding to the locus to which they have been mapped. In one embodiment, the miniSTR primers include miniSTR primers that have permitted the maximal reduction in amplicon size for all 13 CODIS STR loci, in addition to D2S1338, Penta D, and Penta E, found in commercially available STR kits (Butler et al., J Forensic Sci 48:1054-1064 [2003]); miniSTR loci that are unlinked to the CODIS markers, as described by Coble and Butler (Coble and Butler, J Forensic Sci 50:43-53 [2005]); and other miniSTRs that have been characterized at NIST. Information on the miniSTRs characterized at NIST is available on the World Wide Web at cstl.nist.gov/biotech/strbase/newSTRs.htm. Any one pair of miniSTR primers, or a combination of two or more pairs, can be used to amplify at least one miniSTR.
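The STR-based calculation described above can be sketched as follows. This is an illustration only: the text says the fetal fraction is a percent ratio of fetal to maternal allele amounts at an informative site, but does not spell out the arithmetic, so the sketch assumes the minor product is divided by the summed major products. The `noise_floor` and `max_minor_ratio` thresholds and the function name are hypothetical.

```python
def str_fetal_fraction(counts_by_repeat, noise_floor=0.01, max_minor_ratio=0.5):
    """Percent fetal fraction at one STR site, or None if the site is not informative.

    counts_by_repeat maps repeat number (allele length) -> amount, e.g. read count.
    Assumed model: one or two major products are maternal alleles and a single
    minor product is the non-maternally inherited fetal allele.
    """
    total = sum(counts_by_repeat.values())
    if total <= 0:
        raise ValueError("no product detected")

    # Discard products below an assumed noise floor (e.g. stutter or errors).
    kept = {r: c for r, c in counts_by_repeat.items() if c / total >= noise_floor}
    ranked = sorted(kept.values(), reverse=True)

    # An informative site shows 1 or 2 major products plus exactly 1 minor product.
    if len(ranked) not in (2, 3):
        return None
    major, minor = ranked[:-1], ranked[-1]

    # The minor product must be clearly smaller than every major product;
    # otherwise it is probably a second maternal allele, not a fetal one.
    if minor / min(major) > max_minor_ratio:
        return None

    return 100.0 * minor / sum(major)
```

With amounts of 480 and 500 for two maternal alleles, 49 for a third allele, and 3 for a product below the noise floor, the sketch reports 5%. A site with only two products of nearly equal amount is treated as uninformative.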
Amplification of the target nucleic acids in the mixture of fetal and maternal nucleic acids (e.g., cfDNA) is accomplished by any method that uses PCR or variations of the method, as described elsewhere in this application. The target sequences are amplified using primer pairs, each capable of amplifying one target nucleic acid sequence comprising a polymorphic site (e.g., a SNP) in a multiplex PCR reaction. Multiplex PCR reactions combine at least 2, at least 3, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40 or more sets of primers in the same reaction, so that amplified target nucleic acids comprising at least 2, at least 3, at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40 or more polymorphic sites can be quantified in the same sequencing reaction. Any panel of primer sets can be configured to amplify at least one informative polymorphic sequence.
Primers are designed to hybridize to a sequence close to the SNP site on the cfDNA, to ensure that the SNP site falls within the length of the read generated by the sequencer. As provided in the Examples, at least one of the two primers in the primer set used to identify any one polymorphic site hybridizes sufficiently close to the polymorphic site that the site is encompassed within the 36 bp reads generated by massively parallel sequencing on the Illumina Analyzer GII, while the amplicon is long enough to undergo bridge amplification during cluster formation. Accordingly, primers are designed to generate amplicons of at least 110 bp which, when combined with the universal adaptors (Illumina Inc., San Diego, CA) used for cluster amplification, result in DNA molecules of at least 200 bp. The SNPs given in Table 33 were used to amplify 13 target sequences simultaneously in a multiplex assay. The panel provided in Table 33 is an exemplary SNP panel. Fewer or more SNPs can be employed to enrich the fetal and maternal DNA for polymorphic target nucleic acids. Additional SNPs that can be used include those given in Table 34. The SNP alleles are shown in bold and are underlined. Other SNPs that can be used to determine fetal fraction according to the method of the invention include rs315791, rs3780962, rs1410059, rs279844, rs38882, rs9951171, rs214955, rs6444724, rs2503107, rs1019029, rs1413212, rs1031825, rs891700, rs1005533, rs2831700, rs354439, rs1979255, rs1454361, rs8037429, and rs1490413. These SNPs have been analyzed for determining fetal fraction by TaqMan PCR and are disclosed in US Patent Application Publication No. 2010-0010085.
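The primer design constraints above (SNP inside the 36 bp read, amplicon of at least 110 bp, adaptor-ligated molecule of at least 200 bp) can be checked with a short sketch. It assumes that a read starts at the first base of a primer and that the adaptors add 90 bp in total, a value inferred here from the 110 bp and 200 bp figures; the function names and the 0-based coordinates are illustrative choices.

```python
def snp_in_read(snp_offset, amplicon_len, read_len=36):
    """True if the SNP lies within a read started from either end of the amplicon.

    snp_offset is the 0-based position of the SNP counted from the first base
    of the forward primer (assumed start of the forward read).
    """
    from_forward = snp_offset < read_len
    from_reverse = (amplicon_len - 1 - snp_offset) < read_len
    return from_forward or from_reverse


def amplicon_ok(snp_offset, amplicon_len, read_len=36,
                min_amplicon=110, adapter_total=90, min_library=200):
    """Check one candidate amplicon against the three constraints in the text."""
    if not 0 <= snp_offset < amplicon_len:
        raise ValueError("SNP must lie inside the amplicon")
    return (
        snp_in_read(snp_offset, amplicon_len, read_len)
        and amplicon_len >= min_amplicon
        and amplicon_len + adapter_total >= min_library
    )
```

Under these assumptions, a 120 bp amplicon with the SNP at offset 30 passes, while the same amplicon with the SNP at offset 60 fails because neither read reaches the site.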
The forward or reverse primer in each primer set hybridizes to a DNA sequence sufficiently close to the polymorphic site for the site to be included in the sequence read generated by the massively parallel sequencing of the amplified preselected polymorphic nucleic acids. The length of the sequence read depends on the particular sequencing technology. Massively parallel sequencing methods provide sequence reads that vary in size from tens to hundreds of base pairs. At least one primer in each primer set is designed to identify a polymorphic site present within a sequence read of 20 bp, about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, or about 500 bp. In certain embodiments, at least one primer in each primer set is designed to identify a polymorphic site present within a sequence read of about 25 bp, about 40 bp, about 50 bp, or about 100 bp.
Circulating cell-free DNA is about <300 bp in length. Accordingly, primer sets are designed to hybridize to and amplify polymorphic sequences that average up to about 300 bp in length, with fetal DNA averaging about 170 bp. In certain embodiments, the primer sets hybridize to DNA to generate amplicons of up to about 300 bp. In other embodiments, the primer sets hybridize to the DNA sequence to generate amplicons of at least about 100 bp, at least about 150 bp, or at least about 200 bp. The primer sets can hybridize to DNA sequences present on the same chromosome or on different chromosomes. For example, one or more primer sets can hybridize to sequences present on the same chromosome. Alternatively, two or more primer sets hybridize to sequences present on different chromosomes. In one embodiment, the primers amplify polymorphic sequences present on one or more of chromosomes 1 to 22. In certain embodiments, the primer sets do not hybridize to DNA sequences present on chromosomes 13, 18, 21, X, or Y.
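As a small illustration of the selection criteria in this paragraph, the sketch below keeps only primer sets whose targets lie outside chromosomes 13, 18, 21, X, and Y and whose amplicons do not exceed about 300 bp. The `PrimerSet` record, its field names, and the example entries are hypothetical and serve only to make the filter concrete.

```python
from dataclasses import dataclass

# Chromosomes excluded in certain embodiments described in the text.
EXCLUDED_CHROMOSOMES = frozenset({"13", "18", "21", "X", "Y"})


@dataclass(frozen=True)
class PrimerSet:
    name: str          # hypothetical label for the primer set
    chromosome: str    # chromosome carrying the polymorphic target
    amplicon_len: int  # amplicon length in bp


def select_primer_sets(primer_sets, max_amplicon=300,
                       excluded=EXCLUDED_CHROMOSOMES):
    """Keep primer sets on permitted chromosomes with amplicons <= max_amplicon."""
    return [
        p for p in primer_sets
        if p.chromosome not in excluded and p.amplicon_len <= max_amplicon
    ]


candidates = [
    PrimerSet("set_a", "1", 120),
    PrimerSet("set_b", "21", 130),   # excluded chromosome
    PrimerSet("set_c", "7", 350),    # amplicon too long for cfDNA
]
selected_names = [p.name for p in select_primer_sets(candidates)]
```

Here only `set_a` survives both filters.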
In step 740 (Figure 7), a portion or all of the amplified polymorphic sequences is used to prepare a sequencing library for sequencing in said parallel fashion. In one embodiment, the library is prepared for sequencing-by-synthesis using Illumina's reversible terminator-based sequencing chemistry.
In step 740, the sequence information needed to determine the fetal fraction is obtained using any one of the known DNA sequencing methods. Preferably, the method described herein employs next-generation sequencing (NGS) technology to provide countable sequence tags, as described elsewhere in this application. The sequencing can be massively parallel sequencing-by-synthesis. Preferably, the massively parallel sequencing-by-synthesis uses reversible dye terminators. Alternatively, the massively parallel sequencing can be sequencing-by-ligation or single-molecule sequencing.
对所扩增的目标多态核酸进行部分测序,并且对包含预定长度(例如36bp) 的读数、映射到已知参考基因组的序列标签进行计数。仅仅与参考基因组独特比对的序列读数作为序列标签进行计数。在一个实施方案中,参考基因组是包含多态目标核酸(SNP)序列的人工目标序列基因组。在一个实施方案中,参考基因组是人工SNP参考基因组。在另一个实施方案中,参考基因组是人工 STR参考基因组。在又一个实施方案中,参考基因组是人工串联STR参考基因组。人工参考基因组可以使用目标多态核酸序列编辑。人工参考基因组可以包括每一个包含一种或多种不同类型的多态序列的多态目标序列。举例来说,人工参考基因组可以包括包含SNP等位基因和/或STR的多态序列。在一个实施方案中,参考基因组是人类参考序列基因组NCBI36/hg18序列,其在万维网 genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105可获得。其他公开的序列信息来源包括GenBank、dbEST、dbSTS、EMBL(欧洲分子生物学实验室(European Molecular Biology Laboratory))以及DDBJ (日本DNA数据库)。在另一个实施方案中,参考基因组包括人类参考基因组NCBI36/hg18序列和包括目标多态序列的人工目标序列基因组,例如SNP 基因组。通过将映射标签的序列与参考基因组的序列进行比较来确定所测序的核酸(例如cfDNA)分子的染色体起点可实现序列标签的映射,并且不需要具体的遗传序列信息。多种计算机算法可以用于比对序列,包括而不限于BLAST (奥茨秋(Altschul)等人,1990)、BLITZ(MPsrch)(斯特罗科和柯林斯 (Sturrock&Collins),1993)、FASTA(普尔逊和李普曼(Pearson&Lipman),1988)、BOWTIE(郎格米(Langmead)等人,基因组生物学(Genome Biology) 10:R25.1-R25.10[2009])、或ELAND(美国加利福尼亚州圣地亚哥市伊鲁米纳公司(Illumina,Inc.,San Diego,CA,USA))。在一个实施方案中,对血浆 cfDNA分子的以克隆方式扩增的拷贝的一端进行测序并且通过伊鲁米纳基因组分析仪的生物信息学比对分析加以处理,伊鲁米纳基因组分析仪使用核苷酸数据库(ELAND)软件的大规模高效比对来进行。在包括使用NGS测序方法确定存在或不存在非整倍性和胎儿分数的方法的实施方案中,为确定非整倍性而对测序信息进行的分析可允许较小程度的错配(每个序列标签0到2个错配),以解释参考基因组与混合样品中的基因组之间可能存在的微小多态性。为确定胎儿分数而对测序信息进行的分析可以允许较小程度的错配,这取决于多态序列。举例来说,如果多态序列是STR,那么可以允许较小程度的错配。在多态序列是SNP的情况下,首先对与位于SNP位点的两个等位基因中的任一个精确匹配的所有序列进行计数并且从剩余读数中过滤掉,对于剩余读数,可以允许较小程度的错配。可以如在此所描述,或者使用采用将感兴趣的染色体的序列标签的中位数相对于其他常染色体中每一个的标签的中位数归一化(范(Fan) 等人,美国国家科学院院刊(Proc Natl Acad Sci)105:16266-16271[2008])或比较与每一个染色体进行比对的独特读数的数目和与所有染色体进行比对的读数总数以得出每一个染色体的基因组表达百分比的替代分析,确定与每一个染色体进行比对的序列读数的数目的量化以确定染色体非整倍性。产生“z分数”以表示感兴趣的染色体的基因组表达百分比与相同染色体在整倍体对照组之间的平均表达百分比之间的差异除以标准差(赵(Chiu)等人,临床化学(Clin Chem)56:459-463[2010])。在另一个实施方案中,测序信息可以如2010年1 月19日申请的标题是“归一化的生物学检验”的美国临时专利申请案号 32047-768.101中所述来确定,该申请通过引用以其全文结合于此。The amplified target polymorphic nucleic acid is partially sequenced, and sequence tags comprising reads of a predetermined length (e.g., 36 bp) mapped to a known reference genome are counted. 
Only sequence reads that uniquely align to the reference genome are counted as sequence tags. In one embodiment, the reference genome is an artificial target sequence genome comprising polymorphic target nucleic acid (SNP) sequences. In one embodiment, the reference genome is an artificial SNP reference genome. In another embodiment, the reference genome is an artificial STR reference genome. In yet another embodiment, the reference genome is an artificial tandem STR reference genome. The artificial reference genome can be compiled using the target polymorphic nucleic acid sequence. The artificial reference genome can include polymorphic target sequences that each include one or more different types of polymorphic sequences. For example, the artificial reference genome can include polymorphic sequences comprising SNP alleles and/or STRs. In one embodiment, the reference genome is the human reference sequence genome NCBI36/hg18 sequence, which is available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105. Other publicly available sources of sequence information include GenBank, dbEST, dbSTS, EMBL (European Molecular Biology Laboratory), and DDBJ (DNA Database of Japan). In another embodiment, the reference genome includes the human reference genome NCBI36/hg18 sequence and an artificial target sequence genome including target polymorphic sequences, such as a SNP genome. Mapping of sequence tags can be achieved by comparing the sequence of the mapped tag with the sequence of the reference genome to determine the chromosomal origin of the sequenced nucleic acid (e.g., cfDNA) molecule, and does not require specific genetic sequence information. 
A variety of computer algorithms can be used to align sequences, including but not limited to BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), and ELAND (Illumina, Inc., San Diego, CA, USA). In one embodiment, one end of the clonally amplified copies of the plasma cfDNA molecules is sequenced and processed by bioinformatic alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software. In embodiments of methods that use NGS sequencing to determine both the presence or absence of aneuploidy and the fetal fraction, analysis of sequencing information for determining aneuploidy may allow a small degree of mismatch (0 to 2 mismatches per sequence tag) to account for minor polymorphisms that may exist between the reference genome and the genomes in the mixed sample. Analysis of sequencing information for determining fetal fraction may also allow a small degree of mismatch, depending on the polymorphic sequence. For example, a small degree of mismatch may be allowed if the polymorphic sequence is an STR. Where the polymorphic sequence is a SNP, all sequences that exactly match either of the two alleles at the SNP site are counted first and filtered from the remaining reads, for which a small degree of mismatch may be allowed.
The number of sequence reads aligning to each chromosome can be quantified to determine chromosomal aneuploidy as described herein, or by alternative analyses that normalize the median number of sequence tags for the chromosome of interest to the median number of tags for each of the other autosomes (Fan et al., Proc Natl Acad Sci 105:16266-16271 [2008]), or that compare the number of unique reads aligning to each chromosome with the total number of reads aligning to all chromosomes to derive a percent genomic representation for each chromosome. A "z score" is generated to represent the difference between the percent genomic representation of the chromosome of interest and the mean percent representation of the same chromosome in a euploid control group, divided by the standard deviation (Chiu et al., Clin Chem 56:459-463 [2010]). In another embodiment, the sequencing information can be determined as described in U.S. Provisional Patent Application No. 32047-768.101, filed January 19, 2010, entitled "Normalized Biological Assays," which is incorporated herein by reference in its entirety.
为确定胎儿分数而对测序信息进行的分析可以允许较小程度的错配,这取决于多态序列。举例来说,如果多态序列是STR,那么可以允许较小程度的错配。在多态序列是SNP的情况下,首先对与位于SNP位点的两个等位基因中的任一个精确匹配的所有序列进行计数并且从剩余读数中过滤掉,对于剩余读数,可以允许较小程度的错配。通过对核酸进行测序来确定胎儿分数的本发明方法可以与其他方法组合使用。Analysis of sequencing information to determine fetal fraction can tolerate a small degree of mismatch, depending on the polymorphic sequence. For example, if the polymorphic sequence is an STR, a small degree of mismatch can be tolerated. If the polymorphic sequence is a SNP, all sequences that exactly match either of the two alleles at the SNP site are first counted and filtered out from the remaining reads, for which a small degree of mismatch can be tolerated. The present method for determining fetal fraction by sequencing nucleic acids can be used in combination with other methods.
In step 760, the fetal fraction is determined based on the total number of tags mapped to the first allele and the total number of tags mapped to the second allele at informative polymorphic sites (e.g., SNPs) contained in the reference genome. For example, the reference genome is an artificial target sequence genome encompassing polymorphic sequences that include SNPs rs560681, rs1109037, rs9866013, rs13182883, rs13218440, rs7041158, rs740598, rs10773760, rs4530059, rs7205345, rs8078417, rs576261, rs2567608, rs430046, rs9951171, rs338882, rs10776839, rs9905977, rs1277284, rs258684, rs1347696, rs508485, rs9788670, rs8137254, rs3143, rs2182957, rs3739005, and rs530022. In one embodiment, the artificial reference genome includes the polymorphic target sequences of SEQ ID NOs: 7 to 62 (see Example 24).
在另一个实施方案中,人工基因组是涵盖了包含串联SNP的多态序列的人工目标序列基因组。在另一个实施方案中,人工目标基因组涵盖了包含STR 的多态序列。人工目标序列基因组的组成将视用于确定胎儿分数的多态序列而变化。因此,人工目标序列基因组不限于在此例证的SNP、串联SNP或STR 序列。In another embodiment, the artificial genome is an artificial target sequence genome encompassing polymorphic sequences comprising tandem SNPs. In another embodiment, the artificial target genome encompasses polymorphic sequences comprising STRs. The composition of the artificial target sequence genome will vary depending on the polymorphic sequences used to determine fetal fraction. Therefore, the artificial target sequence genome is not limited to the SNPs, tandem SNPs, or STR sequences exemplified herein.
Informative polymorphic sites (e.g., SNPs) are identified by the difference in the allelic sequences and the amount of each of the possible alleles. Fetal cfDNA is present at a concentration that is less than 10% of the maternal cfDNA. Thus, relative to the major contribution of the maternal alleles, there is a minor contribution to the mixture of fetal and maternal nucleic acids from the alleles that can be assigned to the fetus. Alleles derived from the maternal genome are referred to herein as major alleles, and alleles derived from the fetal genome are referred to herein as minor alleles. Alleles that are represented by similar levels of mapped sequence tags represent maternal alleles. The results of an exemplary multiplex amplification of target nucleic acids comprising SNPs and derived from a maternal plasma sample are shown in Figure 12. Informative SNPs are discerned from the single nucleotide change at a polymorphic site, and fetal alleles are discerned by their relatively minor contribution to the mixture in the sample when compared with the major contribution of the maternal nucleic acids. Accordingly, the relative abundance of fetal cfDNA in the maternal sample can be determined as a parameter of the total number of unique sequence tags mapped to the target nucleic acid sequence on the reference genome for each of the two alleles of the predetermined polymorphic site. In one embodiment, for each informative allele (allele x), the fraction of fetal nucleic acid in the mixture of fetal and maternal nucleic acids is calculated as described elsewhere in this application.
Estimation of fetal fraction using STR sequences and capillary electrophoresis
Individuals have different STR lengths because of differing numbers of repeats. Because STRs are highly polymorphic, most individuals will be heterozygous; that is, most people have two alleles (versions), one inherited from each parent, each with a different repeat number. A non-maternally inherited fetal STR sequence will differ from the maternal sequences in repeat number. Amplification of these STR sequences can yield one or two major amplification products corresponding to the maternal alleles (and the maternally inherited fetal allele) and one minor product corresponding to the non-maternally inherited fetal allele. When sequenced, the collected samples can be associated with the corresponding alleles and counted to determine the relative fraction using Equation 3.
通过使用荧光标记的引物对纯化的样品进行PCR。可以使用人工、半自动化或自动化电泳法分离并且检测包含STR的PCR产物。半自动化系统是基于凝胶的并且将电泳、检测和分析组合成一个单元。在半自动化系统上,凝胶装配和样品加载仍然是人工程序;然而,一旦样品加载于凝胶上,则电冰、检测和分析自动进行。顾名思义,毛细管电冰是在微细管中而非在玻璃板之间进行。一旦样品、凝胶聚合物和缓冲液加载于仪器上,则毛细管充满凝胶聚合物并且自动加载样品。当荧光标记的片段迁移通过固定点处的检测器并且可以随着收集它们可以观察到它们时,“实时”进行数据收集。共毛细管电冰获得的序列可以通过测量荧光标记波长的程序加以检测。胎儿分数的计算是基于平均所有信息性标记物。信息性标记物是通过电泳图谱上峰值的存在加以识别,这些峰值落在针对所分析的STR的预设数据箱参数内。PCR is performed on the purified sample using fluorescently labeled primers. STR-containing PCR products can be separated and detected using manual, semi-automated, or automated electrophoresis. Semi-automated systems are gel-based and combine electrophoresis, detection, and analysis into a single unit. On semi-automated systems, gel assembly and sample loading remain manual procedures; however, once the sample is loaded onto the gel, electrophoresis, detection, and analysis proceed automatically. As the name suggests, capillary electrophoresis is performed in microtubes rather than between glass plates. Once the sample, gel polymer, and buffer are loaded onto the instrument, the capillary is filled with gel polymer and the sample is loaded automatically. Data collection occurs in "real time" as fluorescently labeled fragments migrate past detectors at fixed points and can be observed as they are collected. Sequences obtained by capillary electrophoresis can be detected using a procedure that measures the wavelength of the fluorescent markers. Fetal fraction is calculated based on the average of all informative markers. Informative markers are identified by the presence of peaks on the electropherogram that fall within preset bin parameters for the STR being analyzed.
The fraction of the minor allele for any given informative marker is calculated by dividing the peak height of the minor component by the sum of the peak heights of the major component(s), and is expressed as a percentage for each informative locus as follows:

$$\%\ \text{fetal fraction} = \frac{\text{peak height of minor allele}}{\sum \text{peak heights of major allele(s)}} \times 100$$
会计算针对包含两个或更多个信息性STR的样品的胎儿分数,作为针对两个或更多个信息性标记物所计算的胎儿分数平均值。The fetal fraction for a sample comprising two or more informative STRs is calculated as the average of the fetal fractions calculated for the two or more informative markers.
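The per-marker percentage and the sample-level average described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the peak heights in the example are hypothetical, and the formula is the one stated in words in the preceding paragraphs (minor peak height divided by the sum of the major peak heights, times 100).

```python
from statistics import mean


def str_marker_fraction(minor_peak_height, major_peak_heights):
    """Percent minor-allele fraction for one informative STR marker."""
    total_major = sum(major_peak_heights)
    if total_major <= 0:
        raise ValueError("major peak heights must sum to a positive value")
    return 100.0 * minor_peak_height / total_major


def sample_fetal_fraction(markers):
    """Average the per-marker percentages over all informative markers.

    markers is a list of (minor_peak_height, [major_peak_heights]) pairs.
    """
    return mean(str_marker_fraction(minor, majors) for minor, majors in markers)


# Hypothetical peak heights for two informative STR markers.
example_markers = [(120.0, [1500.0, 1500.0]), (90.0, [1000.0, 800.0])]
print(sample_fetal_fraction(example_markers))
```

A marker with a single major peak (a homozygous mother) is handled by passing a one-element list of major peak heights.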
Estimating fetal fraction using mixture models
In the embodiments disclosed herein, there are up to four different data types (zygosity cases) that make up the minor allele frequency data for the polymorphisms under consideration.
As shown in Figure 13, cases 1 and 2 are polymorphism cases in which the mother is homozygous at a given allele. If both the baby and the mother are homozygous, the polymorphism is a case 1 polymorphism. This situation is typically not of particular interest, because the collected data contain only one type of allele at the polymorphic site analyzed. In case 2, where the mother is homozygous and the baby is heterozygous, the fetal fraction f is nominally given by two times the ratio of the minor allele count to the coverage. Coverage is defined as the total number of reads or tags (fetal plus maternal) mapped to the particular site of the polymorphism. With A denoting the minor allele count and D the coverage, the equation that approximates the fetal fraction in case 2 is:

$$f \approx 2\,\frac{A}{D} \qquad \text{(Equation 4)}$$
In case 3, where the mother is heterozygous and the baby is homozygous, the fetal fraction is nominally one minus two times the ratio of the minor allele count to the coverage. With A denoting the minor allele count and D the coverage (the total reads from both the fetal and maternal contributions), the equation that approximates the fetal fraction in case 3 is:

$$f \approx 1 - 2\,\frac{A}{D} \qquad \text{(Equation 5)}$$
最后,在情况4中,其中母亲和胎儿都是杂合型,次等位基因分数应该总是0.5(不包括误差)。对于落在情况4中的多态性,无法推导出胎儿分数。Finally, in case 4, where both the mother and fetus are heterozygous, the minor allele fraction should always be 0.5 (excluding error). For polymorphisms falling into case 4, no fetal fraction can be derived.
表7概述如果主等位基因读数的数目是300并且次等位基因读数的数目是 200,那么使用等式4和5估计胎儿分数的实例。覆盖范围会是500。Table 7 summarizes an example of estimating fetal fraction using Equations 4 and 5 if the number of major allele reads is 300 and the number of minor allele reads is 200. The coverage would be 500.
Table 7: Example of estimating fetal fraction using zygosity
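The arithmetic behind this example can be sketched as below. The sketch assumes the nominal relationships stated in the text, read as f = 2·A/D for case 2 and f = 1 − 2·A/D for case 3, where A is the minor allele count and D the coverage; the function names are illustrative only.

```python
def fetal_fraction_case2(minor_count, coverage):
    """Mother homozygous, fetus heterozygous: f = 2 * A / D."""
    return 2.0 * minor_count / coverage


def fetal_fraction_case3(minor_count, coverage):
    """Mother heterozygous, fetus homozygous: f = 1 - 2 * A / D."""
    return 1.0 - 2.0 * minor_count / coverage


major_reads, minor_reads = 300, 200
coverage = major_reads + minor_reads  # 500 reads at this site

print(fetal_fraction_case2(minor_reads, coverage))
print(fetal_fraction_case3(minor_reads, coverage))
```

With 300 major and 200 minor reads, the case 2 reading gives roughly 0.8 and the case 3 reading roughly 0.2, so the same counts imply very different fetal fractions depending on which zygosity case the site belongs to. That is the motivation for classifying sites before estimating.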
In certain embodiments, a mixture model can be employed to classify a set of polymorphisms into two or more of the presented zygosity cases and, at the same time, to estimate the fetal DNA fraction from the mean allele frequencies for each of these cases. In general, a mixture model assumes that a particular data set is made up of a mixture of different types of data, each with its own expected distribution (e.g., a normal distribution). The procedure attempts to find the mean, and possibly other characteristics, of each type of data. In the embodiments disclosed herein, there are up to four different data types (zygosity cases) that make up the minor allele frequency data for the polymorphisms under consideration.
In certain embodiments employing a mixture model, one or more factorial moments given by Equation 1 are calculated for the positions being considered as polymorphisms. For example, a factorial moment Fi (or a set of factorial moments) is calculated using the multiple SNP positions considered in the DNA sequence. As shown in Equation 10 below, each factorial moment Fi is a sum, over all of the different polymorphism positions under consideration, of the ratio of the minor allele frequency ai to the coverage di at a given position. As shown in Equation 11 below, these factorial moments are also related to the parameters αi and pi associated with each of the four zygosity cases described above. Specifically, they relate to the probability pi for each case and to the relative amount, given by αi, of each of the four cases in the set of polymorphisms under consideration. As explained, the probability pi is a function of the fraction of fetal DNA in the cell-free DNA in the mother's blood. As explained more fully below, calculating a sufficient number of these factorial moments provides enough expressions to solve for all of the unknowns. The unknowns in this case are the relative amounts of each of the four cases in the population of polymorphisms under consideration and the probabilities associated with each of the four cases (and hence the fetal DNA fraction). Similar results can be obtained with other versions of the mixture model. Some versions use only polymorphisms that fall into cases 1 and 2, with the polymorphisms of cases 3 and 4 filtered out by a thresholding technique.
因此,阶乘矩可用作混合模型的一部分,以识别配型的四种情况的任何组合的概率。并且,如所提及,这些概率,或至少针对情况2和情况3的这些概率,直接涉及母亲血液中的总无细胞DNA中的胎儿DNA分数。Therefore, factorial moments can be used as part of a mixture model to identify the probability of any combination of the four scenarios of matching. And, as mentioned, these probabilities, or at least those for scenarios 2 and 3, are directly related to the fraction of fetal DNA in the total cell-free DNA in the mother's blood.
还应该提及,由e给定的测序误差可用于降低必须求解的阶乘矩等式的系统复杂性。在这点上,应该认识到测序误差实际上可以具有四种结果中的任一个(对应于位于任何给定的多态性位置的四个可能碱基中的每一个)。It should also be mentioned that the sequencing error given by e can be used to reduce the systematic complexity of the factorial moment equation that must be solved. In this regard, it should be recognized that a sequencing error can actually have any one of four outcomes (corresponding to each of the four possible bases at any given polymorphic position).
假设在基因组位置j的主等位基因计数是B,在位置j的计数(读数的计数)的一阶统计量。主等位基因,b,是对应的自变量最大值(arg max)。当考虑一个以上SNP时,使用下标。按以下给出主等位基因计数:Assume that the major allele count at genomic position j is B, a first-order statistic of the count (number of reads) at position j. The major allele, b, is the corresponding arg max. When more than one SNP is considered, a subscript is used. The major allele count is given as follows:
假设位置j的次等位基因计数是A,在位置j的计数(即,次最高的等位基因计数)的二阶统计量:Assuming the minor allele count at position j is A, the second-order statistic for the count at position j (i.e., the next highest allele count) is:
覆盖范围定义为映射到多态性具体位点的总读取数(胎儿与母体)。假设位置j的覆盖范围定义为D:Coverage is defined as the total number of reads (fetal and maternal) mapped to a specific polymorphic site. Suppose the coverage of position j is defined as D:
$$D \equiv D_j = \{d_i\} = A_j + B_j \qquad \text{(Equation 8)}$$
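The definitions of the major allele count B, the minor allele count A, and the coverage D can be illustrated with a short sketch. It assumes the per-position input is a mapping from each observed base to its read count (a hypothetical data layout), takes the highest count as B and the second-highest as A, and forms the coverage as their sum per Equation 8.

```python
def allele_counts(base_counts):
    """Return (major allele, B, A, D) for one polymorphic position.

    base_counts maps each observed base to its read count.
    """
    ranked = sorted(base_counts.items(), key=lambda kv: kv[1], reverse=True)
    major_allele, major_count = ranked[0]             # first order statistic and its arg max
    minor_count = ranked[1][1] if len(ranked) > 1 else 0  # second-highest count
    coverage = major_count + minor_count              # Equation 8: D = A + B
    return major_allele, major_count, minor_count, coverage


# Hypothetical read counts at a single SNP position.
print(allele_counts({"A": 480, "G": 20, "C": 1, "T": 0}))
```

Note that, under this reading of Equation 8, reads supporting a third or fourth base (most likely sequencing errors) do not contribute to D.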
In this embodiment, the minor allele frequency data A are modeled as the sum of four terms, as shown in Equation 9. The four heterozygosity cases described suggest the following binomial mixture model for the distribution of the minor allele counts ai at points (ai, di), where di is the coverage:
$$A = \{a_i\} \sim \alpha_1\,\mathrm{Bin}(p_1, d_i) + \alpha_2\,\mathrm{Bin}(p_2, d_i) + \alpha_3\,\mathrm{Bin}(p_3, d_i) + \alpha_4\,\mathrm{Bin}(p_4, d_i)$$

where

$$1 = \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4, \qquad m = 4 \qquad \text{(Equation 9)}$$
Each term corresponds to one of the four zygosity cases. Each term is the product of a polymorphism fraction α and a binomial distribution of the minor allele frequency. The α values represent the fractions of the polymorphisms that fall into each of the four cases. Each binomial distribution has an associated probability, p, and coverage, d. The minor allele probability for case 2, for example, is given by f/2, where f is the fetal fraction. Different models for relating pi to the fetal fraction and the sequencing error rate are described below. The parameters αi relate to population-specific parameters, and the ability to let these values "float" can give these methods additional robustness with respect to factors such as the ethnicity of the parents and the progeny.
所披露的实施方案利用针对考虑中的等位基因频率数据的阶乘矩。众所周知,分布平均值是一阶矩。它是次等位基因频率的期望值。方差是二阶矩。它是从等位基因频率平方的期望值计算而来。The disclosed embodiments utilize factorial moments for the allele frequency data under consideration. As is well known, the mean of a distribution is a first-order moment. It is the expected value of the minor allele frequency. The variance is a second-order moment. It is calculated from the expected value of the squared allele frequency.
对于不同的杂合性情况,以上等式9可以解出胎儿分数。在某些实施方案中,胎儿分数是通过阶乘矩方法解出,其中混合参数可以用矩表示,这些矩可以容易地从观察数据估计出。For different heterozygosity conditions, Equation 9 above can be solved for fetal fraction. In certain embodiments, fetal fraction is solved by the factorial moment method, where the mixing parameters can be expressed in terms of moments that can be easily estimated from the observed data.
跨所有多态性的等位基因频率数据可用于计算第i个阶乘矩Fi(第一阶乘矩F1、第二阶乘矩F2等),如等式10所示。(SNP仅用于实例的目的。其他类型的多态性可如在本申请的其他地方所论述使用。)给定n个SNP位置,则阶乘矩如下定义:The allele frequency data across all polymorphisms can be used to calculate the i-th factorial moment F i (first factorial moment F 1 , second factorial moment F 2 , etc.), as shown in Equation 10. (SNPs are used for example purposes only. Other types of polymorphisms can be used as discussed elsewhere in this application.) Given n SNP positions, the factorial moment is defined as follows:
As shown by these equations, each factorial moment is a sum over the index i (the individual polymorphisms in the data set), where the data set contains n such polymorphisms. Each term of the sum is a function of the minor allele count ai and the coverage value di.
有用的是,阶乘矩与αi和pi的值有关,如等式11中所说明。阶乘矩可以与{αi,pi}关联,从而Usefully, the factorial moments are related to the values of αi and pi , as illustrated in Equation 11. The factorial moments can be associated with { αi , pi } such that
The fetal fraction f can be determined from the probabilities pi. For example, the case 2 probability is p2 = f/2, as noted above. Suitable logic can therefore solve a system of equations relating the unknown α and p variables to the factorial moment expressions for the minor allele fractions across the multiple polymorphisms under consideration. Other techniques for solving the mixture model also exist within the scope of the disclosed embodiments.
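Equations 10 and 11 are not reproduced in this text, so the following is only a sketch under a stated assumption: it uses the standard sample factorial moment for binomial counts, in which the g-th moment averages a(a−1)…(a−g+1) / (d(d−1)…(d−g+1)) over the n positions. For a binomial mixture, the expectation of that quantity is the sum of αi·pi^g, which matches the form of the moment relations given for the reduced models later in this section. The function name is illustrative.

```python
def factorial_moment(minor_counts, coverages, order):
    """Sample factorial moment of the given order, averaged over all positions.

    Assumes every coverage value is at least `order`, so no denominator is zero.
    """
    total = 0.0
    for a, d in zip(minor_counts, coverages):
        term = 1.0
        for k in range(order):
            term *= (a - k) / (d - k)   # builds a(a-1)... / d(d-1)...
        total += term
    return total / len(minor_counts)


# Hypothetical minor allele counts and coverages at two SNP positions.
a = [10, 5]
d = [100, 100]
print(factorial_moment(a, d, 1), factorial_moment(a, d, 2))
```

The first-order moment is simply the mean minor allele fraction; higher orders supply the additional equations needed to separate the mixture components.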
当n>2*(要估计的参数数目)时,通过求出由以上关系等式8推导出的方程组中{αi,pi}的解可以识别一个解。显而易见,该问题在数学上变得困难得多,因为g越高,需要估计的{αi,pi}越多。When n>2*(number of parameters to estimate), a solution can be identified by solving the system of equations for { αi , pi } derived from the above relationship, Equation 8. Obviously, the problem becomes much more difficult mathematically, as the higher g, the more { αi , pi } need to be estimated.
典型地不可能通过更低胎儿分数下的简单阈值准确地区分情况1与情况2 (或情况3与情况4)的数据。通过在点进行区分,可将情况1和情况 2的数据容易地与情况3和情况4的数据分离,其中A是次等位基因计数并且 D是覆盖范围并且T是阈值。已发现使用T=0.5可表现满意。It is typically not possible to accurately distinguish the data for Case 1 from Case 2 (or Case 3 from Case 4) by a simple threshold at a lower fetal fraction. The data for Cases 1 and 2 can be easily separated from the data for Cases 3 and 4 by distinguishing at the point A, where A is the minor allele count, D is the coverage, and T is the threshold. Using T = 0.5 has been found to perform satisfactorily.
注意,采用等式10和等式11的混合模型方法是利用所有多态性的数据,但没有分别说明测序误差。将第一和第二情况的数据从第三和第四情况的数据分离的适当方法可以说明测序误差。Note that the mixed model approach using Equations 10 and 11 utilizes data from all polymorphisms but does not account for sequencing errors separately. An appropriate method of separating the data from the first and second cases from the data from the third and fourth cases can account for sequencing errors.
In another example, the data set provided to the mixture model contains only data for case 1 and case 2 polymorphisms. These are the polymorphisms for which the mother is homozygous. A thresholding technique can be used to eliminate the case 3 and case 4 polymorphisms. For example, polymorphisms with a minor allele frequency greater than a particular threshold are excluded before the mixture model is applied. Using appropriately filtered data and factorial moments reduced as in Equations 13 and 14 below, the fetal fraction f can be calculated as shown in Equation 15. Note that Equation 13 is a restatement of Equation 9 for this implementation of the mixture model. Note also that, in this particular example, the sequencing error associated with the machine reads is unknown. As a result, the system of equations must be solved separately for the error, e.
Figure 14 shows the results of using this mixture model, comparing the known fetal fraction (X axis) with the estimated fetal fraction (Y axis). If the mixture model predicted the fetal fraction perfectly, the plotted results would follow the dashed line. Nevertheless, the estimated fractions are remarkably good, particularly considering that much of the data was excluded before the mixture model was applied.
为了进一步详述,可利用若干其他方法对来自等式7的模型进行参数估计。在一些情况下,可以通过将卡方统计量(chi-squared statistic)导数设定为为零来找到易处理的解。在通过直接微分不能够找到容易解的情况下,对二项式概率分布函数(PDF)或其他近似多项式进行泰勒级数展开可以是有效的。最小卡方估计式已众所周知为有效的。从等式9求矩解的方法可用作迭代法的起始点。可使用以下卡方估计式:To further elaborate, several other methods can be used to estimate the parameters of the model from Equation 7. In some cases, a tractable solution can be found by setting the derivative of the chi-squared statistic to zero. In cases where a simple solution cannot be found by direct differentiation, a Taylor series expansion of the binomial probability distribution function (PDF) or other approximate polynomial can be effective. The minimum chi-square estimator is well known to be effective. The method of finding the moment solution from Equation 9 can be used as a starting point for an iterative method. The following chi-square estimator can be used:
where Pi is the number of points with count i. The iterative method of Le Cam ["Asymptotic Theory of Estimation and Testing Hypotheses," Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, Berkeley, CA: University of California Press, 1956, pp. 129-156] uses Newton-Raphson iteration on the likelihood function.
根据另一种应用,论述一种解析混合模型的方法,其涉及对近似β-分布的混合进行操作的期望值最大化方法。According to another application, a method for analyzing mixture models is discussed, which involves an expectation maximization method operating on mixtures of approximately beta-distributions.
模型1:情况1和2,测序误差未知Model 1: Cases 1 and 2, sequencing error unknown
考虑仅说明杂合性情况1和2的缩小模型。在这种情况下,混合物分布可写成:Consider a reduced model that accounts for only heterozygosity cases 1 and 2. In this case, the mixture distribution can be written as:
$$A = \{a_i\} \sim \alpha_1\,\mathrm{Bin}(e, d_i) + \alpha_2\,\mathrm{Bin}(f/2, d_i)$$

where

$$1 = \alpha_1 + \alpha_2, \qquad m = 4 \qquad \text{(Equation 13)}$$
and the system of equations

$$F_1 = \alpha_1 e + (1-\alpha_1)(f/2)$$
$$F_2 = \alpha_1 e^2 + (1-\alpha_1)(f/2)^2$$
$$F_3 = \alpha_1 e^3 + (1-\alpha_1)(f/2)^3 \qquad \text{(Equation 14)}$$

is solved for e (the sequencing error rate), α1 (the proportion of case 1 points), and f (the fetal fraction), where the Fi are as defined in Equation 10 above. The closed-form solution for the fetal fraction is chosen as the real solution of the following equation:
该解在0与1之间。The solution is between 0 and 1.
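The closed-form expression of Equation 15 is not reproduced in this text. As a hedged alternative, the three moment relations of Equation 14 can be solved by standard moment matching for a two-point mixture: e and f/2 are the two roots of a quadratic whose coefficients follow from F1, F2, and F3. The sketch below is one such solution, not necessarily the form used in Equation 15; it assumes e < f/2, and the function name is illustrative.

```python
import math


def solve_model1(F1, F2, F3):
    """Return (e, alpha1, f) from the three moment relations of Equation 14.

    e and f/2 are the roots of t^2 + b*t + c = 0, where b and c are fixed
    by requiring both mixture atoms to satisfy the moment recurrences.
    """
    denom = F1 * F1 - F2
    if denom == 0:
        raise ValueError("moments are degenerate; the two components coincide")
    b = (F3 - F1 * F2) / denom       # equals -(e + f/2)
    c = -F2 - b * F1                 # equals e * (f/2)
    disc = b * b - 4.0 * c
    if disc <= 0:
        raise ValueError("no pair of distinct real roots for these moments")
    root = math.sqrt(disc)
    e = (-b - root) / 2.0            # smaller root, assumed to be the error rate
    half_f = (-b + root) / 2.0       # larger root, f/2
    alpha1 = (half_f - F1) / (half_f - e)
    return e, alpha1, 2.0 * half_f


# Round-trip check with hypothetical parameter values.
e0, a0, f0 = 0.01, 0.7, 0.10
moments = [a0 * e0**g + (1 - a0) * (f0 / 2) ** g for g in (1, 2, 3)]
print(solve_model1(*moments))
```

In practice the inputs would be sample factorial moments computed from the filtered case 1 and case 2 data, so the recovered values carry sampling noise, and the third moment in particular can be noisy at low coverage.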
To gauge the performance of the estimator, simulated data sets (ai, di) of Hardy-Weinberg equilibrium points were constructed with fetal fractions designed to be {1%, 3%, 5%, 10%, 15%, 20%, and 25%} and a constant sequencing error rate of 1%. The 1% error rate is the currently accepted rate for the sequencing machines and protocols used and is consistent with the Illumina Genome Analyzer II data shown in Figure 15. Equation 15 was applied to these data and was found to agree generally with the "known" fetal fractions, apart from an upward bias of four points. Interestingly, the sequencing error rate, e, was estimated to be just above 1%.
模型2:情况1和2,测序误差已知Model 2: Cases 1 and 2, sequencing error is known
在下一个混合物模型实例中,再次采用阈值确定或另一种过滤技术来去除属于情况3和4的针对多态性的数据。然而,在这种情况下,测序误差是已知的。此举简化了胎儿分数,f,的所得表达式,如等式16中所示。图16示出了混合物模型的这种版本与等式15所采用的方法相比提供了改良的结果。在随后的等式中,使测序机器误差率为e。In the next mixture model example, thresholding or another filtering technique is again employed to remove data specific to polymorphisms belonging to Cases 3 and 4. However, in this case, the sequencing error is known. This simplifies the resulting expression for the fetal fraction, f, as shown in Equation 16. Figure 16 shows that this version of the mixture model provides improved results compared to the approach employed in Equation 15. In the subsequent equations, the sequencing machine error rate is set to e.
在等式17和18中示出了一种类似的方法。该方法认识到,只有一些测序误差添加到次等位基因计数。然而,每四个测序误差中只有一个应当增加次等位基因计数。图17示出了使用该技术时实际的与估计的胎儿分数之间的非常良好契合性。A similar approach is shown in Equations 17 and 18. This approach recognizes that only some sequencing errors contribute to the minor allele count. However, only one out of every four sequencing errors should increase the minor allele count. Figure 17 shows a very good agreement between the actual and estimated fetal fractions using this technique.
因为使用的机器的测序误差率在很大程度上是已知的,所以通过消除作为欲求解的变量的e可降低计算的偏差和复杂性。因此,我们获得了针对胎儿分数f的方程组:Because the sequencing error rate of the machine used is largely known, the bias and complexity of the calculation can be reduced by eliminating e as a variable to be solved. Therefore, we obtain a system of equations for the fetal fraction f:
$$F_1 = \alpha_1 e + (1-\alpha_1)(f/2)$$
$$F_2 = \alpha_1 e^2 + (1-\alpha_1)(f/2)^2 \qquad \text{(Equation 16)}$$

which yields the solution:
图16显示,使用机器误差率作为已知的参数可减少点向上偏差。Figure 16 shows that using the machine error rate as a known parameter can reduce the point-up deviation.
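When e is known, the two relations of Equation 16 can be solved directly. Subtracting e from the first relation and e² from the second gives F1 − e = (1 − α1)(f/2 − e) and F2 − e² = (F1 − e)(f/2 + e), so f/2 = (F2 − e·F1)/(F1 − e). The sketch below implements this derivation; it is offered as a worked consequence of Equation 16 rather than as a transcription of the patent's own solution, and the function name is illustrative.

```python
def fetal_fraction_known_error(F1, F2, e):
    """Solve Equation 16 for f when the sequencing error rate e is known."""
    if F1 == e:
        raise ValueError("F1 equals e; no case 2 signal to estimate from")
    half_f = (F2 - e * F1) / (F1 - e)
    return 2.0 * half_f


# Round-trip check with hypothetical parameter values.
e0, a0, f0 = 0.01, 0.7, 0.10
F1 = a0 * e0 + (1 - a0) * (f0 / 2)
F2 = a0 * e0**2 + (1 - a0) * (f0 / 2) ** 2
print(fetal_fraction_known_error(F1, F2, e0))
```

Only the first two factorial moments are needed here, which avoids the noisier third moment required when e must also be estimated.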
模型3:情况1和2,测序误差已知,改进的误差模型Model 3: Cases 1 and 2, sequencing error is known, improved error model
To reduce the bias in this model, we expanded the error model of the above equations to account for the fact that, in heterozygosity case 1, not every sequencing error event adds to the minor allele count A = ai. In addition, we allow for the fact that sequencing error events may contribute to the counts in heterozygosity case 2. We therefore determine the fetal fraction f by solving the following system of factorial moment relations:
F1=α1e/4+(1-α1)(e+f/2)F 1 =α 1 e/4+(1-α 1 )(e+f/2)
则该系统的解是:Then the solution of this system is:
Figure 17 shows that, for simulated data, using the machine error rate as a known parameter and enhancing the error models for cases 1 and 2 greatly reduces the upward bias, to less than a point for fetal fractions below 0.2.
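Only the first moment relation of this improved model appears in the text above. The sketch below therefore rests on an explicit assumption: that the second factorial moment has the analogous form F2 = α1(e/4)² + (1 − α1)(e + f/2)², by the same pattern as Equations 14 and 16. Under that assumption, the case 1 probability is p1 = e/4, the case 2 probability is p2 = e + f/2, and the same elimination used for a known error rate gives p2 = (F2 − p1·F1)/(F1 − p1). The function name is illustrative.

```python
def fetal_fraction_improved_error(F1, F2, e):
    """Solve the improved error model for f, given a known error rate e.

    Assumes F1 = a1*(e/4) + (1-a1)*(e + f/2) and, by analogy (an assumption),
    F2 = a1*(e/4)**2 + (1-a1)*(e + f/2)**2.
    """
    p1 = e / 4.0                        # only one error in four raises the minor count
    if F1 == p1:
        raise ValueError("F1 equals e/4; no case 2 signal to estimate from")
    p2 = (F2 - p1 * F1) / (F1 - p1)     # case 2 probability, e + f/2
    return 2.0 * (p2 - e)


# Round-trip check with hypothetical parameter values.
e0, a0, f0 = 0.01, 0.7, 0.10
F1 = a0 * (e0 / 4) + (1 - a0) * (e0 + f0 / 2)
F2 = a0 * (e0 / 4) ** 2 + (1 - a0) * (e0 + f0 / 2) ** 2
print(fetal_fraction_improved_error(F1, F2, e0))
```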
使用胎儿分数对受影响的样品进行分类Classification of affected samples using fetal fraction
In certain embodiments, the fetal fraction estimate is used to further characterize an affected sample. In some cases, the fetal fraction estimate allows the affected sample to be classified as mosaic, complete aneuploid, or partial aneuploid. A computer-implemented method for obtaining this information is depicted with respect to the flow chart of Figure 18. This and related methods can be performed to provide, at the same time, an estimate of fetal fraction, a determination of CNV, and a classification of the CNV. In other words, the same tags can be used for any of these three functions.
为了使用该方法,采用两种评估胎儿分数的模式。一种模式产生NCNFF 值,而另一种模式产生CNFF值。如所解释,CNFF值是使用依赖于被确定拥有拷贝数变异的染色体或染色体区段的技术而获得。不需要依赖多态性来计算胎儿分数。用来计算胎儿分数的非多态技术的一个实例描述于实例17中,该实例假设存在全染色体的复制或缺失并且采用以下表达式:To use this method, two models for estimating fetal fraction are employed. One model produces NCNFF values, while the other produces CNFF values. As explained, CNFF values are obtained using a technique that relies on chromosomes or chromosome segments that have been determined to have copy number variations. It is not necessary to rely on polymorphisms to calculate fetal fraction. An example of a non-polymorphic technique for calculating fetal fraction is described in Example 17, which assumes the presence of a whole chromosome duplication or deletion and uses the following expression:
$$ff_{(i)} = 2 \times \left| NCV_{jA} \times CV_{jU} \right| \qquad \text{(Equation 28)}$$
where $j$ identifies the aneuploid chromosome, and CV is the coefficient of variation obtained from the qualified samples used to determine the mean and standard deviation in the expression for NCV.
The NCNFF value is obtained using a technique that relies on a chromosome or chromosome segment that does not possess a copy number variation. In other words, the NCN fetal fraction is determined by a technique that reliably determines fetal fraction on the assumption that the portion of the genome used for the calculation has normal ploidy. The CN fetal fraction is determined by a technique that assumes the sample under consideration has some form of aneuploidy. The CNV of the affected chromosome or chromosome segment is used to calculate the CN fetal fraction. Techniques for its calculation are presented below.
By comparing the estimate of the NCN fetal fraction with the estimate of the CN fetal fraction, a method can determine the type of aneuploidy that may be present in a sample. Basically, if the NCN fetal fraction and CN fetal fraction values match, the ploidy assumption built into the technique used to estimate the CN fetal fraction can be taken to be true. For example, if the method of calculating the CN fetal fraction assumes that the sample has a complete chromosomal aneuploidy, exhibiting a single additional copy of a chromosome or a single deletion of a chromosome, and the NCN fetal fraction value matches the CN fetal fraction value, the method can conclude that the sample exhibits a complete chromosomal aneuploidy. The basis for this assumption is described in more detail below.
The NCN fetal fraction can be determined by various techniques. In some embodiments, it is estimated using selected polymorphisms in a reference genome. Examples of these techniques are described above. In other embodiments, the NCN fetal fraction is determined using the relative amount of a chromosome that is known, or has been determined, not to be aneuploid. For example, the chromosome known not to be aneuploid may be chromosome X in a male fetus. Thus, in other embodiments, the NCN fetal fraction is determined using the relative amount of the X chromosome or Y chromosome (e.g., the chromosome dose of such a chromosome) in a sample comprising DNA from a pregnant woman carrying a son. The son's genome should not include a second copy of the X chromosome. Knowing this, the relative amount of X chromosome DNA can be used to provide an NCN value of the fetal fraction. In samples comprising DNA from a female fetus, the chromosome known not to be aneuploid can be one for which aneuploidy is known to be incompatible with life. Alternatively, for samples comprising DNA from either a male or a female fetus, sequence tags can be used to determine chromosome doses (and NCVs or NSVs) that confirm normal ploidy of the chromosomes to be used for determining the NCN fetal fraction.
Turning to flowchart 1800 of Figure 18, an NCN fetal fraction estimate 1802 is compared with a CN fetal fraction estimate 1804. If they match, as indicated at block 1806, the process concludes by determining that the assumption built into the technique used to estimate the CN fetal fraction is true. In various embodiments, that assumption is that a trisomy or monosomy is present in one of the fetus's chromosomes.
On the other hand, if the comparison indicates that the two fetal fraction values do not match (condition 1808) and that the estimate of the CN fetal fraction is in fact smaller than the NCN fetal fraction, a second stage of the method is performed, as indicated at block 1810.
In this second stage, the method determines whether the sample contains a partial aneuploidy or mosaicism. Further, if the sample contains a partial aneuploidy, the method determines where on the aneuploid chromosome the aneuploidy resides. In certain embodiments, this is accomplished by first binning the affected chromosome into multiple blocks. In one example, each block is about 1 million base pairs in length. Of course, other block lengths can be used, such as about 1 kilobase, about 10 kilobases, or about 100 kilobases. The blocks do not overlap and span most or all of the length of the chromosome. The blocks, or bins, are compared with one another, and the comparison provides insight into the condition. In one approach, the mapped tags are counted for each bin and optionally converted to a bin dose. If any of the bins is aneuploid, the counts or bin doses will indicate it. As part of the analysis of the individual bins, it may be appropriate to normalize the information from each bin to account for bin-to-bin variation, such as G-C content. The resulting normalized bin value can be referred to as an NBV; an NBV is an example of a chromosome segment value in which the tags of the segment are normalized to the tags mapped to normalizing segments of similar GC content (as in Example 19 below). In some embodiments, a fetal fraction is calculated for each bin and the individual fetal fraction values are compared. This bin-by-bin analysis is depicted in block 1812 of Figure 18. If any bin is identified as harboring an aneuploidy (by considering tag density, fetal fraction, or other information), the method determines that the sample contains a partial aneuploidy and additionally localizes the aneuploidy to the bins in which the tag counts deviate sufficiently from the expected value. See block 1814.
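The binning step described above can be sketched in a few lines. This is only a minimal illustration, assuming that tags are given as 0-based start positions on a single chromosome; the function name `bin_tag_counts` and its parameters are hypothetical and do not come from the disclosure, and no GC normalization is attempted.

```python
def bin_tag_counts(positions, chrom_length, bin_size=1_000_000):
    """Count mapped tags in non-overlapping bins spanning one chromosome.

    positions    -- 0-based start coordinates of mapped tags (assumed input)
    chrom_length -- chromosome length in base pairs
    bin_size     -- bin length; about 1 Mb in the example given in the text
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def bin_doses(counts):
    """Convert raw bin counts to bin doses (fraction of all tags per bin)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

Dividing by the chromosome-wide total is just one possible definition of a bin dose; a GC-matched normalizing segment, as mentioned for the NBV, would replace that denominator.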
If, however, the method does not identify any region of the chromosome exhibiting aneuploidy when analyzing the individual bins of the chromosome under consideration, the method determines that the sample contains mosaicism. See block 1816.
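The decision flow of Figure 18 can be summarized as a short sketch. It assumes that a "match" between the two fetal fraction estimates means agreement within a relative tolerance, and that each bin has already been reduced to a z-score-like value against qualified samples; the tolerance, the cutoff, and all names here are illustrative assumptions, not values taken from the disclosure.

```python
from enum import Enum


class Call(Enum):
    COMPLETE = "complete aneuploidy"
    PARTIAL = "partial aneuploidy"
    MOSAIC = "mosaicism"
    UNRESOLVED = "unresolved"


def classify_sample(ncn_ff, cn_ff, bin_z, rel_tol=0.15, z_cut=3.0):
    """Classify an affected sample from its two fetal fraction estimates.

    ncn_ff -- fetal fraction from a region assumed to have normal ploidy
    cn_ff  -- fetal fraction computed under a whole-chromosome CNV assumption
    bin_z  -- per-bin deviation scores along the affected chromosome
    rel_tol, z_cut -- hypothetical thresholds chosen for illustration
    """
    # Blocks 1802-1806: matching estimates support the ploidy assumption.
    if abs(cn_ff - ncn_ff) <= rel_tol * ncn_ff:
        return Call.COMPLETE, []
    # The text only describes the branch where the CN estimate is smaller.
    if cn_ff > ncn_ff:
        return Call.UNRESOLVED, []
    # Blocks 1810-1816: look for bins that deviate from expectation.
    flagged = [i for i, z in enumerate(bin_z) if abs(z) >= z_cut]
    if flagged:
        return Call.PARTIAL, flagged
    return Call.MOSAIC, []
```

The returned bin indices give the approximate location of a partial aneuploidy, mirroring block 1814.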
Calculating and comparing true fetal fractions using polymorphisms, e.g., SNPs, on the chromosome of interest in an affected sample and on a chromosome known not to be aneuploid (e.g., chromosome X), to determine the presence or absence of a complete or partial aneuploidy in a male fetus
As explained, the fetal fraction (FF) determined using informative polymorphic sequences, e.g., informative SNPs, can be used to distinguish a complete chromosomal aneuploidy from a partial aneuploidy.
The presence or absence of an aneuploidy, whether partial or complete, can be determined from the value of the fetal fraction obtained using polymorphic target sequences on the chromosome of interest, compared with the value of the fetal fraction obtained using polymorphic target sequences on a different chromosome in the same sample. In samples where the fetus is male, the FF for the chromosome of interest can be determined and compared with the FF determined for chromosome X in the same sample. For example, given a maternal sample from a mother carrying a male fetus with trisomy 21, polymorphic sequences, e.g., sequences each comprising at least one informative SNP, are selected so that they are present on chromosome 21 and on chromosome X; the polymorphic target sequences are amplified and sequenced, and the fetal fraction is determined as described elsewhere in this application.
Given that the fetal fraction is proportional to the amount of fetal chromosome in the sample, the fetal fraction determined using polymorphic sequences on a trisomic chromosome in a maternal sample will be 1 + 1/2 times the fetal fraction determined using polymorphic sequences on a chromosome known not to be aneuploid in a male fetus (e.g., chromosome X) in the same maternal sample. For example, in a normal sample, when one fetal fraction is determined using a panel of polymorphisms on chromosome 21 ($FF_{21}$) and another is determined using a panel of polymorphisms on chromosome X ($FF_X$), which is known to be unaffected in a male fetus, then $FF_{21} = FF_X$. However, if the fetus is trisomic for chromosome 21, the fetal fraction for the trisomic chromosome 21 will equal one and a half times the fetal fraction for chromosome X in the same sample ($FF_{21} = 1.5 \times FF_X$). Accordingly, if $FF_{21} < FF_X$, the analysis logic can conclude that a partial deletion of chromosome 21 and/or mosaicism is present. If $FF_{21} > FF_X$, the analysis logic can conclude that some portion of chromosome 21 is increased, e.g., a partial duplication or multiplication of chromosome 21, or a complete duplication of chromosome 21 that is not accounted for in the technique used to calculate the fetal fraction from chromosome 21. The two outcomes can be told apart because a partial duplication will yield $FF_{21} < 1.5 \times FF_X$. Alternatively, the presence of a partial duplication, a deletion, or mosaicism can be determined by, for example, increasing the number of polymorphic sequences on chromosome 21 so as to obtain multiple FF values along the length of the chromosome, such that the localized presence of a doubled or multiplied FF value indicates that a portion of the chromosome is increased. Alternatively, as would be the case for a mosaic sample, the FF determined from the polymorphic sequences remains constant along the entire length of the chromosome, indicating that the amount of the whole chromosome is increased overall, but by less than the 1.5-fold increase over $FF_X$ described above. Where an entire chromosome is lost, e.g., monosomy X, $FF_{\text{monosomy}} = \tfrac{1}{2} FF_X$. The fetal fraction values obtained from informative polymorphic sequences can be used in combination with sequence doses and their normalized dose values, e.g., NCV and NSV, to confirm the presence of a complete aneuploidy.
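The comparison of the two polymorphism-based fetal fractions can be expressed as a ratio test. The sketch below is a hedged illustration only: the tolerance `tol` used to decide that a ratio is "equal" to 1, 1.5, or 0.5 is an assumption introduced here, as are the function name and the returned labels.

```python
def interpret_ff_ratio(ff_chr, ff_x, tol=0.1):
    """Interpret FF on a chromosome of interest relative to FF on chromosome X.

    ff_chr -- fetal fraction from polymorphisms on the chromosome of interest
    ff_x   -- fetal fraction from polymorphisms on chromosome X (male fetus)
    tol    -- hypothetical tolerance on the ratio, chosen for illustration
    """
    ratio = ff_chr / ff_x
    if abs(ratio - 1.0) <= tol:
        return "no copy number change indicated"
    if abs(ratio - 1.5) <= tol:
        return "consistent with complete trisomy"
    if abs(ratio - 0.5) <= tol:
        return "consistent with complete monosomy"
    if 1.0 < ratio < 1.5:
        return "partial duplication or mosaicism"
    if ratio < 1.0:
        return "partial deletion and/or mosaicism"
    return "gain not accounted for by a single extra copy"
```

Separating a partial duplication from mosaicism would still require the additional step described in the text, namely computing FF values at several positions along the chromosome and checking whether the increase is local or uniform.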
Calculation of fetal fraction from the chromosome dose of an aneuploid sequence
The NCV for a chromosome of interest is calculated according to the following equation:

$$NCV_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j} \qquad \text{(Equation 19)}$$

where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated mean and standard deviation, respectively, of the $j$-th chromosome dose in the set of qualified samples, and $x_{ij}$ is the observed $j$-th chromosome dose for test sample $i$.
In general, the chromosome dose for a trisomy increases in proportion to the fetal fraction (ff). The chromosome dose in a sample containing a trisomic chromosome therefore increases relative to the dose in an unaffected sample as:

$$R_{jA} = \left(1 + \frac{ff}{2}\right) R_{jU} \qquad \text{(Equation 20)}$$
The chromosome dose for a monosomy decreases in proportion to the fetal fraction (ff). The chromosome dose in a sample containing a monosomic chromosome therefore decreases relative to the dose in an unaffected sample as:

$$R_{jA} = \left(1 - \frac{ff}{2}\right) R_{jU} \qquad \text{(Equation 21)}$$
In Equations 20 and 21, $R_{jA}$ is the chromosome dose ($x_{ij}$) for chromosome $j$ in an affected sample $i$ (e.g., the maternal sample to be tested); ff is the expected fetal fraction in an unaffected (qualified) sample U; and $R_{jU}$ is the chromosome dose in an unaffected sample. The factor "2" is included on the basis of the following assumptions: the sign in Equation 20 is a plus, i.e., one extra copy of the chromosome of interest is present; the sign in Equation 21 is a minus, i.e., one complete copy of the chromosome of interest is missing. If a different assumption is made (e.g., that a portion of the chromosome of interest is duplicated), the factor "2" has no practical meaning.
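Substituting the chromosome dose $R_{jA}$ of the trisomy case into Equation 19:

$$NCV_{jA} = \frac{R_{jA} - R_{jU}}{\sigma_{jU}} = \frac{\left(1 + \frac{ff}{2}\right) R_{jU} - R_{jU}}{\sigma_{jU}}$$

where $R_{jU}$ is an equivalent representation of $\hat{\mu}_j$, and $\sigma_{jU}$ is an equivalent representation of $\hat{\sigma}_j$. Solving for ff:

$$NCV_{jA} = \frac{ff}{2} \cdot \frac{R_{jU}}{\sigma_{jU}}$$

or, with the coefficient of variation $CV_{jU} = \sigma_{jU} / R_{jU}$,

$$NCV_{jA} = \frac{ff}{2} \cdot \frac{1}{CV_{jU}}$$

or

$$\frac{ff}{2} = NCV_{jA} \times CV_{jU}$$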
Thus, the fetal fraction $ff_{(i)}$ can be determined from any chromosome assumed to be trisomic as:

$$ff_{(i)} = 2 \times NCV_{jA} \times CV_{jU} \qquad \text{(Equation 26)}$$

The fetal fraction $ff_{(i)}$ can be determined from any chromosome assumed to be monosomic as:

$$ff_{(i)} = -2 \times NCV_{jA} \times CV_{jU} \qquad \text{(Equation 27)}$$

Equation 27 assumes that one complete copy of the chromosome is missing. The $NCV_{jA}$ for that chromosome is necessarily negative, so although Equation 27 contains a minus sign, the calculated fetal fraction is still positive.
Because the fetal fraction cannot be negative, $ff_{(i)}$ for any chromosome can be calculated by the following equation:

$$ff_{(i)} = 2 \times \left| NCV_{jA} \times CV_{jU} \right| \qquad \text{(Equation 28)}$$
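As a numerical illustration of Equations 19 and 28, the sketch below computes an NCV, a coefficient of variation, and the resulting fetal fraction from a small set of qualified chromosome doses. The dose values are invented for the example, and the function names are hypothetical; the sample standard deviation is used as the estimate of spread, which is an assumption.

```python
from statistics import mean, stdev


def ncv(dose, qualified_doses):
    """Normalized chromosome value: (x - mean) / standard deviation."""
    return (dose - mean(qualified_doses)) / stdev(qualified_doses)


def coefficient_of_variation(qualified_doses):
    """CV of the chromosome dose across qualified (unaffected) samples."""
    return stdev(qualified_doses) / mean(qualified_doses)


def fetal_fraction_from_dose(dose, qualified_doses):
    """ff = 2 * |NCV * CV|, assuming one whole copy is gained or lost."""
    return 2 * abs(ncv(dose, qualified_doses)
                   * coefficient_of_variation(qualified_doses))


# Made-up qualified doses centered on 1.0, and a test dose 5% above the mean.
qualified = [0.98, 1.00, 1.02]
ff_gain = fetal_fraction_from_dose(1.05, qualified)
ff_loss = fetal_fraction_from_dose(0.95, qualified)
```

Because the standard deviation cancels in the product NCV × CV, the result reduces to twice the relative shift of the dose from the qualified mean, so a 5% shift in either direction corresponds to a fetal fraction of about 10% under the whole-chromosome assumption.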
Using fetal fraction to resolve no-calls
The ability to determine significant differences in the representation of one or more sequences present in a mixture of two genomes is predicated on the sequence contribution of the first genome relative to that of the second genome. For example, non-invasive prenatal diagnosis using cfDNA in a maternal sample is challenging because only a small fraction of the DNA in the sample is derived from the fetus. For prenatal diagnostic assays, the background of maternal DNA places a practical limit on sensitivity, and the fraction of fetal DNA present in the maternal sample is therefore an important parameter. The sensitivity of fetal aneuploidy detection by counting DNA molecules depends on the fetal DNA fraction and on the number of molecules counted.
Typically, about 1% of maternal test samples analyzed for fetal aneuploidy by massively parallel sequencing are "no-call" samples, for which insufficient sequencing information, e.g., the number of fetal sequence tags, prevents a confident determination of the presence or absence of one or more fetal aneuploidies in the maternal sample. A "no-call" determination can result when the level of fetal cfDNA is too low, relative to the maternal contribution to the sample, for the sequencing information to distinguish an aneuploid sample from the sequencing information determined in qualified samples. To determine whether a "no-call" sample is or is not an aneuploid sample, the fetal fraction is determined empirically and/or derived, e.g., from the NCV value, and is used to confirm or deny the presence of a chromosomal aneuploidy. As described elsewhere herein, ff can be used to characterize the type of aneuploidy present in a test sample. For example, with thresholds that place the "no-call" zone between NCV values of 2.5 and 4, a test sample that has an NCV close to the threshold of 4 and a low fetal fraction (e.g., less than 3%) may be an affected sample. Conversely, a test sample that has an NCV close to the threshold of 2.5 and a high fetal fraction (e.g., greater than 40%) may be an unaffected sample. Resolution of "no-call" samples can rely on a single determination of fetal fraction. Preferably, the fetal fraction is determined by two or more different methods, or from NCVs determined for two or more different chromosomes of the sample using the same method. Similarly, the fetal fraction can be used to assess whether a sample with an NCV slightly greater than 4 or slightly less than 2.5 is likely to be a false positive or a false negative call, respectively.
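The no-call reasoning above can be sketched as a simple rule. The NCV thresholds of 2.5 and 4 and the fetal fraction examples of 3% and 40% come from the text; the `margin` that defines "close to" a threshold, the function name, and the returned labels are assumptions made for illustration.

```python
def resolve_no_call(ncv, ff, low=2.5, high=4.0,
                    low_ff=0.03, high_ff=0.40, margin=0.5):
    """Suggest a resolution for a sample whose NCV falls in the no-call zone.

    ncv    -- normalized chromosome value of the chromosome of interest
    ff     -- independently determined fetal fraction (as a proportion)
    margin -- hypothetical width of the band considered "close to" a threshold
    """
    if ncv >= high:
        return "affected"
    if ncv <= low:
        return "unaffected"
    # Inside the no-call zone: weigh the NCV against the fetal fraction.
    if ncv >= high - margin and ff < low_ff:
        return "likely affected"
    if ncv <= low + margin and ff > high_ff:
        return "likely unaffected"
    return "no call"
```

A sample with an NCV of 3.8 and a fetal fraction of 2% would come back as "likely affected", since little fetal DNA is available to push the NCV any higher.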
Apparatus and systems for determining CNV
Analysis of the sequencing data, and the diagnosis derived from it, are typically performed using various computer-executed algorithms and programs. Certain embodiments therefore employ processes that involve data stored in, or transferred through, one or more computer systems or other processing systems. Embodiments of the invention also relate to apparatus for performing these operations. The apparatus may be specially constructed for the required purposes, or it may be a general-purpose computer (or a group of computers) selectively activated or reconfigured by a computer program and/or data structure stored in the computer. In some embodiments, a group of processors performs some or all of the recited analytical operations collaboratively (e.g., via a network or cloud computing) and/or in parallel. A processor or group of processors for performing the methods described herein may be of various types, including microcontrollers and microprocessors, such as programmable devices (e.g., CPLDs and FPGAs) and non-programmable devices such as gate array ASICs or general-purpose microprocessors.
In addition, certain embodiments relate to tangible and/or non-transitory computer-readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations. Examples of computer-readable media include, but are not limited to, semiconductor memory devices; magnetic media such as disk drives and magnetic tape; optical media such as CDs; magneto-optical media; and hardware devices specially configured to store and execute program instructions, such as read-only memory devices (ROM) and random access memory (RAM). The computer-readable media may be controlled directly by an end user, or they may be controlled indirectly by the end user. Examples of directly controlled media include media located at a user facility and/or media that are not shared with other entities. Examples of indirectly controlled media include media that are indirectly accessible to the user via an external network and/or via a service providing shared resources, such as the "cloud". Examples of program instructions include both machine code, such as that produced by a compiler, and files containing higher-level code that can be executed by the computer using an interpreter.
In various embodiments, the data or information employed in the disclosed methods and apparatus is provided in an electronic format. Such data or information may include reads and tags derived from a nucleic acid sample, counts or densities of such tags that align with particular regions of a reference sequence (e.g., that align to a chromosome or chromosome segment), reference sequences (including reference sequences providing solely or primarily polymorphisms), chromosome and segment doses, calls (such as aneuploidy calls), normalized chromosome and segment values, pairs of chromosomes or segments and their corresponding normalizing chromosomes or segments, counseling recommendations, diagnoses, and the like. As used herein, data or other information provided in electronic format is available for storage on a machine and transmission between machines. Conventionally, data in electronic format is provided digitally and may be stored as bits and/or bytes in various data structures, lists, databases, and the like. The data may be embodied electronically, optically, or otherwise.
In one embodiment, the invention provides a computer program product for generating an output indicating the presence or absence of an aneuploidy, e.g., a fetal aneuploidy, or of cancer in a test sample. The computer product may contain instructions for performing any one or more of the above-described methods for determining a chromosomal anomaly. As explained, the computer product may include a non-transitory and/or tangible computer-readable medium having computer-executable or compilable logic (e.g., instructions) recorded thereon for enabling a processor to determine chromosome doses and, in some cases, whether a fetal aneuploidy is present or absent. In one example, the computer product comprises a computer-readable medium having computer-executable or compilable logic (e.g., instructions) recorded thereon for enabling a processor to diagnose a fetal aneuploidy, comprising: a receiving procedure for receiving sequencing data from at least a portion of the nucleic acid molecules of a maternal biological sample, wherein the sequencing data comprises calculated chromosome and/or segment doses; computer-assisted logic for analyzing a fetal aneuploidy from the received data; and an output procedure for generating an output indicating the presence, absence, or kind of the fetal aneuploidy.
The sequence information from the sample under consideration can be mapped to chromosome reference sequences to identify a number of sequence tags for each of any one or more chromosomes of interest, and to identify a number of sequence tags for a normalizing segment sequence for each of the one or more chromosomes of interest. In various embodiments, the reference sequences are stored in a database, such as a relational or object database.
It should be understood that it is impractical, or even impossible in most cases, for an unaided human being to perform the computational operations of the methods disclosed herein. For example, mapping a single 30 bp read from a sample to any one of the human chromosomes might require years of effort without the assistance of a computing apparatus. The problem is, of course, compounded because reliable aneuploidy calls generally require mapping thousands (e.g., at least about 10,000) or even millions of reads to one or more chromosomes.
The methods disclosed herein can be performed using a computer-readable medium having stored thereon computer-readable instructions for carrying out a method for identifying any CNV, e.g., a chromosomal or partial aneuploidy. Thus, in one embodiment, the invention provides a computer-readable medium having stored thereon computer-readable instructions for carrying out a method for identifying complete and partial chromosomal aneuploidies, e.g., fetal aneuploidies. The instructions may include, for example, instructions for: (a) obtaining sequence information for fetal and maternal nucleic acids in a sample and/or storing that information, at least temporarily, in a computer-readable medium; (b) using the stored sequence information to computationally identify, from the mixture of fetal and maternal nucleic acids, a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and a number of sequence tags for at least one normalizing chromosome sequence for each of the one or more chromosomes of interest; and (c) computationally calculating a single chromosome dose for each chromosome of interest, using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence. The instructions may be executed using one or more appropriately designed or configured processors. The instructions may additionally include comparing each chromosome dose to an associated threshold value, thereby determining the presence or absence of any four or more different partial or complete fetal chromosomal aneuploidies in the sample. As explained above, there are many variations on this process. All such variations can be implemented using processing and storage features as described here.
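Steps (b) and (c) and the threshold comparison can be illustrated with a small sketch. It assumes that tag counts per chromosome are already available and that each chromosome of interest is paired with a single normalizing chromosome; the pairings, counts, and threshold bounds below are invented placeholders, and the function names are hypothetical.

```python
def chromosome_doses(tag_counts, normalizing):
    """Dose = tags on the chromosome of interest / tags on its normalizer.

    tag_counts  -- mapping of chromosome name to number of mapped tags
    normalizing -- mapping of chromosome of interest to normalizing chromosome
    """
    return {chrom: tag_counts[chrom] / tag_counts[norm]
            for chrom, norm in normalizing.items()}


def compare_to_thresholds(doses, thresholds):
    """Flag a chromosome when its dose falls outside (lower, upper) bounds."""
    calls = {}
    for chrom, dose in doses.items():
        lower, upper = thresholds[chrom]
        calls[chrom] = not (lower <= dose <= upper)
    return calls


# Placeholder inputs for illustration only.
tag_counts = {"chr21": 1050, "chr9": 10000, "chr18": 2000, "chr8": 10000}
normalizing = {"chr21": "chr9", "chr18": "chr8"}
thresholds = {"chr21": (0.098, 0.102), "chr18": (0.195, 0.205)}

doses = chromosome_doses(tag_counts, normalizing)
calls = compare_to_thresholds(doses, thresholds)
```

In practice the thresholds would be derived from qualified samples, for example as NCV cutoffs, and a normalizing sequence may be a group of chromosomes or segments rather than the single chromosome used here.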
In some embodiments, the instructions may further include automatically recording information pertinent to the method, such as chromosome doses and the presence or absence of a fetal chromosomal aneuploidy, in a patient medical record for the human subject who provided the maternal test sample. The patient medical record may be maintained by, for example, a laboratory, a physician's office, a hospital, a health maintenance organization, an insurance company, or a personal medical record website. Further, based on the results of the processor-implemented analysis, the method may also involve prescribing, initiating, and/or altering treatment of the human subject from whom the maternal test sample was taken. This may involve performing one or more additional tests or analyses on additional samples taken from the subject.
The disclosed methods can also be performed using a computer processing system that is adapted or configured to perform a method for identifying any CNV, e.g., a chromosomal or partial aneuploidy. Thus, in one embodiment, the invention provides a computer processing system that is adapted or configured to perform a method as described herein. In one embodiment, the apparatus comprises a sequencing device adapted or configured to sequence at least a portion of the nucleic acid molecules in a sample to obtain the type of sequence information described elsewhere herein. The apparatus may also include components for processing the sample. Such components are described elsewhere herein.
Sequence or other data can be input into a computer or stored on a computer-readable medium, either directly or indirectly. In one embodiment, a computer system is directly coupled to a sequencing device that reads and/or analyzes sequences of nucleic acids from samples. Sequences or other information from such tools are provided to the computer system via an interface. Alternatively, the sequences processed by the system are provided from a sequence storage source such as a database or other repository. Once available to the processing apparatus, a memory device or mass storage device buffers or stores, at least temporarily, the sequences of the nucleic acids. In addition, the memory device may store tag counts for various chromosomes or genomes, etc. The memory may also store various routines and/or programs for analyzing the sequence or mapped data. Such programs/routines may include programs for performing statistical analyses, etc.
In one example, a user provides a sample to a sequencing apparatus. Data is collected and/or analyzed by the sequencing apparatus, which is connected to a computer. Software on the computer allows for data collection and/or analysis. Data can be stored, displayed (via a monitor or other similar device), and/or sent to another location. The computer may be connected to the internet, which is used to transmit data to a handheld device used by a remote user (e.g., a physician, scientist, or analyst). It is understood that the data can be stored and/or analyzed prior to transmittal. In some embodiments, raw data is collected and sent to a remote user or apparatus that will analyze and/or store the data. Transmittal can occur via the internet, but can also occur via satellite or other connection. Alternatively, data can be stored on a computer-readable medium and the medium can be shipped to an end user (e.g., via mail). The remote user can be in the same or a different geographical location including, but not limited to, a building, city, state, country, or continent.
In some embodiments, the methods also include collecting data regarding a plurality of polynucleotide sequences (e.g., reads, tags, and/or reference chromosome sequences) and sending the data to a computer or other computational system. For example, the computer can be connected to laboratory equipment, e.g., a sample collection apparatus, a nucleotide amplification apparatus, a nucleotide sequencing apparatus, or a hybridization apparatus. The computer can then collect applicable data gathered by the laboratory device. The data can be stored on a computer at any step, e.g., while collected in real time, prior to sending, during or in conjunction with sending, or following sending. The data can be stored on a computer-readable medium that can be removed from the computer. The data collected or stored can be transmitted from the computer to a remote location, e.g., via a local network or a wide area network such as the internet. At the remote location, various operations can be performed on the transmitted data as described below.
Among the types of electronically formatted data that may be stored, transmitted, analyzed, and/or manipulated in the systems, apparatus, and methods disclosed herein are the following:
Reads obtained by sequencing nucleic acids in a test sample
Tags obtained by aligning reads to a reference genome or other reference sequence or sequences
The reference genome or sequence
Sequence tag density: counts or numbers of tags for each of two or more regions (typically chromosomes or chromosome segments) of a reference genome or other reference sequences
Identities of normalizing chromosomes or chromosome segments for particular chromosomes or chromosome segments of interest
Doses for chromosomes or chromosome segments (or other regions) obtained from chromosomes or segments of interest and corresponding normalizing chromosomes or segments
Thresholds for calling chromosome doses as either affected, non-affected, or no call
The actual calls of chromosome doses
Diagnoses (clinical conditions associated with the calls)
Recommendations for further tests derived from the calls and/or diagnoses
Treatment and/or monitoring plans derived from the calls and/or diagnoses
These various types of data may be obtained, stored, transmitted, analyzed, and/or manipulated at one or more locations using distinct apparatus. The processing options span a wide spectrum. At one end of the spectrum, all or much of this information is stored and used at the location where the test sample is processed, e.g., a doctor's office or other clinical setting. At the other extreme, the sample is obtained at one location, it is processed and optionally sequenced at a different location, reads are aligned and calls are made at one or more different locations, and diagnoses, recommendations, and/or plans are prepared at still another location (which may be the location where the sample was obtained).
In various embodiments, the reads are generated with the sequencing apparatus and then transmitted to a remote site where they are processed to produce aneuploidy calls. At this remote location, as an example, the reads are aligned to a reference sequence to produce tags, which are counted and assigned to chromosomes or segments of interest. Also at the remote location, the counts are converted to doses using associated normalizing chromosomes or segments. Still further, at the remote location, the doses are used to generate aneuploidy calls.
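The remote-site steps just described (reads aligned to give tags, tags counted per chromosome, counts converted to a dose, dose turned into a call) can be sketched as follows. This is a minimal illustration only: the function names, the toy alignment records, and the single one-sided threshold are assumptions made for the example and are not taken from the specification.

```python
from collections import Counter

def tags_from_alignments(alignments):
    """Keep one tag per read that aligned; unaligned reads (chrom is None) are dropped."""
    return [chrom for _read_id, chrom in alignments if chrom is not None]

def chromosome_dose(tag_counts, chrom_of_interest, normalizing_chroms):
    """Tags on the chromosome of interest divided by tags on the normalizing chromosome(s)."""
    denominator = sum(tag_counts[c] for c in normalizing_chroms)
    if denominator == 0:
        raise ValueError("no tags on the normalizing chromosome(s)")
    return tag_counts[chrom_of_interest] / denominator

def aneuploidy_call(dose, threshold):
    """Hypothetical one-sided rule: flag the sample when the dose exceeds the threshold."""
    return "aneuploid" if dose > threshold else "euploid"

# Toy input: (read id, chromosome the read aligned to, or None if unaligned).
alignments = [("r1", "chr21"), ("r2", "chr9"), ("r3", None), ("r4", "chr9"),
              ("r5", "chr21"), ("r6", "chr21"), ("r7", "chr9"), ("r8", "chr9")]
tag_counts = Counter(tags_from_alignments(alignments))
dose = chromosome_dose(tag_counts, "chr21", ["chr9"])
```

In a real deployment the alignment records would come from an aligner and the threshold from a set of qualified samples; both are stubbed here.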
Among the processing operations that may be employed at distinct locations are the following:
Sample collection
Sample processing preliminary to sequencing
Sequencing
Analyzing sequence data and deriving aneuploidy calls
Diagnosis
Reporting a diagnosis and/or a call to a patient or health care provider
Developing a plan for further treatment, testing, and/or monitoring
Executing the plan
Counseling
Any one or more of these operations may be automated as described elsewhere herein. Typically, the sequencing, the analysis of sequence data, and the derivation of aneuploidy calls will be performed computationally. The other operations may be performed manually or automatically.
Examples of locations where sample collection may be performed include health practitioners' offices, clinics, patients' homes (where a sample collection tool or kit is provided), and mobile health care vehicles. Examples of locations where sample processing prior to sequencing may be performed include health practitioners' offices, clinics, patients' homes (where a sample processing apparatus or kit is provided), mobile health care vehicles, and facilities of aneuploidy analysis providers. Examples of locations where sequencing may be performed include health practitioners' offices, clinics, patients' homes (where a sample sequencing apparatus and/or kit is provided), mobile health care vehicles, and facilities of aneuploidy analysis providers. The location where the sequencing takes place may be provided with a dedicated network connection for transmitting sequence data (typically reads) in an electronic format. Such a connection may be wired or wireless and may be configured to send the data to a site where it can be processed and/or aggregated prior to transmission to a processing site. Data aggregators can be maintained by health organizations such as health maintenance organizations (HMOs).
The analyzing and/or deriving operations may be performed at any of the foregoing locations or, alternatively, at a further remote site dedicated to computation and/or the service of analyzing nucleic acid sequence data. Such locations include, for example, clusters such as general purpose server farms, the facilities of an aneuploidy analysis service business, and the like. In some embodiments, the computational apparatus employed to perform the analysis is leased or rented. The computational resources may be part of an internet-accessible collection of processors, such as processing resources colloquially known as the cloud. In some cases, the computations are performed by a parallel or massively parallel group of processors that are affiliated or unaffiliated with one another. The processing may be accomplished using distributed processing such as cluster computing, grid computing, and the like. In such embodiments, a cluster or grid of computational resources collectively forms a super virtual computer composed of multiple processors or computers acting together to perform the analysis and/or derivation described herein. These technologies, as well as more conventional supercomputers, may be employed to process sequence data as described herein. Each is a form of parallel computing that relies on processors or computers. In the case of grid computing, these processors (often whole computers) are connected by a network (private, public, or the internet) using a conventional network protocol such as Ethernet. By contrast, a supercomputer has many processors connected by a local high-speed computer bus.
In certain embodiments, the diagnosis (e.g., the fetus has Down syndrome or the patient has a particular type of cancer) is generated at the same location as the analyzing operation. In other embodiments, it is performed at a different location. In some examples, reporting the diagnosis is performed at the location where the sample was taken, although this need not be the case. Examples of locations where the diagnosis can be generated or reported and/or where a plan can be developed include health practitioners' offices, clinics, internet sites accessible by computers, and handheld devices such as cell phones, tablets, and smart phones having a wired or wireless connection to a network. Examples of locations where counseling is performed include health practitioners' offices, clinics, internet sites accessible by computers, handheld devices, etc.
In some embodiments, the sample collection, sample processing, and sequencing operations are performed at a first location, and the analyzing and deriving operations are performed at a second location. However, in some cases, the sample is collected at one location (e.g., a health practitioner's office or clinic) and the sample processing and sequencing are performed at a different location, which is optionally the same location where the analyzing and deriving take place.
In various embodiments, a sequence of the above-listed operations may be triggered by a user or entity initiating sample collection, sample processing, and/or sequencing. After one or more of these operations have begun execution, the other operations may naturally follow. For example, the sequencing operation may cause reads to be automatically collected and sent to a processing apparatus, which then conducts, often automatically and possibly without further user intervention, the sequence analysis and the derivation of aneuploidy calls. In some implementations, the result of this processing operation is then automatically delivered, possibly with reformatting as a diagnosis, to a system component or entity that processes and reports the information to a health professional and/or patient. As explained, such information, possibly together with counseling information, can also be automatically processed to produce a treatment, testing, and/or monitoring plan. Thus, initiating an early-stage operation can trigger an end-to-end sequence in which the health professional, patient, or other concerned party is provided with a diagnosis, a plan, counseling, and/or other information useful for acting on a physical condition. This is accomplished even though parts of the overall system are physically separated and possibly remote from the location of, e.g., the sample and sequencing apparatus.
Figure 19 shows one implementation of a dispersed system for producing a call or diagnosis from a test sample. A sample collection location 01 is used for obtaining a test sample from a patient such as a pregnant female or a putative cancer patient. The sample is then provided to a processing and sequencing location 03, where the test sample may be processed and sequenced as described above. Location 03 includes apparatus for processing the sample as well as apparatus for sequencing the processed sample. The result of the sequencing, as described elsewhere herein, is a collection of reads, which are typically provided in an electronic format and passed to a network such as the internet, indicated by reference number 05 in Figure 19.
The sequence data is provided to a remote location 07 where analysis and call generation are performed. This location may include one or more powerful computational devices such as computers or processors. After the computational resources at location 07 have completed their analysis and generated a call from the sequence information received, the call is relayed back to the network 05. In some implementations, not only is a call generated at location 07 but an associated diagnosis is also generated. The call and/or diagnosis are then transmitted across the network and back to the sample collection location 01, as illustrated in Figure 19. As explained, this is simply one of many variations on how the various operations associated with generating a call or diagnosis may be divided among locations. One common variant involves providing sample collection, processing, and sequencing at a single location. Another variation involves providing processing and sequencing at the same location as analysis and call generation.
Figure 20 elaborates on the options for performing various operations at distinct locations. In the most granular sense depicted in Figure 20, each of the following operations is performed at a separate location: sample collection, sample processing, sequencing, read alignment, calling, diagnosis, and reporting and/or plan development.
In one embodiment that aggregates some of these operations, sample processing and sequencing are performed at one location, and read alignment, calling, and diagnosis are performed at a separate location. See the portion of Figure 20 identified by reference letter A. In another implementation, identified by letter B in Figure 20, sample collection, sample processing, and sequencing are all performed at the same location. In this implementation, read alignment and calling are performed at a second location. Finally, diagnosis and reporting and/or plan development are performed at a third location. In the implementation depicted by letter C in Figure 20, sample collection is performed at a first location; sample processing, sequencing, read alignment, calling, and diagnosis are all performed together at a second location; and reporting and/or plan development are performed at a third location. Finally, in the implementation labeled D in Figure 20, sample collection is performed at a first location; sample processing, sequencing, read alignment, and calling are all performed at a second location; and diagnosis and reporting and/or plan development are performed at a third location.
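The groupings described for Figure 20 can be written down as a small lookup table, which makes it easy to see which operations share a site under each option. The sketch below is illustrative only: the operation labels, the site numbers, and the use of None for operations whose site is not specified under option A are assumptions of this example.

```python
OPERATIONS = ["sample collection", "sample processing", "sequencing",
              "read alignment", "calling", "diagnosis", "reporting/planning"]

# One site number per operation, in the order of OPERATIONS.
# None marks an operation whose site the text does not specify for that option.
CONFIGURATIONS = {
    "fully distributed": [1, 2, 3, 4, 5, 6, 7],
    "A": [None, 1, 1, 2, 2, 2, None],
    "B": [1, 1, 1, 2, 2, 3, 3],
    "C": [1, 2, 2, 2, 2, 2, 3],
    "D": [1, 2, 2, 2, 2, 3, 3],
}

def operations_by_site(name):
    """Group the operations of one configuration by the site that performs them."""
    grouped = {}
    for operation, site in zip(OPERATIONS, CONFIGURATIONS[name]):
        grouped.setdefault(site, []).append(operation)
    return grouped
```

For example, operations_by_site("C") places sample processing through diagnosis at site 2, with collection and reporting at sites 1 and 3.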
In one embodiment, the invention provides a system for determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids, the system comprising: a sequencer for receiving a nucleic acid sample and providing fetal and maternal nucleic acid sequence information from the sample; a processor; and a machine-readable storage medium comprising instructions for execution on the processor, the instructions comprising:
(a) code for obtaining sequence information for the fetal and maternal nucleic acids in the sample;
(b) code for using the sequence information to computationally identify, from the fetal and maternal nucleic acids, a number of sequence tags for each of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for at least one normalizing chromosome sequence or normalizing chromosome segment sequence for each of the one or more chromosomes of interest;
(c) code for using the number of sequence tags identified for each of the one or more chromosomes of interest and the number of sequence tags identified for each normalizing chromosome sequence or normalizing chromosome segment sequence to calculate a single chromosome dose for each of the one or more chromosomes of interest; and
(d) code for comparing each single chromosome dose for each of the one or more chromosomes of interest to a corresponding threshold value for each of the one or more chromosomes of interest, thereby determining the presence or absence of any one or more different complete fetal chromosomal aneuploidies in the sample.
In some embodiments, the code for calculating a single chromosome dose for each of the one or more chromosomes of interest comprises code for calculating the chromosome dose for a selected chromosome of interest as the ratio of the number of sequence tags identified for the selected chromosome of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome of interest.
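Steps (c) and (d) for several chromosomes of interest at once can be sketched as below, with each chromosome given its own normalizing chromosome(s) and its own threshold pair. The tag counts, the choice of normalizers, and the threshold values are made-up illustrations, and the two-threshold rule with a "no call" zone is an assumption of this sketch (it mirrors the affected / unaffected / no call categories listed among the data types earlier, not a rule stated in this embodiment).

```python
def chromosome_doses(tag_counts, normalizers):
    """One dose per chromosome of interest: its tag count divided by the summed
    tag counts of its own normalizing chromosome(s)."""
    doses = {}
    for chrom, norm_chroms in normalizers.items():
        denominator = sum(tag_counts[c] for c in norm_chroms)
        if denominator == 0:
            raise ValueError(f"no tags on the normalizing chromosome(s) for {chrom}")
        doses[chrom] = tag_counts[chrom] / denominator
    return doses

def classify(dose, lower, upper):
    """Hypothetical two-threshold rule with a no-call zone between the bounds."""
    if dose >= upper:
        return "affected"
    if dose <= lower:
        return "unaffected"
    return "no call"

# Illustrative numbers only.
tag_counts = {"chr13": 360, "chr18": 300, "chr21": 160, "chr2": 1000, "chr9": 500}
normalizers = {"chr13": ["chr2", "chr9"], "chr18": ["chr2"], "chr21": ["chr9"]}
thresholds = {"chr13": (0.25, 0.27), "chr18": (0.29, 0.31), "chr21": (0.29, 0.31)}

doses = chromosome_doses(tag_counts, normalizers)
calls = {chrom: classify(doses[chrom], *thresholds[chrom]) for chrom in doses}
```

Keeping the normalizers and thresholds in per-chromosome tables reflects the claim language, in which every chromosome of interest has its own normalizing sequence and its own corresponding threshold.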
In some embodiments, the system further comprises code for repeating the calculation of a chromosome dose for each of any remaining chromosome segments of the one or more segments of the one or more chromosomes of interest.
In some embodiments, the one or more chromosomes of interest selected from chromosomes 1-22, X, and Y comprise at least twenty chromosomes selected from chromosomes 1-22, X, and Y, and the instructions comprise instructions for determining the presence or absence of at least twenty different complete fetal chromosomal aneuploidies.
In some embodiments, the at least one normalizing chromosome sequence is a group of chromosomes selected from chromosomes 1-22, X, and Y. In other embodiments, the at least one normalizing chromosome sequence is a single chromosome selected from chromosomes 1-22, X, and Y.
In another embodiment, the invention provides a system for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in a maternal test sample comprising fetal and maternal nucleic acids, the system comprising: a sequencer for receiving a nucleic acid sample and providing fetal and maternal nucleic acid sequence information from the sample; a processor; and a machine-readable storage medium comprising instructions for execution on the processor, the instructions comprising:
(a) code for obtaining sequence information for the fetal and maternal nucleic acids in the sample;
(b) code for using the sequence information to computationally identify, from the fetal and maternal nucleic acids, a number of sequence tags for each of any one or more segments of any one or more chromosomes of interest selected from chromosomes 1-22, X, and Y, and to identify a number of sequence tags for at least one normalizing segment sequence for each of the one or more segments of the one or more chromosomes of interest;
(c) code for using the number of sequence tags identified for each of the one or more segments of the one or more chromosomes of interest and the number of sequence tags identified for the normalizing segment sequence to calculate a single chromosome segment dose for each of the one or more segments of the one or more chromosomes of interest; and
(d) code for comparing each single chromosome segment dose for each of the one or more segments of the one or more chromosomes of interest to a corresponding threshold value for each of those chromosome segments, thereby determining the presence or absence of one or more different partial fetal chromosomal aneuploidies in the sample.
In some embodiments, the code for calculating a single chromosome segment dose comprises code for calculating the chromosome segment dose for a selected chromosome segment as the ratio of the number of sequence tags identified for the selected chromosome segment to the number of sequence tags identified for the corresponding normalizing segment sequence for the selected chromosome segment.
In some embodiments, the system further comprises code for repeating the calculation of a chromosome segment dose for each of any remaining chromosome segments of the one or more segments of the one or more chromosomes of interest.
In some embodiments, the system further comprises (i) code for repeating (a)-(d) for test samples from different maternal subjects, and (ii) code for determining the presence or absence of any one or more different partial fetal chromosomal aneuploidies in each of the samples.
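Repeating the segment-dose calculation and comparison over samples from different maternal subjects amounts to a loop over per-sample tag counts. The sketch below assumes hypothetical segment labels ("seg", "norm"), made-up counts, and a two-sided acceptance band so that both gains and losses of a segment are flagged; none of these specifics come from the specification.

```python
def segment_dose(tag_counts, segment, normalizing_segment):
    """Tags on the segment of interest divided by tags on its normalizing segment."""
    denominator = tag_counts[normalizing_segment]
    if denominator == 0:
        raise ValueError("no tags on the normalizing segment")
    return tag_counts[segment] / denominator

def screen_samples(samples, segment, normalizing_segment, lower, upper):
    """Repeat the dose-and-compare step for each sample.

    Returns {sample_id: (dose, flagged)}; a sample is flagged when its dose falls
    outside the hypothetical band [lower, upper].
    """
    results = {}
    for sample_id, tag_counts in samples.items():
        dose = segment_dose(tag_counts, segment, normalizing_segment)
        results[sample_id] = (dose, dose < lower or dose > upper)
    return results

# Illustrative counts for three maternal samples.
samples = {
    "S1": {"seg": 200, "norm": 1000},
    "S2": {"seg": 260, "norm": 1000},
    "S3": {"seg": 140, "norm": 1000},
}
results = screen_samples(samples, "seg", "norm", lower=0.18, upper=0.22)
```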
In other embodiments of any of the systems provided herein, the code further comprises code for automatically recording the presence or absence of a fetal chromosomal aneuploidy, as determined in (d), in a patient medical record for the human subject providing the maternal test sample, wherein the recording is performed using the processor.
In some embodiments of any of the systems provided herein, the sequencer is configured to perform next generation sequencing (NGS). In some embodiments, the sequencer is configured to perform massively parallel sequencing using sequencing-by-synthesis with reversible dye terminators. In other embodiments, the sequencer is configured to perform sequencing-by-ligation. In yet other embodiments, the sequencer is configured to perform single molecule sequencing.
Apparatus for determining fetal fraction
Analysis of sequence tags derived from a sequenced sample (e.g., a maternal sample) can be performed using an apparatus for medical analysis of a sample that provides information about the fraction contributed to a nucleic acid mixture by one of two genomes. For example, apparatus are provided for analyzing the sequence tags obtained from sequencing a maternal sample to determine the fraction of fetal nucleic acid in the mixture of fetal and maternal nucleic acids present in the maternal sample. The medical apparatus provided comprises a series of means for carrying out the steps of the method for determining fetal fraction as described elsewhere in this application.
Figure 65 shows one embodiment of a medical analysis apparatus for determining the fetal fraction in a maternal test sample comprising a mixture of fetal and maternal nucleic acids. The apparatus comprises:
means (a) for receiving a plurality of sequence reads of the fetal and maternal nucleic acids from the maternal test sample;
means (b) for aligning the plurality of sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads;
means (c) for identifying a number of those sequence tags that are from one or more chromosomes of interest or chromosome segments of interest selected from chromosomes 1-22, X, and Y and segments thereof, and for identifying, for each of the one or more chromosomes of interest or chromosome segments of interest, a number of those sequence tags that are from at least one normalizing chromosome sequence or normalizing chromosome segment sequence, to determine a chromosome dose or chromosome segment dose, wherein the chromosome of interest or chromosome segment of interest harbors a copy number variation; and
means (d) for determining the fetal fraction using the dose of the chromosome of interest or the dose of the chromosome segment of interest.
Preferably, the signal output of means (a) is connected to means (b), the signal output of means (b) is connected to means (c), and the signal output of means (c) is connected to means (d).
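The chained arrangement, in which each means feeds its output to the next, can be pictured as a sequence of function calls. The following is a toy sketch under stated assumptions: a dictionary lookup stands in for a real aligner, the read strings and chromosome labels are invented, and the mapping from dose to fetal fraction in means (d) is left as a caller-supplied function rather than any formula from the specification.

```python
from collections import Counter

def receive_reads(raw_reads):
    """Means (a): accept the sequence reads for one maternal test sample."""
    return list(raw_reads)

def align_reads(reads, toy_index):
    """Means (b): turn reads into tags. A dict lookup stands in for a real aligner."""
    return [toy_index[read] for read in reads if read in toy_index]

def compute_dose(tags, chrom_of_interest, normalizing_chrom):
    """Means (c): count tags and form the dose as a ratio of counts."""
    counts = Counter(tags)
    if counts[normalizing_chrom] == 0:
        raise ValueError("no tags on the normalizing chromosome")
    return counts[chrom_of_interest] / counts[normalizing_chrom]

def estimate_fetal_fraction(dose, dose_to_ff):
    """Means (d): map the dose to a fetal fraction using a caller-supplied function."""
    return dose_to_ff(dose)

def run_apparatus(raw_reads, toy_index, chrom_of_interest, normalizing_chrom, dose_to_ff):
    """Wire the output of each means to the input of the next, (a) -> (b) -> (c) -> (d)."""
    reads = receive_reads(raw_reads)
    tags = align_reads(reads, toy_index)
    dose = compute_dose(tags, chrom_of_interest, normalizing_chrom)
    return estimate_fetal_fraction(dose, dose_to_ff)

# Toy data: three known reads and one read ("NNNN") that does not align.
toy_index = {"ACGT": "chr21", "TTGA": "chr9", "GGCC": "chr9"}
raw_reads = ["ACGT", "TTGA", "GGCC", "NNNN", "ACGT", "TTGA"]
# The identity function is a placeholder so the example returns the dose itself.
result = run_apparatus(raw_reads, toy_index, "chr21", "chr9", dose_to_ff=lambda d: d)
```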
In certain embodiments, the copy number variation is determined by comparing the chromosome dose for each of the one or more chromosomes of interest or chromosome segments of interest to a corresponding threshold value for each of the one or more chromosomes of interest or chromosome segments of interest.
Copy number variations that a fetus may harbor include complete chromosomal duplications, complete chromosomal deletions, partial duplications, partial multiplications, partial insertions, and partial deletions.
In certain embodiments, the chromosome or segment dose determined by means (c) is calculated as the ratio of the number of sequence tags identified for the selected chromosome or segment of interest to the number of sequence tags identified for the corresponding at least one normalizing chromosome sequence or normalizing chromosome segment sequence for the selected chromosome or segment of interest. In certain embodiments, the chromosome dose or segment dose determined by means (c) is calculated as the ratio of the sequence tag density ratio of the selected chromosome or segment of interest to the sequence tag density ratio of the at least one corresponding normalizing chromosome sequence or normalizing chromosome segment sequence for each selected chromosome or segment of interest.
In certain embodiments, the apparatus further comprises a device (e) for calculating a normalized chromosome value (NCV) or a normalized segment value (NSV). The NCV relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, the i-th chromosome being the chromosome of interest. The NSV relates the chromosome segment dose to the mean of the corresponding chromosome segment dose in a set of qualified samples as:
$$NSV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome segment dose in the set of qualified samples, and $R_{iA}$ is the chromosome segment dose calculated for the i-th chromosome segment in the test sample, the i-th chromosome segment being the chromosome segment of interest. Preferably, the signal output of device (c) is connected to device (e).
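As a minimal sketch of the calculation performed by device (e), the normalized value can be computed as a standard score of the test dose against the doses in the qualified samples. The function name and the doses are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption of this example, since the text does not state which estimator is used.

```python
from statistics import mean, stdev


def normalized_value(test_dose, qualified_doses):
    """Return (test dose - mean of qualified doses) / standard deviation of qualified doses."""
    mu = mean(qualified_doses)
    # Assumption: sample standard deviation; the source does not specify the estimator.
    sigma = stdev(qualified_doses)
    if sigma == 0:
        raise ValueError("qualified doses show no variation")
    return (test_dose - mu) / sigma


# Hypothetical doses for the same chromosome in five qualified samples.
qualified = [0.10, 0.11, 0.09, 0.10, 0.10]
example_ncv = normalized_value(0.12, qualified)
```

The same function yields an NSV when the doses passed in are chromosome segment doses.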
In certain embodiments, device (d) of the apparatus then determines the fetal fraction according to the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample (e.g., the maternal sample to be tested), and $CV_{iU}$ is the coefficient of variation of the dose of the chromosome of interest determined in the qualified samples; or determines the fetal fraction according to the following expression:
$$ff = 2 \times \left| NSV_{iA} \, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NSV_{iA}$ is the normalized chromosome segment value for the i-th chromosome segment in an affected sample (e.g., the maternal sample to be tested), and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, the i-th chromosome being the chromosome of interest. Preferably, the signal output of device (e) is connected to device (d).
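The expression used by device (d) can be illustrated with the short sketch below. The function names and the numeric inputs are hypothetical; the coefficient of variation is assumed here to be the standard deviation of the qualified doses divided by their mean, which is the usual definition but is not spelled out in the text.

```python
from statistics import mean, stdev


def coefficient_of_variation(qualified_doses):
    """Assumed definition: sample standard deviation divided by the mean."""
    return stdev(qualified_doses) / mean(qualified_doses)


def fetal_fraction(normalized_value, cv_qualified):
    """Evaluate ff = 2 * |NCV * CV| (or 2 * |NSV * CV| for a segment)."""
    return 2 * abs(normalized_value * cv_qualified)


# Hypothetical inputs: an NCV of 10 and a coefficient of variation of 0.5 %.
example_ff = fetal_fraction(10.0, 0.005)
```

Because of the absolute value, a negative normalized value (a dose below the qualified mean) gives the same fetal fraction as a positive one of equal magnitude.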
In certain embodiments, the chromosome of interest is an autosome or the X chromosome of a male fetus, and the chromosome segment of interest is selected from an autosome or the X chromosome of a male fetus.
In certain embodiments, the at least one normalizing chromosome sequence or normalizing chromosome segment sequence is a chromosome or segment selected for an associated chromosome or segment of interest by: (i) identifying a plurality of qualified samples for the chromosome or segment of interest; (ii) repeatedly calculating chromosome doses or chromosome segment doses for the selected chromosome or chromosome segment using a plurality of potential normalizing chromosome sequences or normalizing chromosome segment sequences; and (iii) selecting the normalizing chromosome sequence or normalizing chromosome segment sequence, alone or in combination, that gives the smallest variability or the greatest differentiability in the calculated chromosome doses or chromosome segment doses. In certain embodiments, the normalizing chromosome sequence is a single chromosome selected from chromosomes 1 to 22, X, and Y; alternatively, the normalizing chromosome sequence is a group of chromosomes selected from chromosomes 1 to 22, X, and Y. In certain embodiments, the normalizing segment sequence is a single segment of any one or more of chromosomes 1 to 22, X, and Y; alternatively, the normalizing segment sequence is a group of segments of any one or more of chromosomes 1 to 22, X, and Y.
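Steps (i) to (iii) can be sketched, for the smallest-variability criterion only, as an exhaustive search over candidate normalizing sequences. The function name, the group-size limit, the use of the coefficient of variation as the variability measure, and the tag counts are all assumptions of this example; the greatest-differentiability criterion would additionally require affected samples and is not shown.

```python
from itertools import combinations
from statistics import mean, stdev


def select_normalizing_sequences(qualified_counts, sequence_of_interest,
                                 candidates, max_group_size=2):
    """Return the candidate group giving the lowest dose CV across qualified samples."""
    best_group, best_cv = None, float("inf")
    for size in range(1, max_group_size + 1):
        for group in combinations(candidates, size):
            # Step (ii): recompute the dose in every qualified sample for this group.
            doses = [sample[sequence_of_interest] / sum(sample[c] for c in group)
                     for sample in qualified_counts]
            cv = stdev(doses) / mean(doses)
            # Step (iii): keep the group with the smallest variability so far.
            if cv < best_cv:
                best_group, best_cv = group, cv
    return best_group, best_cv


# Hypothetical tag counts for three qualified samples (illustrative values only).
qualified_samples = [
    {"chr21": 100, "chr9": 400, "chr11": 500},
    {"chr21": 110, "chr9": 440, "chr11": 480},
    {"chr21": 90, "chr9": 360, "chr11": 520},
]
best_group, best_cv = select_normalizing_sequences(
    qualified_samples, "chr21", ["chr9", "chr11"])
```

In this constructed data set the chr21 counts track the chr9 counts exactly, so chr9 alone is selected.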
In certain embodiments, the apparatus for determining the fetal fraction further comprises a device for comparing the fetal fraction determined using chromosome doses or chromosome segment doses with a fetal fraction determined using information from one or more polymorphisms that exhibit allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being present on a chromosome other than the chromosome of interest.
In certain embodiments, the apparatus further comprises a sequencing device (10) configured to sequence the fetal and maternal nucleic acids in a maternal test sample and to obtain sequence reads. Preferably, the signal output of sequencing device (10) is connected to device (a).
In certain embodiments, sequencing device (10) is configured to perform sequencing by synthesis. The sequencing by synthesis can be performed using reversible dye terminators. In other embodiments, sequencing device (10) is configured to perform sequencing by ligation. In still other embodiments, sequencing device (10) is configured to perform single-molecule sequencing.
In certain embodiments, sequencing device (10) and devices (a)-(d) are located in separate locations, and the signal output of sequencing device (10) is connected to device (a) via a network.
In certain embodiments, the apparatus comprising the sequencing device as described further comprises a device (11) for obtaining the maternal test sample from a pregnant mother. Device (11) for obtaining the maternal test sample can be located in a location separate from devices (a)-(d) and (10). In addition to devices (a)-(d) and (10), the apparatus can further comprise a device (12) for extracting cell-free DNA from the maternal test sample. In certain embodiments, device (12) for extracting cell-free DNA is located in the same location as sequencing device (10), and device (11) for obtaining the maternal test sample is located in a remote location.
In certain embodiments, the apparatus for determining the fetal fraction further comprises a storage device for at least temporarily storing the sequence reads received by device (a). Preferably, the signal output of device (a) is connected to the storage device, and the signal output of the storage device is connected to device (b).
Additional apparatus for determining fetal fraction: classifying copy number variation
Also provided is an additional medical analysis apparatus for classifying a copy number variation in the fetal genome in a maternal sample comprising fetal and maternal nucleic acids (e.g., cell-free DNA). The additional apparatus comprises devices for determining the fetal fraction and a device for comparing fetal fraction values determined by different methods. The additional apparatus uses the two calculated fetal fractions to classify the copy number variation in the fetal genome. The maternal sample that can be analyzed by the apparatus can be selected from blood, plasma, serum, and urine samples. In certain embodiments, the maternal sample is a plasma sample. Figure 66 shows one embodiment of such a medical analysis apparatus.
In one embodiment, a medical analysis apparatus for classifying a copy number variation in a fetal genome is provided, the apparatus comprising:
a device (1) for receiving sequence reads of the fetal and maternal nucleic acids in a test sample;
a device (2) for aligning the sequence reads to one or more chromosome reference sequences, thereby providing a plurality of sequence tags corresponding to the sequence reads;
a device (3) for identifying the number of those sequence tags that are from one or more chromosomes of interest and determining that a first chromosome of interest in the fetus carries a copy number variation;
a device (4) for calculating a first fetal fraction value by a first method that does not use information from the tags of the first chromosome of interest;
a device (5) for calculating a second fetal fraction value by a second method that uses information from the tags of the first chromosome of interest; and
a device (6) for comparing the first fetal fraction value with the second fetal fraction value and using the comparison to classify the copy number variation of the first chromosome of interest.
Preferably, the signal output of device (1) is connected to device (2), the signal output of device (2) is connected to device (3), the signal outputs of devices (2) and (3) are connected to devices (4) and (5), and the signal outputs of devices (4) and (5) are connected to device (6). The first chromosome of interest can be selected from any one of chromosomes 1 to 22, X, and Y.
In certain embodiments, the additional apparatus further comprises a storage device for at least temporarily storing the sequence reads received by device (1). Preferably, the signal output of device (1) is connected to the storage device, and the signal output of the storage device is connected to device (2).
In certain embodiments, device (4) for the first method of calculating the first fetal fraction comprises a component for calculating the first fetal fraction value using information from one or more polymorphisms that exhibit allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being present on a chromosome other than the first chromosome of interest; and device (5) for the second method of calculating the second fetal fraction value comprises:
(a) a component (5-1) for counting the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and
(b) a component (5-2) for calculating the fetal fraction value from the chromosome dose using the second method. In certain embodiments, the signal outputs of devices (2) and (3) are connected to component (5-1), the signal output of component (5-1) is connected to component (5-2), and the signal output of component (5-2) is connected to device (6).
In certain embodiments, the information used by device (4) of the first method comprises sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises the one or more polymorphic sites. The information used by device (4) of the first method can also be obtained by a method other than sequencing, for example qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, device (4) for the first method comprises a component for calculating the first fetal fraction value using tags from a chromosome or chromosome segment that does not have a copy number variation. For example, when the first chromosome of interest is chromosome 21, the fetal fraction determined using sequence tags from chromosome 21 can be compared with the fetal fraction determined from sequence tags from chromosome X in a male fetus. Any chromosome or chromosome segment that is known not to occur in an aneuploid state, or that has been determined in the test sample not to be aneuploid by any of the methods described herein (e.g., by calculating its NCV or NSV), can be used by device (4) to determine the fetal fraction.
In certain embodiments, device (5) for the second method of calculating the fetal fraction value further comprises a component (5-3) for calculating a normalized chromosome value (NCV), wherein component (5-3) relates the chromosome dose to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the i-th chromosome dose in the set of qualified samples, and $R_{iA}$ is the chromosome dose calculated for the i-th chromosome in the test sample, the i-th chromosome being the chromosome of interest.
Preferably, the signal output of component (5-1) is connected to component (5-3), and the signal output of component (5-3) is connected to component (5-2).
In certain embodiments, component (5-2) for calculating the fetal fraction value from the chromosome dose by the second method uses the normalized chromosome value. Component (5-2) of device (5) for the second method evaluates the fetal fraction according to the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the second fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample (e.g., the maternal sample to be tested), and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome determined in the qualified samples, the i-th chromosome being the chromosome of interest.
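For illustration only, the expression can be read as follows. Writing $\bar{R}_{iU}$ and $\sigma_{iU}$ for the mean and standard deviation of the dose in the qualified samples, the sketch assumes that the coefficient of variation has its usual definition, $CV_{iU} = \sigma_{iU} / \bar{R}_{iU}$, and that a complete fetal trisomy raises the expected dose of the affected chromosome by a factor of $1 + ff/2$ (a monosomy lowers it by $1 - ff/2$). Both assumptions belong to this example and are not statements taken from the text above.

```latex
\begin{align*}
NCV_{iA}\,CV_{iU}
  &= \frac{R_{iA}-\bar{R}_{iU}}{\sigma_{iU}} \cdot \frac{\sigma_{iU}}{\bar{R}_{iU}}
   = \frac{R_{iA}}{\bar{R}_{iU}} - 1 \\
R_{iA} &\approx \bar{R}_{iU}\left(1 \pm \tfrac{ff}{2}\right)
  \quad\Longrightarrow\quad
  ff \approx 2\left|NCV_{iA}\,CV_{iU}\right|
\end{align*}
```

Under these assumptions the product of the two quantities is simply the relative excess or deficit of the test dose, and the absolute value covers both the gain and the loss of a copy.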
In certain embodiments, device (4) for the first method of calculating the first fetal fraction comprises: (a) a component (4-1) for counting the number of sequence tags from a chromosome other than the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose for the chromosome other than the first chromosome of interest; and (b) a component (4-2) for calculating the first fetal fraction value from that chromosome dose by the first method; and device (5) for the second method of calculating the second fetal fraction comprises: (a) a component (5-1) for counting the number of sequence tags from the first chromosome of interest and from at least one normalizing chromosome sequence to determine a chromosome dose; and (b) a component (5-2) for calculating the second fetal fraction value from that chromosome dose by the second method.
Preferably, device (4) of the first method further comprises a component (4-3) and device (5) of the second method further comprises a component (5-3), each of which calculates a normalized chromosome value (NCV). Components (4-3) and (5-3) relate the chromosome doses determined by components (4-1) and (5-1), respectively, to the mean of the corresponding chromosome dose in a set of qualified samples as:
$$NCV_{iA} = \frac{R_{iA} - \bar{R}_{iU}}{\sigma_{iU}}$$
where $\bar{R}_{iU}$ and $\sigma_{iU}$ are, respectively, the estimated mean and standard deviation of the dose of the i-th chromosome in the set of qualified samples, and $R_{iA}$ is the dose calculated for the i-th chromosome in the test sample,
wherein, for device (4) of the first method, the i-th chromosome is the chromosome other than the first chromosome of interest, and, for device (5) of the second method, the i-th chromosome is the first chromosome of interest.
Preferably, the signal output of component (4-1) is connected to component (4-3), and the signal output of component (4-3) is connected to component (4-2), wherein component (4-2) calculates the first fetal fraction value from the corresponding chromosome dose by the first method, using the corresponding normalized chromosome value; the signal output of component (5-1) is connected to component (5-3), and the signal output of component (5-3) is connected to component (5-2), wherein component (5-2) calculates the second fetal fraction value from the corresponding chromosome dose by the second method, using the corresponding normalized chromosome value.
In certain embodiments, component (4-2) of device (4) of the first method and component (5-2) of device (5) of the second method evaluate the following expression:
$$ff = 2 \times \left| NCV_{iA} \, CV_{iU} \right|$$
where $ff$ is the fetal fraction value, $NCV_{iA}$ is the normalized chromosome value for the i-th chromosome in an affected sample (e.g., the maternal sample to be tested), and $CV_{iU}$ is the coefficient of variation of the dose of the i-th chromosome in the qualified samples;
wherein, for device (4) of the first method, the i-th chromosome is the chromosome other than the first chromosome of interest, and, for device (5) of the second method, the i-th chromosome is the first chromosome of interest. Preferably, when the fetus is male, the chromosome other than the first chromosome of interest is the X chromosome.
In certain embodiments, device (6) for comparing the first fetal fraction value with the second fetal fraction value determines whether the two fetal fraction values are approximately equal. In certain embodiments, device (6) further comprises a component for determining that a ploidy assumption implicit in the second method is true when the two fetal fraction values are approximately equal. The ploidy assumption implicit in the second method can be that the first chromosome of interest has a complete chromosomal aneuploidy, for example a monosomy or a trisomy.
In certain embodiments, the additional apparatus further comprises a device (7) for analyzing the tag information of the first chromosome of interest to determine whether (i) the first chromosome of interest carries a partial aneuploidy or (ii) the fetus is a mosaic, wherein device (7) is configured to execute when device (6) for comparing the first fetal fraction value with the second fetal fraction value indicates that the two fetal fraction values are not approximately equal. Preferably, the signal outputs of devices (2), (3), and (6) are connected to device (7).
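The decision made by device (6), and the hand-off to device (7), can be sketched as below. The function name, the returned labels, and in particular the relative tolerance used to decide that two fetal fraction values are "approximately equal" are assumptions of this example; the text does not specify a tolerance.

```python
import math


def compare_fetal_fractions(first_ff, second_ff, rel_tol=0.2):
    """Return which branch applies after comparing the two fetal fraction values."""
    # Assumption: "approximately equal" means within rel_tol of the larger value.
    if math.isclose(first_ff, second_ff, rel_tol=rel_tol):
        # The ploidy assumption implicit in the second method is taken to hold.
        return "complete chromosomal aneuploidy"
    # Otherwise the tag information of the chromosome is analyzed further.
    return "analyze for partial aneuploidy or mosaicism"


agree = compare_fetal_fractions(0.10, 0.11)
disagree = compare_fetal_fractions(0.10, 0.04)
```

In practice the tolerance would presumably be chosen from the measurement error of the two fetal fraction methods.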
In certain embodiments of the additional apparatus, device (4) of the first method comprises a component for calculating the first fetal fraction value using information from one or more polymorphisms that exhibit allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being present on a chromosome other than the first chromosome of interest; and device (5) of the second method comprises a component for calculating the second fetal fraction value using information from one or more polymorphisms that exhibit allelic imbalance in the fetal and maternal nucleic acids of the maternal test sample, the polymorphisms being present on the first chromosome of interest. The information used by device (4) of the first method can comprise sequence tags obtained by sequencing predetermined polymorphic sequences, each of which comprises the one or more polymorphic sites. The information used by device (4) of the first method can also be obtained by a method other than sequencing, for example qPCR, digital PCR, mass spectrometry, or capillary gel electrophoresis.
In certain embodiments, device (6) for comparing comprises: a component for determining that the first chromosome of interest is diploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1; a component for determining that the first chromosome of interest is triploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 1.5; and a component for determining that the first chromosome of interest is haploid when the ratio of the second fetal fraction value to the first fetal fraction value is approximately 0.5.
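A minimal sketch of the three ratio-based components follows. The function name and the width of the window around each expected ratio are hypothetical; the text says only "approximately", so the tolerance is an assumption of this example.

```python
def classify_by_ratio(first_ff, second_ff, tolerance=0.15):
    """Map the ratio second_ff / first_ff to a ploidy label, or None if no label fits."""
    if first_ff <= 0:
        raise ValueError("first fetal fraction value must be positive")
    ratio = second_ff / first_ff
    # Expected ratios taken from the text; the tolerance is assumed.
    for expected, label in ((1.0, "diploid"), (1.5, "triploid"), (0.5, "haploid")):
        if abs(ratio - expected) <= tolerance:
            return label
    # No expected ratio matched; further analysis of the tag information is indicated.
    return None


labels = [classify_by_ratio(0.10, second) for second in (0.10, 0.15, 0.05, 0.125)]
```

A return value of None corresponds to the case in which the ratio is not approximately 1, 1.5, or 0.5.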
More preferably, the additional apparatus for classifying the copy number variation further comprises a device (7') for analyzing the tag information of the first chromosome of interest to determine whether (i) the first chromosome of interest carries a partial aneuploidy or (ii) the fetus is a mosaic, wherein device (7') is configured to execute when device (6) for comparing the first fetal fraction value with the second fetal fraction value indicates that the ratio of the second fetal fraction value to the first fetal fraction value is not approximately 1, 1.5, or 0.5. Preferably, the signal outputs of devices (2), (3), and (6) are connected to device (7').
In certain embodiments, device (7) or (7') for analyzing the tag information of the first chromosome of interest comprises: (a) a component (7-1) for binning the sequence of the first chromosome of interest into a plurality of portions; (b) a component (7-2) for determining whether any of the portions contains significantly more or significantly less nucleic acid than one or more other portions; and (c) a component (7-3) for determining that the first chromosome of interest carries a partial aneuploidy if any of the portions contains significantly more or significantly less nucleic acid than one or more other portions, or that the fetus is a mosaic if none of the portions contains significantly more or significantly less nucleic acid than one or more other portions. Preferably, the signal outputs of devices (2), (3), and (6) are connected to component (7-1), the signal output of component (7-1) is connected to component (7-2), and the signal output of component (7-2) is connected to component (7-3). In certain embodiments, component (7-3) further determines that the portion of the first chromosome of interest containing significantly more or significantly less nucleic acid than one or more other portions carries the partial aneuploidy.
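The logic of components (7-2) and (7-3) can be sketched as follows, assuming the tags have already been counted per portion. The function name, the use of the median as the reference level, and the relative threshold standing in for "significantly more or significantly less" are assumptions of this example; the text does not prescribe a statistical test.

```python
from statistics import median


def partial_aneuploidy_or_mosaic(bin_counts, rel_threshold=0.25):
    """Return a label and the indices of bins that deviate from the typical count."""
    typical = median(bin_counts)
    if typical == 0:
        raise ValueError("median bin count is zero")
    # Assumption: a bin deviates "significantly" if it differs from the
    # median by more than rel_threshold times the median.
    deviating = [i for i, count in enumerate(bin_counts)
                 if abs(count - typical) > rel_threshold * typical]
    if deviating:
        return "partial aneuploidy", deviating
    return "mosaic", []


# Hypothetical per-bin tag counts (illustrative values only).
uneven = partial_aneuploidy_or_mosaic([100, 102, 98, 150, 101])
even = partial_aneuploidy_or_mosaic([100, 102, 98, 101])
```

The returned indices identify the portion or portions to which the partial aneuploidy would be attributed.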
In certain embodiments, the first chromosome of interest is selected from the group consisting of chromosomes 1-22, X, and Y.
In certain embodiments, device (6) comprises a component for classifying the copy number variation into a category selected from the group consisting of complete chromosome insertion or duplication, complete chromosome deletion, partial chromosome duplication, partial chromosome deletion, and mosaicism.
In certain embodiments, the additional medical analysis apparatus further comprises:
(i) a device (8) for determining whether the copy number variation is caused by a partial aneuploidy or by mosaicism; and
(ii) a device (9) for determining, if the copy number variation is caused by a partial aneuploidy, the locus of the partial aneuploidy on the first chromosome of interest,
wherein devices (8) and (9) are configured to execute when device (6) for comparing the first fetal fraction value with the second fetal fraction value determines that the first fetal fraction value is not approximately equal to the second fetal fraction value. Preferably, the signal output of device (6) is connected to device (8), and the signal output of device (8) is connected to device (9). In certain embodiments, device (9) for determining the locus of the partial aneuploidy on the first chromosome of interest comprises a component for dividing the sequence tags of the first chromosome of interest into bins or blocks of nucleic acid in the first chromosome of interest, and a component for counting the mapped tags in each bin.
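The two components of device (9) can be sketched as follows: one assigns each mapped tag to a fixed-width bin by its position and counts the tags per bin, and a helper converts a run of bin indices back to chromosome coordinates. The function names, the fixed bin width, the zero-based tag positions, and the half-open interval convention are assumptions of this example.

```python
from collections import Counter


def count_tags_per_bin(tag_positions, bin_size):
    """Count mapped tags per bin; positions are assumed to be zero-based."""
    return Counter(position // bin_size for position in tag_positions)


def bins_to_interval(bin_indices, bin_size):
    """Return the half-open coordinate interval [start, end) spanned by the bins."""
    start = min(bin_indices) * bin_size
    end = (max(bin_indices) + 1) * bin_size
    return start, end


# Hypothetical mapped positions of six tags, binned at a width of 10 bases.
per_bin = count_tags_per_bin([5, 15, 17, 25, 26, 27], bin_size=10)
locus = bins_to_interval([1, 2], bin_size=10)
```

A realistic bin width would be far larger than in this toy example and would presumably be chosen so that each bin holds enough tags for a stable count.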
In certain embodiments, the additional apparatus further comprises a sequencing device (10) configured to sequence the fetal and maternal nucleic acids in a maternal test sample (e.g., a blood, plasma, serum, or urine sample) and to obtain the sequence reads. Preferably, the fetal and maternal nucleic acids are cell-free DNA (cfDNA). Preferably, the signal output of sequencing device (10) is connected to device (1).
In certain embodiments, sequencing device (10) is configured to perform sequencing by synthesis. The sequencing by synthesis can be performed using reversible dye terminators. Alternatively, sequencing device (10) is configured to perform sequencing by ligation. Alternatively, sequencing device (10) is configured to perform single-molecule sequencing. In certain embodiments, sequencing device (10) and devices (1)-(6) of the additional apparatus for classification are located in separate locations. Preferably, the signal output of sequencing device (10) is connected to device (1) via a network.
In certain embodiments, the additional apparatus for classification further comprises a device (11) for obtaining the maternal test sample from a pregnant mother. Device (11) and devices (1)-(6) can be located in separate locations. The additional apparatus can further comprise a device (12) for extracting cell-free DNA from the maternal test sample. Device (12) for extracting cell-free DNA can be located in the same location as sequencing device (10), with device (11) for obtaining the maternal test sample located in a remote location.
In certain embodiments, device (2) aligns at least about 1 million reads.
Kits
In various embodiments, kits are provided for practicing the methods described herein. In certain embodiments, the kits comprise one or more positive internal controls for complete aneuploidy and/or partial aneuploidy. Typically, although not necessarily, the controls comprise internal positive controls (IPCs) comprising nucleic acid sequences of the type to be screened for. For example, a control for a test to determine the presence or absence of a fetal trisomy (e.g., trisomy 21) in a maternal sample can comprise DNA characterized by trisomy 21 (e.g., DNA obtained from an individual with trisomy 21). In some embodiments, the control comprises a mixture of DNA obtained from two or more individuals with different aneuploidies. For example, for a test to determine the presence or absence of trisomy 13, trisomy 18, trisomy 21, and monosomy X, the control can comprise a combination of DNA samples obtained from pregnant women each carrying a fetus with one of the trisomies being tested. In addition to complete chromosomal aneuploidies, IPCs can be created to provide positive controls for tests to determine the presence or absence of partial aneuploidies.
In certain embodiments, the positive control(s) comprise one or more nucleic acids comprising a trisomy 21 (T21) and/or a trisomy 18 (T18) and/or a trisomy 13 (T13). In certain embodiments, the nucleic acids comprising each trisomy present (e.g., T21) are provided in separate containers. In certain embodiments, nucleic acids comprising two or more trisomies are provided in a single container. Thus, for example, in certain embodiments, a container may contain T21 and T18, T21 and T13, or T18 and T13. In certain embodiments, a container may contain T18, T21, and T13. In these various embodiments, the trisomies may be provided in equal quantities/concentrations. In other embodiments, the trisomies may be provided in particular predetermined ratios. In various embodiments, the controls may be provided as "stock" solutions of known concentration.
In certain embodiments, the control for detecting an aneuploidy comprises a mixture of cellular genomic DNA obtained from two subjects, one of whom is the contributor of the aneuploid genome. For example, as explained above, an internal positive control (IPC) created as a control for a test that determines a fetal trisomy (e.g., trisomy 21) may comprise a combination of genomic DNA from a male or female subject carrying the trisomic chromosome and genomic DNA from a female subject known not to carry the trisomic chromosome. In certain embodiments, the genomic DNA is sheared to provide fragments of between about 100-400 bp, between about 150-350 bp, or between about 200-300 bp to simulate the circulating cfDNA fragments in maternal samples.
In certain embodiments, the proportion of fragmented DNA from the subject carrying the aneuploidy (e.g., trisomy 21) in the control is chosen to simulate the proportion of circulating fetal cfDNA found in maternal samples, so as to provide an IPC comprising a mixture of fragmented DNA in which about 5%, about 10%, about 15%, about 20%, about 25%, or about 30% of the DNA is from the subject carrying the aneuploidy. In certain embodiments, the control comprises DNA from different subjects each carrying a different aneuploidy. For example, the IPC may comprise about 80% unaffected female DNA, and the remaining 20% may be DNA from three different subjects carrying a trisomic chromosome 21, a trisomic chromosome 13, and a trisomic chromosome 18, respectively.
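The proportions described above reduce to simple arithmetic. The following is a minimal sketch, not part of the disclosed method: the function name `ipc_masses` is hypothetical, and the equal split of the affected fraction among the aneuploid donors is an assumption, since the text does not state how the 20% is divided.

```python
def ipc_masses(total_ng, affected_fraction, n_affected_donors=1):
    """Split a total DNA mass (ng) between unaffected and aneuploid donor DNA.

    Assumption (not from the source): the affected fraction is divided
    equally among the aneuploid donors.
    """
    if not 0.0 < affected_fraction < 1.0:
        raise ValueError("affected_fraction must be between 0 and 1")
    if n_affected_donors < 1:
        raise ValueError("n_affected_donors must be at least 1")
    affected_total = total_ng * affected_fraction
    return {
        "unaffected_ng": total_ng - affected_total,
        "per_affected_donor_ng": affected_total / n_affected_donors,
    }


# Example: 100 ng of IPC, 20% from three donors (T21, T13, T18), 80% unaffected.
mix = ipc_masses(100.0, 0.20, n_affected_donors=3)
print(mix)
```

Under these assumptions, each of the three aneuploid donors contributes about 6.7 ng to a 100 ng control, and the unaffected donor contributes about 80 ng.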
In certain embodiments, the control(s) comprise cfDNA obtained from a mother known to carry a fetus with a known chromosomal aneuploidy. For example, the controls may comprise cfDNA obtained from a pregnant woman carrying a fetus with trisomy 21 and/or trisomy 18 and/or trisomy 13. The cfDNA can be extracted from the maternal sample, cloned into a bacterial vector, and grown in bacteria to provide an ongoing source of the IPC. Alternatively, the cloned cfDNA can be amplified by, e.g., PCR.
Although the controls present in the kits are described above with respect to trisomies, they need not be so limited. It will be appreciated that the positive controls present in the kits can be created to reflect other partial aneuploidies, including, for example, various segment amplifications and/or deletions. Thus, for example, where various cancers are known to be associated with particular amplifications or deletions of substantially complete chromosome arms, the positive control(s) may comprise a p arm or a q arm of any one or more of chromosomes 1-22, X, and Y. In certain embodiments, the control comprises an amplification of one or more arms selected from the group consisting of 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q, and/or 22q (see, e.g., Table 2).
In certain embodiments, the controls comprise aneuploidies for any region known to be associated with particular amplifications or deletions (e.g., breast cancer associated with an amplification at 20q13). Illustrative regions include, but are not limited to, 17q23 (associated with breast cancer), 19q12 (associated with ovarian cancer), 1q21-1q23 (associated with sarcomas and various solid tumors), 8p11-p12 (associated with breast cancer), the ErbB2 amplicon, and so forth. In certain embodiments, the controls comprise an amplification or a deletion of a chromosomal region as shown in any one of Tables 3-6. In certain embodiments, the controls comprise an amplification or a deletion of a chromosomal region comprising a gene as shown in any one of Tables 3-6. In certain embodiments, the controls comprise nucleic acid sequences comprising an amplification of a nucleic acid comprising one or more oncogenes. In certain embodiments, the controls comprise nucleic acid sequences comprising an amplification of a nucleic acid comprising one or more genes selected from the group consisting of MYC, ERBB2 (EFGR), CCND1 (Cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2, and CDK4.
The foregoing controls are intended to be illustrative and not limiting. Using the teachings provided herein, one of ordinary skill in the art will be able to identify numerous other controls suitable for incorporation into a kit.
In various embodiments, in addition to the controls or as an alternative to the controls, the kits comprise one or more nucleic acids and/or nucleic acid mimics that provide marker sequences suitable for tracking and determining sample integrity. In certain embodiments, the markers comprise antigenomic sequences. In certain embodiments, the marker sequences range in length from about 30 bp up to about 600 bp, or from about 100 bp to about 400 bp. In certain embodiments, the marker sequence(s) are at least 30 bp (or nt) in length. In certain embodiments, the marker is ligated to an adaptor, and the length of the adaptor-ligated marker molecule is between about 200 bp (or nt) and about 600 bp (or nt), between about 250 bp (or nt) and 550 bp (or nt), between about 300 bp (or nt) and 500 bp (or nt), or between about 350 and 450. In certain embodiments, the length of the adaptor-ligated marker molecule is about 200 bp (or nt). In certain embodiments, the length of a marker molecule can be about 150 bp (or nt), about 160 bp (or nt), about 170 bp (or nt), about 180 bp (or nt), about 190 bp (or nt), or about 200 bp (or nt). In certain embodiments, the length of the marker ranges up to about 600 bp (or nt).
In certain embodiments, the kit provides at least two, or at least three, or at least four, or at least five, or at least six, or at least seven, or at least eight, or at least nine, or at least ten, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 18, or at least 19, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 50 different sequences. The different nucleic acids and/or nucleic acid mimics that provide the marker sequence(s) may be stored in separate containers/bottles. Alternatively, different marker molecules may be stored in the same container/bottle.
In various embodiments, the markers comprise one or more DNAs, or the markers comprise one or more DNA mimetics. Suitable mimetics include, but are not limited to, morpholino derivatives, peptide nucleic acids (PNA), and phosphorothioate DNA. In various embodiments, the markers are incorporated into the controls. In certain embodiments, the markers are incorporated into adaptors and/or are provided ligated to adaptors.
In certain embodiments, the kit further comprises one or more sequencing adaptors. Such adaptors include, but are not limited to, indexed sequencing adaptors. In certain embodiments, the adaptors comprise a single-stranded arm that includes an index sequence and one or more PCR priming sites.
In certain embodiments, the kit further comprises a sample collection device for collecting a biological sample. In certain embodiments, the sample collection device comprises a device for collecting blood and, optionally, a container for holding the blood. In certain embodiments, the kit comprises a container for holding blood, and the container contains an anticoagulant and/or a cell fixative and/or one or more antigenomic marker sequences.
In certain embodiments, the kit further comprises DNA extraction reagents (e.g., a separation matrix and/or an elution solution). The kit may also comprise reagents for sequencing library preparation. Such reagents include, but are not limited to, a solution for end-repairing DNA and/or a solution for dA-tailing DNA and/or a solution for adaptor-ligating DNA.
In certain embodiments, the kit further comprises a composition comprising one or more primer sets for amplifying at least one preselected polymorphic nucleic acid in the maternal sample, wherein each preselected polymorphic nucleic acid comprises at least one polymorphic site, and wherein the forward or reverse primer in each primer set hybridizes to a DNA sequence sufficiently close to the polymorphic site for the site to be included in the sequence reads generated by massively parallel sequencing of the amplified preselected polymorphic nucleic acid. Sequencing of the amplified preselected polymorphic sequences can be used to determine the fetal fraction in the maternal sample, as described elsewhere in this application. The preselected polymorphic nucleic acids may comprise a single nucleotide polymorphism (SNP) or a short tandem repeat (STR). In certain embodiments, at least one primer in each primer set is designed to identify a polymorphic site that is present within a sequence read of about 25 bp, about 40 bp, about 50 bp, or about 100 bp. In certain embodiments, the primer sets hybridize to the DNA sequences to generate amplicons of at least about 100 bp, at least about 150 bp, or at least about 200 bp. The primer sets may hybridize to DNA sequences present on the same chromosome, or to DNA sequences present on different chromosomes. In certain embodiments, the primer sets do not hybridize to DNA sequences present on chromosomes 13, 18, 21, X, or Y.
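The requirement that a primer sit close enough to the polymorphic site for the site to fall inside a read can be checked with a small coordinate test. The sketch below is illustrative only: the function name `site_covered_by_read`, the 0-based coordinates, and the assumption that the read begins at the 5' end of the primer and that the site must lie beyond the primer itself are all assumptions, not details taken from the text.

```python
def site_covered_by_read(primer_start, site_pos, read_length, primer_length=20):
    """Return True if a polymorphic site falls inside a read of the given length.

    Assumptions (hypothetical, for illustration):
      - coordinates are 0-based positions on the same strand;
      - the read starts at the first base of the primer;
      - the site must lie downstream of the primer, because the primer
        bases themselves carry no information about the template.
    """
    if read_length <= 0 or primer_length <= 0:
        raise ValueError("lengths must be positive")
    offset = site_pos - primer_start
    return primer_length <= offset < read_length


# A 20-nt primer starting at position 1000 and a 36-bp read:
print(site_covered_by_read(1000, 1030, 36))  # site 30 bases in: inside the read
print(site_covered_by_read(1000, 1040, 36))  # site 40 bases in: beyond the read
```

For the read lengths named in the text (about 25, 40, 50, or 100 bp), the same check indicates how far downstream of the primer a site may lie and still be sequenced.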
Embodiments of kits provided for practicing the methods and for use in combination with the various devices described herein are illustrated in Figures 67 and 68. In one embodiment, the kit is provided for determining fetal fraction. As shown in Figure 67, the kit comprises a kit body (1); clamping grooves arranged in the kit body for holding bottles; a bottle (2) comprising an internal positive control; a bottle (3) comprising marker nucleic acids suitable for tracking and determining sample integrity; and a bottle (4) comprising a buffer solution.
The kit may comprise a plurality of additional bottles, wherein each of the plurality of bottles comprises a different internal positive control or a different marker nucleic acid.
In certain embodiments, the bottle (2) comprises two or more internal positive controls. The internal positive controls comprise a trisomy selected from the group consisting of trisomy 21, trisomy 18, trisomy 13, trisomy 16, trisomy 9, trisomy 8, trisomy 22, XXX, XXY, and XYY. In certain embodiments, the internal positive controls comprise a trisomy selected from the group consisting of trisomy 21 (T21), trisomy 18 (T18), and trisomy 13 (T13). In other embodiments, the internal positive controls loaded into the bottle (2) comprise trisomy 21 (T21), trisomy 18 (T18), and trisomy 13 (T13). Alternatively, the positive controls included in the kit may comprise an amplification or a deletion of a portion of one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, the positive controls comprise an amplification or a deletion of a p arm or a q arm of any one or more of chromosomes 1 to 22, X, and Y. In certain embodiments, the bottle (2) comprises an amplification or a deletion of one or more arms selected from the group consisting of 1q, 3q, 4p, 4q, 5p, 5q, 6p, 6q, 7p, 7q, 8p, 8q, 9p, 9q, 10p, 10q, 12p, 12q, 13q, 14q, 16p, 17p, 17q, 18p, 18q, 19p, 19q, 20p, 20q, 21q, and 22q. In other embodiments, the bottle (2) comprises an amplification of a region selected from the group consisting of 20q13, 19q12, 1q21-1q23, 8p11-p12, and ErbB2. Alternatively, the positive controls loaded into the bottle (2) comprise an amplification of a region or a gene shown in Table 3, Table 4, Table 5, and Table 6. In certain embodiments, the positive controls loaded into the bottle (2) comprise an amplification of a region or a gene selected from the group consisting of MYC, ERBB2 (EFGR), CCND1 (Cyclin D1), FGFR1, FGFR2, HRAS, KRAS, MYB, MDM2, CCNE, KRAS, MET, ERBB1, CDK4, MYCB, ERBB2, AKT2, MDM2, and CDK4.
The marker nucleic acids (also referred to as marker molecules (MM)) included in embodiments of the kit are antigenomic marker sequences. The marker sequences may range in length from about 30 bp to about 600 bp. In other embodiments, the marker sequences range in length from about 100 bp to about 400 bp. In certain embodiments, the kit comprises at least 2, or at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 8, or at least 9, or at least 10, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 18, or at least 19, or at least 20, or at least 25, or at least 30, or at least 35, or at least 40, or at least 50 bottles for different marker sequences.
In certain embodiments, the markers included in the kit comprise one or more DNAs. In other embodiments, the markers comprise one or more mimetics selected from the group consisting of morpholino derivatives, peptide nucleic acids (PNA), and phosphorothioate DNA.
In certain embodiments, the markers are incorporated into the controls. In other embodiments, the markers are incorporated into adaptors. In certain embodiments, the bottle (3) of the kit may further be loaded with one or more sequencing adaptors. The adaptors include indexed sequencing adaptors. The adaptors may further comprise a single-stranded arm that includes an index sequence and one or more PCR priming sites.
Figure 68 shows a schematic diagram of the kit, which may further comprise a sample collection device for collecting a biological sample. The sample collection device comprises a device (5) for collecting blood and a container (6) for holding the blood. In certain embodiments, the device for collecting blood and the container for holding blood contain an anticoagulant and a cell fixative.
In certain embodiments, the kit may further comprise a bottle (7) loaded with DNA extraction reagents. The DNA extraction reagent(s) may comprise a separation matrix and/or an elution solution.
In certain embodiments, the kit further comprises a bottle (8) loaded with reagents for preparing a sequencing library. The reagents for preparing the sequencing library may comprise a solution for end-repairing DNA, a solution for dA-tailing DNA, and a solution for adaptor-ligating DNA.
In other embodiments, the kit further comprises a bottle (9) comprising a composition of primers for amplifying predetermined target nucleic acids.
In certain embodiments, the kit further comprises instructional materials teaching the use of the reagents to determine fetal fraction in a biological sample. The instructional materials teach the use of the materials to detect a trisomy or a monosomy. In certain embodiments, the instructional materials teach the use of the materials to detect a cancer or a predisposition to a cancer.
In addition, the kits optionally include labeling and/or instructional materials providing directions (e.g., protocols) for the use of the reagents and/or devices provided in the kit. For example, the instructional materials can teach the use of the reagents to prepare samples and/or to determine copy number variation in a biological sample. In certain embodiments, the instructional materials teach the use of the materials to detect a trisomy. In certain embodiments, the instructional materials teach the use of the materials to detect a cancer or a predisposition to a cancer.
While the instructional materials in the various kits typically comprise written or printed materials, they are not limited to such. Any medium capable of storing such instructions and communicating them to an end user is contemplated herein. Such media include, but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), optical media (e.g., CD ROM), and the like. Such media may include addresses of internet sites that provide such instructional materials.
The various methods, devices, systems, and uses are described in further detail in the following Examples, which are in no way intended to limit the scope of the invention as claimed. The attached figures are meant to be considered integral parts of the specification and description of the invention. The following Examples are offered to illustrate, but not to limit, the claimed invention.
Experimental
Example 1
Sample processing and cfDNA extraction
Peripheral blood samples were collected from pregnant women in the first or second trimester of pregnancy who were deemed to be at risk for fetal aneuploidy. Informed consent was obtained from each participant prior to the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotype analysis was performed on the chorionic villus or amniocentesis samples to determine the fetal karyotype.
Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood (approximately 6-9 ml/tube) was transferred into one 15 ml low-speed centrifuge tube. The blood was centrifuged at 2640 rpm, 4°C, for 10 minutes using a Beckman Allegra 6R centrifuge and a GA 3.8 rotor.
For cell-free plasma extraction, the upper plasma layer was transferred to a 15 ml high-speed centrifuge tube and centrifuged at 16,000 × g, 4°C, for 10 minutes using a Beckman Coulter Avanti J-E centrifuge and a JA-14 rotor. The two centrifugation steps were performed within 72 hours of blood collection. Cell-free plasma containing cfDNA was stored at -80°C and thawed only once before amplification of plasma cfDNA or purification of cfDNA.
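The first spin above is specified in rpm and the second in × g. The two are related by the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in centimeters. The sketch below shows the conversion; the function names are illustrative, and the 10 cm radius in the example is a placeholder assumption, not the actual radius of either rotor named in the text, which should be taken from the rotor documentation.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    if rpm < 0 or radius_cm <= 0:
        raise ValueError("rpm must be non-negative and radius_cm positive")
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be non-negative and radius_cm positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Placeholder radius of 10 cm (an assumption for illustration only).
print(round(rcf_from_rpm(2640, 10.0), 1))
print(round(rpm_from_rcf(16000, 10.0)))
```

Reporting spins in × g, as the second step does, makes the protocol independent of the rotor used.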
Purified cell-free DNA (cfDNA) was extracted from cell-free plasma using the QIAamp Blood DNA Mini kit (Qiagen) essentially according to the manufacturer's instructions. One milliliter of buffer AL and 100 μl of protease solution were added to 1 ml of plasma. The mixture was incubated for 15 minutes at 56°C. One milliliter of 100% ethanol was added to the plasma digest. The resulting mixture was transferred to QIAamp mini columns assembled with the VacValves and VacConnectors provided in the QIAvac 24 Plus column assembly (Qiagen). Vacuum was applied to the samples, and the cfDNA retained on the column filters was washed under vacuum with 750 μl of buffer AW1, followed by a second wash with 750 μl of buffer AW2. The column was centrifuged at 14,000 RPM for 5 minutes to remove any residual buffer from the filter. The cfDNA was eluted with buffer AE by centrifugation at 14,000 RPM, and the concentration was determined using the Qubit™ Quantitation Platform (Invitrogen).
Example 2
Preparation and sequencing of primary and enriched sequencing libraries
a. Preparation of sequencing libraries - abbreviated protocol (ABB)
All sequencing libraries, i.e., primary and enriched libraries, were prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma. Library preparation was performed using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA) as follows. Because cell-free plasma DNA is fragmented in nature, the plasma DNA samples were not fragmented further by nebulization or sonication. The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the End Repair Module by incubating the cfDNA in a 1.5 ml microcentrifuge tube for 15 minutes at 20°C with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA polymerase I, 1 μl of T4 DNA polymerase, and 1 μl of T4 polynucleotide kinase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1. The enzymes were then heat-inactivated by incubating the reaction mixture at 75°C for 5 minutes. The mixture was cooled to 4°C, and dA-tailing of the blunt-ended DNA was accomplished using 10 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 15 minutes at 37°C. Subsequently, the Klenow fragment was heat-inactivated by incubating the reaction mixture at 75°C for 5 minutes. Following inactivation of the Klenow fragment, 1 μl of a 1:5 dilution of Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, CA) was used to ligate the Illumina adaptors (Non-Index Y-Adaptors) to the dA-tailed DNA using 4 μl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, by incubating the reaction mixture for 15 minutes at 25°C. The mixture was cooled to 4°C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers, and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA). Eighteen cycles of PCR were performed to selectively enrich the adaptor-ligated cfDNA (25 μl) using High-Fidelity Master Mix (25 μl; Finnzymes, Woburn, MA) and Illumina PCR primers (0.5 μM each) complementary to the adaptors (Part Nos. 1000537 and 1000537). The adaptor-ligated DNA was subjected to PCR (98°C for 30 seconds; 18 cycles of 98°C for 10 seconds, 65°C for 30 seconds, and 72°C for 30 seconds; final extension at 72°C for 5 minutes; hold at 4°C) using Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The purified amplified product was eluted in 40 μl of Qiagen EB Buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA).
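As a quick consistency check on the quantities in the abbreviated protocol, the end-repair volumes and the PCR program can be tallied in a few lines. This is a sketch under stated assumptions: the helper names are hypothetical, and the PCR program is read as an initial 30 s denaturation, 18 cycles of 10 s, 30 s, and 30 s, and a 5 minute final extension, which is one interpretation of the cycling conditions listed in the text. Ramp times and the 4°C hold are ignored.

```python
# End-repair reaction components as listed in the text (volumes in microliters).
END_REPAIR_UL = {
    "cfDNA (about 2 ng)": 40.0,
    "10X phosphorylation buffer": 5.0,
    "dNTP mix (10 mM each)": 2.0,
    "DNA polymerase I, 1:5 dilution": 1.0,
    "T4 DNA polymerase": 1.0,
    "T4 polynucleotide kinase": 1.0,
}


def final_concentration(stock, added_ul, total_ul):
    """Concentration after dilution, in the same units as the stock."""
    return stock * added_ul / total_ul


def pcr_seconds(initial_s, cycles, step_seconds, final_extension_s):
    """Programmed hold time of a PCR run, excluding ramping and final hold."""
    return initial_s + cycles * sum(step_seconds) + final_extension_s


total_ul = sum(END_REPAIR_UL.values())
buffer_x = final_concentration(10.0, 5.0, total_ul)   # 10X stock -> working X
dntp_mm = final_concentration(10.0, 2.0, total_ul)    # mM of each dNTP
program_s = pcr_seconds(30, 18, (10, 30, 30), 300)

print(f"end-repair volume: {total_ul:.0f} ul")
print(f"buffer: {buffer_x:.1f}X, each dNTP: {dntp_mm:.1f} mM")
print(f"PCR program: {program_s / 60:.1f} min")
```

The tally gives a 50 μl end-repair reaction in which the 10X buffer is at 1X and each dNTP is at 0.4 mM, consistent with a standard 50 μl reaction setup.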
b. Preparation of sequencing libraries - full-length protocol
The full-length protocol described here is essentially the standard protocol provided by Illumina and differs from the Illumina protocol only in the purification of the amplified library. The Illumina protocol instructs that the amplified library be purified using gel electrophoresis, whereas the protocol described herein uses magnetic beads for the same purification step. The primary sequencing library was prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma using the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA), essentially according to the manufacturer's instructions. All steps, except for the final purification of the adaptor-ligated products, which was performed using Agencourt magnetic beads and reagents instead of a purification column, were performed according to the protocol accompanying the NEBNext™ reagents for sample preparation of a genomic DNA library that is sequenced using the GAII. The NEBNext™ protocol essentially follows the protocol provided by Illumina, which is available at grcf.jhml.edu/hts/protocols/11257047_ChIP_Sample_Prep.pdf.
According to the End Repair Module, the overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends by incubating the 40 μl of cfDNA with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase, and 1 μl of T4 Polynucleotide Kinase, all provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, in a 200 μl microcentrifuge tube in a thermal cycler for 30 minutes at 20°C. The sample was cooled to 4°C and purified using a QIAQuick column provided in the QIAQuick PCR Purification Kit (Qiagen Inc., Valencia, California) as follows. The 50 μl reaction was transferred to a 1.5 ml microcentrifuge tube, and 250 μl of Qiagen Buffer PB was added. The resulting 300 μl was transferred to a QIAQuick column, which was centrifuged at 13,000 RPM for 1 minute in a microcentrifuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 39 μl of Qiagen Buffer EB by centrifugation.
dA tailing of 34 μl of the blunt-ended DNA was accomplished using 16 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1), with incubation for 30 minutes at 37°C according to the manufacturer's dA-Tailing Module. The sample was cooled to 4°C and purified using a column provided in the MinElute PCR Purification Kit (Qiagen Inc., Valencia, California) as follows. The 50 μl reaction was transferred to a 1.5 ml microcentrifuge tube, and 250 μl of Qiagen Buffer PB was added. The 300 μl was transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microcentrifuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 15 μl of Qiagen Buffer EB by centrifugation.
According to the Quick Ligation Module, ten microliters of the DNA eluate were incubated with 1 μl of a 1:5 dilution of the Illumina Genomic Adaptor Oligo Mix (Part No. 1000521), 15 μl of 2X Quick Ligation Reaction Buffer, and 4 μl of Quick T4 DNA Ligase for 15 minutes at 25°C. The sample was cooled to 4°C and purified using a MinElute column as follows. One hundred and fifty microliters of Qiagen Buffer PE were added to the 30 μl reaction, and the entire volume was transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microcentrifuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 28 μl of Qiagen Buffer EB by centrifugation.
Twenty-three microliters of the adaptor-ligated DNA eluate were subjected to 18 cycles of PCR (98°C for 30 seconds; 18 cycles of 98°C for 10 seconds, 65°C for 30 seconds, and 72°C for 30 seconds; final extension at 72°C for 5 minutes; hold at 4°C) using Illumina genomic PCR primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, Massachusetts) according to the manufacturer's instructions, available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The Agencourt AMPure XP PCR purification system removes unincorporated dNTPs, primers, primer dimers, salts, and other contaminants, and recovers amplicons greater than 100 bp. The purified amplified product was eluted from the Agencourt beads in 40 μl of Qiagen EB Buffer, and the size distribution of the library was analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, California).
c. Analysis of sequencing libraries prepared according to the shortened (a) and full-length (b) protocols
The electropherograms generated by the Bioanalyzer are shown in Figures 21A and 21B. Figure 21A shows the electropherogram of library DNA prepared from cfDNA purified from plasma sample M24228 using the full-length protocol described in (b), and Figure 21B shows the electropherogram of library DNA prepared from cfDNA purified from the same plasma sample using the shortened protocol described in (a). In both figures, peaks 1 and 4 represent the 15 bp lower marker and the 1,500 bp upper marker, respectively; the numbers above the peaks indicate the migration times of the library fragments; and the horizontal lines indicate the set threshold for integration. The electropherogram in Figure 21A shows a minor peak of 187 bp fragments and a major peak of 263 bp fragments, whereas the electropherogram in Figure 21B shows only one peak, at 265 bp. Integration of the peak areas gave a calculated DNA concentration of 0.40 ng/μl for the 187 bp peak in Figure 21A, 7.34 ng/μl for the 263 bp peak in Figure 21A, and 14.72 ng/μl for the 265 bp peak in Figure 21B. The Illumina adaptors ligated to the cfDNA are known to total 92 bp, which, when subtracted from 265 bp, indicates that the peak size of the cfDNA is 173 bp. The minor peak at 187 bp likely represents fragments of two primers ligated end-to-end.
When the shortened protocol is used, the linear two-primer fragments are eliminated from the final library product. The shortened protocol also eliminates other fragments smaller than 187 bp. In this example, the concentration of purified adaptor-ligated cfDNA obtained with the shortened protocol was twice that of the adaptor-ligated cfDNA produced with the full-length protocol. It was noted that the concentration of adaptor-ligated cfDNA fragments obtained with the shortened protocol was always greater than that obtained with the full-length protocol (data not shown).
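The arithmetic behind the two figures quoted above can be checked with a short Python sketch. It uses only values stated in the text (92 bp of adaptor sequence, a 265 bp library peak, and the 7.34 and 14.72 ng/μl peak concentrations); the function and constant names are illustrative and not part of the original protocol.

```python
# Values quoted in the text for plasma sample M24228 (Figures 21A and 21B).
ADAPTOR_BP = 92          # total adaptor length added to each cfDNA fragment
LIBRARY_PEAK_BP = 265    # single library peak reported for Figure 21B
CONC_263BP_PEAK = 7.34   # ng/ul, major peak in Figure 21A
CONC_265BP_PEAK = 14.72  # ng/ul, single peak in Figure 21B


def insert_size(library_peak_bp: int, adaptor_bp: int = ADAPTOR_BP) -> int:
    """Return the cfDNA insert size implied by a library peak size."""
    return library_peak_bp - adaptor_bp


def fold_difference(numerator: float, denominator: float) -> float:
    """Return how many times larger one concentration is than another."""
    return numerator / denominator


cfdna_bp = insert_size(LIBRARY_PEAK_BP)
ratio = fold_difference(CONC_265BP_PEAK, CONC_263BP_PEAK)

print(f"cfDNA peak size: {cfdna_bp} bp")             # 173 bp
print(f"Concentration ratio (21B / 21A): {ratio:.2f}")  # about 2
```

The ratio of roughly two is consistent with the statement that the shortened protocol yielded twice the concentration of adaptor-ligated cfDNA in this example.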
Therefore, one advantage of preparing the sequencing library using the shortened protocol is that the library obtained consistently contains only one major peak, in the 262-267 bp range, whereas the quality of libraries prepared using the full-length protocol varies, as reflected by the number and mobility of peaks other than the peak representing the cfDNA. Non-cfDNA products occupy space on the flow cell and diminish the quality of cluster amplification and of the subsequent imaging of the sequencing reactions, which underlie the overall assignment of aneuploidy status. The shortened protocol was shown not to affect sequencing of the library.
Another advantage of preparing the sequencing library using the shortened protocol is that the three enzymatic steps of blunt-ending, dA tailing, and adaptor ligation take less than one hour to complete, which supports the validation and implementation of a rapid aneuploidy diagnostic service.
A further advantage is that the three enzymatic steps of blunt-ending, dA tailing, and adaptor ligation are performed in the same reaction tube. This avoids multiple sample transfers, which can lead to loss of material and, more importantly, to sample mix-up and sample contamination.
Example 3
Preparation of sequencing libraries from unrepaired cfDNA: adaptor ligation in solution
To determine whether the shortened protocol could be shortened further to expedite sample analysis, sequencing libraries were made from unrepaired cfDNA and sequenced using the Illumina Genome Analyzer II as previously described.
cfDNA was prepared from peripheral blood samples as described herein. Blunt-ending and 5' phosphorylation, which are required by the published protocol for the Illumina platform, were not performed, so as to provide unrepaired cfDNA samples.
It was determined that omitting DNA repair, or DNA repair and phosphorylation, did not affect the quality or the yield of the sequencing library (data not shown).
In-solution 2-step method for unindexed, unrepaired DNA
In a first set of experiments, unrepaired cfDNA was subjected to simultaneous dA tailing and adaptor ligation by combining Klenow Exo- and T4 DNA ligase in the same reaction mixture, as follows. Thirty microliters of cfDNA at a concentration of between 20 and 150 pg/μl were dA-tailed (5 μl of 10X NEB Buffer #2, 2 μl of 10 nM dNTPs, 1 μl of 10 nM ATP, and 1 μl of 5000 U/ml Klenow Exo-) and ligated to Illumina Y adaptors (1 μl of a 1:15 dilution of a 3 μM stock) using 1 μl of 400,000 U/ml T4 DNA ligase, in a reaction volume of 50 μl. The unindexed Y adaptors were obtained from Illumina. The combined reaction was incubated at 25°C for 30 minutes. The enzymes were heat-inactivated at 75°C for 5 minutes, and the reaction product was stored at 10°C.
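As a rough check on the quantities in this reaction, the following Python sketch computes the cfDNA input mass and the amount of Y adaptor added. It assumes that "1:15 dilution" means one part stock in fifteen parts total, which the text does not state explicitly; the helper names are illustrative only.

```python
def mass_ng(volume_ul: float, conc_pg_per_ul: float) -> float:
    """Mass of DNA in nanograms for a given volume and concentration."""
    return volume_ul * conc_pg_per_ul / 1000.0


def adaptor_pmol(stock_um: float, dilution_factor: float, volume_ul: float) -> float:
    """Picomoles of adaptor added; 1 uM equals 1 pmol/ul.

    Assumes a 1:N dilution means 1 part stock in N parts total.
    """
    return stock_um / dilution_factor * volume_ul


# 30 ul of cfDNA at 20-150 pg/ul, as stated in the text.
low_input = mass_ng(30, 20)
high_input = mass_ng(30, 150)

# 1 ul of a 1:15 dilution of a 3 uM adaptor stock.
adaptor_amount = adaptor_pmol(3.0, 15, 1.0)

print(f"cfDNA input: {low_input:.1f} to {high_input:.1f} ng")
print(f"Y adaptor added: {adaptor_amount:.2f} pmol")
```

Under these assumptions the reaction contains between 0.6 and 4.5 ng of cfDNA and about 0.2 pmol of adaptor.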
The adaptor-ligated product was purified using SPRI beads (Agencourt AMPure XP PCR purification system, Beckman Coulter Genomics) and subjected to 18 cycles of PCR. The PCR-amplified library was purified using SPRI and sequenced using the Illumina Genome Analyzer IIx or HiSeq according to the manufacturer's instructions to obtain single-end reads of 36 bp. A large number of 36 bp reads were obtained, covering approximately 10% of the genome. Upon completion of sequencing of a sample, the Illumina "Sequencer Control Software/Real-time Analysis" transferred base-call files in binary format to a network-attached storage device for data analysis. The sequence data were analyzed with software designed to run on a Linux server. This software used the Illumina "BCLConverter" to convert the binary base calls into human-readable text files, and then called the open-source "Bowtie" program to align the sequences to the reference human genome derived from the hg18 genome provided by the National Center for Biotechnology Information (NCBI36/hg18, available on the World Wide Web at http://genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105).
The software reads, from the Bowtie output (bowtieout.txt files), the sequence data generated by the above procedure that aligned uniquely to the genome. Sequence alignments with up to 2 base mismatches were allowed and were included in the alignment counts only if they aligned uniquely to the genome. Sequence alignments with identical start and end coordinates (duplicates) were excluded. Approximately 5 million to 25 million 36 bp tags with 2 or fewer mismatches were mapped uniquely to the human genome. All mapped tags were counted and included in the calculation of chromosome doses in both test and qualified samples. The regions extending from base 0 to base 2 × 10^6, from base 10 × 10^6 to base 13 × 10^6, and from base 23 × 10^6 to the end of chromosome Y were specifically excluded from the analysis, because tags derived from either male or female fetuses map to these regions of the Y chromosome.
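The counting rules described above (unique alignment, at most 2 mismatches, removal of duplicates with identical start and end coordinates, and exclusion of three chromosome Y regions) can be sketched in Python as follows. This is a minimal illustration, not the software used in the example: the `Alignment` record is a hypothetical, already-parsed representation rather than the actual Bowtie output format, and the decision to test only the start coordinate against half-open excluded intervals is an assumption.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Alignment(NamedTuple):
    """Hypothetical parsed alignment record (not the real Bowtie format)."""
    chrom: str
    start: int
    end: int
    mismatches: int
    n_hits: int  # number of genomic locations the read aligned to


# Chromosome Y regions excluded in the text; None means "to chromosome end".
CHRY_EXCLUDED: List[Tuple[int, Optional[int]]] = [
    (0, 2_000_000),
    (10_000_000, 13_000_000),
    (23_000_000, None),
]


def in_excluded_chry_region(aln: Alignment) -> bool:
    """True if the tag starts inside one of the excluded chrY regions."""
    if aln.chrom != "chrY":
        return False
    return any(
        aln.start >= lo and (hi is None or aln.start < hi)
        for lo, hi in CHRY_EXCLUDED
    )


def count_tags(alignments: Iterable[Alignment], max_mismatches: int = 2) -> Dict[str, int]:
    """Count uniquely mapped, non-duplicate tags per chromosome."""
    seen = set()
    counts: Dict[str, int] = {}
    for aln in alignments:
        if aln.mismatches > max_mismatches or aln.n_hits != 1:
            continue  # too many mismatches, or not uniquely aligned
        key = (aln.chrom, aln.start, aln.end)
        if key in seen:
            continue  # duplicate: identical start and end coordinates
        seen.add(key)
        if in_excluded_chry_region(aln):
            continue
        counts[aln.chrom] = counts.get(aln.chrom, 0) + 1
    return counts
```

The resulting per-chromosome counts are the quantities that would feed the chromosome dose calculation mentioned in the text.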
Figure 22A shows the average (n = 16) of the percentage of the total number of sequence tags mapped to each human chromosome (% chromosome N) when the sequencing library was prepared according to the shortened protocol (ABB; ◇) and when it was prepared according to the no-repair 2-step method (INSOL; □). These data show that, compared with the percentage of tags mapped to the corresponding chromosomes when using the shortened method, preparing the sequencing library with the no-repair 2-step method resulted in a greater percentage of tags mapped to chromosomes with lower GC content and a smaller percentage of tags mapped to chromosomes with higher GC content. Figure 22B relates the percentage of sequence tags to chromosome size, and shows that the no-repair method reduces sequencing bias. The regression coefficients for mapped tags obtained from sequencing libraries prepared according to the shortened protocol (ABB; Δ) and the in-solution no-repair protocol (2-step; □) were R² = 0.9332 and R² = 0.9806, respectively.
Table 8. Percent GC content per chromosome
The shortened and no-repair 2-step methods were also compared by plotting, as a function of the percent GC content of each chromosome, the ratio of the percentage of tags mapped to an individual chromosome with the no-repair method to the percentage of tags mapped to that chromosome with the shortened method. The percent GC content relative to chromosome size was calculated from published information on chromosome sequences and GC-content binning (Constantini et al., Genome Res 16:536-541 [2006]) and is provided in Table 8. The results are given in Figure 22C, which shows that the ratio was markedly decreased for chromosomes with high GC content and increased for chromosomes with low GC content. These data clearly show the normalizing effect that the no-repair method has in overcoming GC bias.
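The quantities plotted in Figures 22A and 22C can be expressed compactly. The Python sketch below computes the percentage of tags per chromosome and the per-chromosome ratio between two protocols; the function names are illustrative, and the tag counts in the usage example are hypothetical placeholders, not data from the example.

```python
from typing import Dict


def percent_per_chromosome(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of all mapped tags that fall on each chromosome."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped tags")
    return {chrom: 100.0 * n / total for chrom, n in counts.items()}


def protocol_ratio(pct_test: Dict[str, float], pct_ref: Dict[str, float]) -> Dict[str, float]:
    """Ratio of % tags (test protocol) to % tags (reference protocol)."""
    return {
        chrom: pct_test[chrom] / pct_ref[chrom]
        for chrom in pct_test
        if chrom in pct_ref and pct_ref[chrom] > 0
    }


# Hypothetical tag counts for illustration only.
no_repair_counts = {"chr4": 620, "chr19": 180, "chr21": 200}
shortened_counts = {"chr4": 580, "chr19": 230, "chr21": 190}

ratios = protocol_ratio(
    percent_per_chromosome(no_repair_counts),
    percent_per_chromosome(shortened_counts),
)
for chrom, value in sorted(ratios.items()):
    print(chrom, round(value, 3))
```

A ratio above 1 for a chromosome means the no-repair library placed a larger share of tags there than the shortened-protocol library; plotting these ratios against the Table 8 GC percentages would give a figure of the kind described for Figure 22C.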
These data show that the no-repair method corrects, to some extent, the GC bias that is known to be associated with the sequencing of amplified DNA.
To determine whether the no-repair method affects the proportion of fetal versus maternal cfDNA that is sequenced, the percentage of the number of tags mapped to chromosomes X and Y was determined. Figures 23A and 23B are bar graphs providing the mean and standard deviation of the percentage of tags mapped to chromosome X (Figure 23A; % chromosome X) and chromosome Y (Figure 23B; % chromosome Y), obtained from sequencing 10 cfDNA samples purified from the plasma of 10 pregnant women. Figure 23A shows that a greater number of tags mapped to the X chromosome when the no-repair method was used than when the shortened method was used. Figure 23B shows that the percentage of tags mapped to the Y chromosome when the no-repair method was used did not differ from that obtained with the shortened method.
These data show that the no-repair method does not introduce any bias for or against the sequencing of fetal versus maternal DNA; that is, the proportion of fetal sequences sequenced is unchanged when the no-repair method is used.
Taken together, these data show that the no-repair method does not adversely affect the quality of the sequencing library, nor does it affect the information obtained from sequencing the library. Omitting the DNA repair step required by the published protocol reduces reagent cost and speeds up preparation of the sequencing library.
In-solution 2-step method for indexed, unrepaired DNA
In a second set of experiments, unrepaired cfDNA was dA-tailed, followed by heat inactivation of the Klenow Exo- and then adaptor ligation. When ligation was performed using the unindexed Illumina adaptors, which carry single-stranded arms of 21 bases, omitting the heat inactivation of Klenow Exo- did not affect the yield or the quality of the sequencing library.
To determine whether the no-repair method could be applied to multiplex sequencing, home-made indexed Y adaptors containing a 6-base index sequence were used to generate libraries with or without Klenow heat inactivation. Unlike the unindexed adaptors, the indexed adaptors contain a single-stranded arm of 43 bases, which includes the index sequence and the PCR priming site.
Twelve different indexed adaptors, identical to the Illumina TruSeq adaptors, were made starting from oligonucleotides obtained from Integrated DNA Technologies (Coralville, Iowa). The oligonucleotide sequences were taken from the published Illumina TruSeq indexed adaptor sequences. The oligonucleotides were dissolved in annealing buffer (10 mM Tris, 1 mM EDTA, 50 mM NaCl, pH 7.5) to a final concentration of 300 μM. Equimolar amounts of the two oligonucleotides that form the arms of any given indexed adaptor, typically 10 μl each (300 μM each), were mixed and allowed to anneal (95°C for 6 minutes, followed by slow, controlled cooling from 95°C to 10°C). The final 150 μM adaptors were diluted to 7.5 μM in 10 mM Tris, 1 mM EDTA (pH 8) and stored at -20°C until use.
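The concentrations in the annealing step follow from simple dilution arithmetic, sketched below in Python. The sketch assumes that the two oligonucleotides are mixed in equal volumes and anneal completely; the 100 μl working volume in the usage example is a hypothetical value chosen for illustration.

```python
from typing import Tuple


def conc_after_equal_mix(conc_um: float, n_components: int = 2) -> float:
    """Concentration of each component after mixing equal volumes."""
    return conc_um / n_components


def dilution_volumes(stock_um: float, target_um: float, final_volume_ul: float) -> Tuple[float, float]:
    """Return (stock volume, diluent volume) in ul using C1*V1 = C2*V2."""
    if target_um > stock_um:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = final_volume_ul * target_um / stock_um
    return stock_volume, final_volume_ul - stock_volume


# Two 300 uM oligonucleotides mixed 1:1 give a 150 uM annealed adaptor.
annealed_um = conc_after_equal_mix(300.0)

# Diluting 150 uM to 7.5 uM is a 20-fold dilution.
fold = annealed_um / 7.5

# Hypothetical 100 ul working stock.
stock_ul, diluent_ul = dilution_volumes(annealed_um, 7.5, 100.0)

print(f"Annealed adaptor: {annealed_um:.0f} uM")
print(f"Dilution: {fold:.0f}-fold ({stock_ul:.0f} ul stock + {diluent_ul:.0f} ul buffer)")
```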
The data showed that, when indexed adaptors were used, library preparation by the 2-step method did not work if active Klenow Exo- was present in the same reaction as the ligase and the indexed adaptors. However, the 2-step method worked well if the Klenow Exo- was first heat-inactivated at 75°C for 5 minutes and the ligase and indexed adaptors were added afterwards. It is possible that, when indexed adaptors and active Klenow Exo- are present together, the strand displacement activity of Klenow Exo- causes the longer single-stranded DNA arm of the indexed adaptors to be digested, thereby eliminating the PCR primer site. Electropherograms of sequencing libraries prepared from the same cfDNA and enzymes, with or without a heat-inactivation step after the Klenow Exo- reaction, showed that including Klenow Exo- heat inactivation before the addition of ligase and indexed adaptors in the 2-step method yields libraries with the expected profile, with the main peak at 290 bp (data not shown). Therefore, so that the no-repair method could be applied to multiplex sequencing, all experiments using indexed Y adaptors were modified to include heat inactivation of Klenow Exo-.
Example 4
Preparation of sequencing libraries from unrepaired cfDNA: adaptor ligation on a solid surface (SS)
1-step solid surface method for unindexed DNA
To determine whether the no-repair library process could be simplified further, the no-repair sequencing library preparation described in Example 3 was adapted to be performed on a solid surface. The libraries prepared were sequenced as described in Example 3.
cfDNA was prepared from peripheral blood samples as described in Example 1. Polypropylene tubes were coated with streptavidin (SA) and washed, and a first set of biotinylated indexed adaptors was bound to the SA-coated tubes, as follows. The tubes of 8-well PCR tube strips (USA Scientific, Ocala, Florida) were coated with 0.5 nanomoles of streptavidin (Thermo Scientific, Rockford, Illinois) in 50 μl of PBS by incubating with the SA overnight at 4°C. The tubes were washed four times, each time with 200 μl of 1X TE. Biotinylated index 1 adaptor at 7.5 picomoles, 3.75 picomoles, 1.8 picomoles, and 0.9 picomoles, each in 50 μl of TE, was added in duplicate to the SA-coated tubes and incubated at room temperature for 25 minutes. Unbound adaptor was removed, and the tubes were washed four times with 200 μl of TE. The biotinylated index 1 adaptor was made as described in Example 3, using a biotinylated universal adaptor oligonucleotide purchased from IDT.
1-step SS method using cfDNA from non-pregnant subjects
In a second strip of PCR tubes, a control sample (NTC: no-template control) or 30 μl of approximately 120 pg/μl (that is, approximately 32 femtomoles) of purified cfDNA obtained from non-pregnant women was incubated with 5 units of Klenow Exo- in NEB Buffer #2 containing 20 nanomoles of dNTPs and 10 nanomoles of ATP, in a 50 μl reaction volume, at 37°C for 15 minutes. The Klenow enzyme was then inactivated by incubating the reaction mixture at 75°C for 5 minutes. The Klenow-DNA mixture was transferred to the corresponding tube containing the SA-bound biotinylated adaptor, and the cfDNA was ligated to the immobilized adaptor by incubating the mixture with 400 units of T4 DNA ligase in 10 μl of 1X T4 DNA ligase buffer at 25°C for 15 minutes. Next, 7.5 picomoles of non-biotinylated index 1 adaptor were ligated to the solid-phase-bound cfDNA by incubation with 200 units of T4 DNA ligase in 10 μl of buffer at 25°C for 15 minutes. The reaction mixture was removed, and the tubes were washed five times with 200 μl of TE buffer. The adaptor-ligated cfDNA was amplified by PCR using 50 μl of Phusion PCR mix [New England Biolabs] containing the P5 and P7 primers (IDT; 1 μM each), with the following cycling: [98°C for 30 seconds; 18 cycles of (98°C for 10 seconds; 50°C for 10 seconds; 60°C for 10 seconds; 72°C for 10 seconds); 72°C for 5 minutes; hold at 10°C].
The resulting library product was subjected to SPRI cleanup [Beckman Coulter Genomics], and the quality of the library was assessed from the profile obtained by analysis on a high-sensitivity Bioanalyzer chip [Agilent Technologies, Santa Clara, California]. The profiles showed that solid-phase sequencing library preparation from unrepaired cfDNA provides high-yield, high-quality sequencing libraries (data not shown).
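The statement that 30 μl of cfDNA at about 120 pg/μl corresponds to roughly 32 femtomoles can be reproduced with the Python sketch below. It rests on two assumptions not stated at this point in the text: an average fragment length of 173 bp (the cfDNA peak size derived in Example 2) and the commonly used approximation of 650 g/mol per base pair of double-stranded DNA.

```python
AVG_BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one dsDNA base pair


def dsdna_femtomoles(mass_pg: float, length_bp: float) -> float:
    """Convert a mass of double-stranded DNA to femtomoles of fragments.

    pg divided by g/mol gives pmol; multiply by 1000 for fmol.
    """
    molar_mass = length_bp * AVG_BP_MASS_G_PER_MOL
    return mass_pg / molar_mass * 1000.0


input_mass_pg = 30 * 120  # 30 ul at approximately 120 pg/ul
fragment_bp = 173         # assumed from the cfDNA peak size in Example 2

fmol = dsdna_femtomoles(input_mass_pg, fragment_bp)
print(f"{input_mass_pg / 1000:.1f} ng of {fragment_bp} bp cfDNA is about {fmol:.0f} fmol")
```

Under these assumptions, 3.6 ng of cfDNA is about 32 fmol, which agrees with the figure given in the text. On the same basis, the 7.5 picomoles of adaptor used in the second ligation are in large molar excess over the cfDNA fragments.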
1-step SS method using cfDNA from pregnant subjects
The solid surface (SS) method was tested using cfDNA samples obtained from pregnant women.
cfDNA was prepared from 8 peripheral blood samples obtained from pregnant women as described in Example 1, and sequencing libraries were prepared from the purified cfDNA as described above. The libraries were sequenced, and the sequence information was analyzed.
Figure 24 shows, for each of 5 samples, the ratio of the number of non-excluded sites (NE sites) on the reference genome (hg18) to the total number of tags mapped to those non-excluded sites. cfDNA was prepared from these samples and used to construct sequencing libraries according to the shortened protocol (ABB) described in Example 2 (filled bars), the in-solution no-repair protocol described in Example 18 (2-step; open bars), and the solid surface no-repair protocol described in the present example (1-step; gray bars).
The data shown in Figure 24 indicate that the representation of PCR-amplified sequences was comparable for libraries prepared according to the three protocols, showing that the solid surface method does not skew the variety of sequences represented in the library.
Figure 25A shows that the number of sequence tags mapped uniquely to each chromosome when sequencing libraries prepared according to the no-repair solid surface method was comparable to the number obtained with the in-solution no-repair 2-step method described above. The data show that both no-repair methods reduce the GC bias of the sequencing data.
Figure 25B shows the relationship between the number of mapped tags and the size of the chromosome to which the tags were mapped. The regression coefficients for mapped tags obtained from sequencing libraries prepared according to the shortened protocol (ABB), the in-solution no-repair protocol (2-step), and the solid surface no-repair protocol (1-step) were R² = 0.9332, R² = 0.9802, and R² = 0.9807, respectively.
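For readers who want to reproduce this kind of comparison on their own data, the Python sketch below computes R² for a straight-line fit of mapped tag counts against chromosome size. It assumes that the reported values are coefficients of determination from a simple linear regression, which the text does not state explicitly; the function name is illustrative.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination for a least-squares line y = a + b*x.

    For simple linear regression this equals the squared Pearson correlation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return (sxy * sxy) / (sxx * syy)
```

In use, `x` would hold the chromosome sizes and `y` the number (or percentage) of tags mapped to each chromosome for one library preparation method; a value closer to 1 indicates that tag counts track chromosome size more closely.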
Figure 25C shows, as a function of the percent GC content of each chromosome, the ratio of the percentage of mapped sequence tags per chromosome obtained from sequencing libraries prepared according to the no-repair 2-step protocol to that obtained from libraries prepared according to the shortened protocol (ABB) (◇), and the ratio of the percentage of mapped sequence tags per chromosome obtained from libraries prepared according to the no-repair 1-step protocol to that obtained from libraries prepared according to the shortened protocol (ABB) (□). Taken together, the data in Figures 25B and 25C show that the 1-step and 2-step methods have similar GC-normalizing effects, because both omit the DNA repair step of the library process.
To determine whether the no-repair methods affect the proportion of fetal versus maternal cfDNA that is sequenced, the percentage of the number of tags mapped to chromosomes X and Y was determined. Figures 26A and 26B compare the mean and standard deviation of the percentage of tags mapped to chromosome X (Figure 26A) and chromosome Y (Figure 26B), obtained from sequencing 5 cfDNA samples purified from the plasma of 5 pregnant women and prepared by the ABB, 2-step, and 1-step methods. Figure 26A shows that a greater number of tags mapped to the X chromosome when the no-repair methods (2-step and 1-step) were used than when the shortened method was used (filled bars). Figure 26B shows that the percentage of tags mapped to the Y chromosome when the no-repair 2-step and 1-step methods were used did not differ from that obtained with the shortened method.
These data show that the no-repair solid surface 1-step method does not introduce any bias for or against the sequencing of fetal versus maternal DNA; that is, the proportion of fetal sequences sequenced is unchanged when the no-repair solid surface method is used.
Taken together, the data show that generating sequencing libraries on a solid surface is an easy and feasible option for preparing samples for sequencing.
实例5Example 5
无修复固体表面1步文库制备法的高输送量相容性High throughput compatibility for one-step library preparation on solid surfaces without repair
为了确定通过NGS技术进行测序的无修复1步文库制备法是否可应用于高输送量样品处理,在经过SA结合的编索引的适配子涂布的96孔PCR板中由96个外周血样品制备96种cfDNA文库。如实例5中所述对所制备的文库进行测序。To determine whether a repair-free, one-step library preparation method for sequencing by NGS technology could be applied to high-throughput sample processing, 96 cfDNA libraries were prepared from 96 peripheral blood samples in a 96-well PCR plate coated with SA-bound indexed adaptors. The prepared libraries were sequenced as described in Example 5.
Coating of the first PCR plate with SA and attachment of the biotinylated indexed adaptors were performed as described in Example 4. Each column of wells of the 96-well plate was coated with a biotinylated adaptor comprising a unique index. Using a second 96-well PCR plate, 37 different cfDNAs in 30 μl were dA-tailed at 37°C for 15 minutes in the presence of 10 μl of Klenow master mix per well, followed by Klenow inactivation at 75°C for 5 minutes. Several of the cfDNAs were used in more than one well, for a total of 94 wells containing cfDNA; 2 wells served as no-template controls. The dA-tailed cfDNA mixtures were transferred to the first PCR plate and ligated to the bound biotinylated adaptors in the presence of 10 μl of Rapid Ligase master mix 1 at 25°C using a PCT-225 Tetrad gradient thermal cycler (BioRad, Hercules, CA). 10 μl of ligation master mix 2, customized for each indexed adaptor, was added, and ligation was carried out at 5°C for 15 minutes. Unbound DNA was removed, and the bound DNA-biotinylated adaptor complexes were washed five times with TE buffer. 50 μl of PCR master mix was added to each well, and the adaptor-ligated DNA was amplified and cleaned up by SPRI as described in Example 4. The libraries were diluted and analyzed using a HiSens BA chip.
The correlation between the amount of purified cfDNA used to prepare a sequencing library and the resulting amount of library product was determined for 61 clinical samples prepared using the ABB method (Figure 27A) and for 35 research samples prepared using the no-repair SS 1-step method (Figure 27B). The correlation was significantly greater for libraries prepared using the no-repair SS 1-step method (R² = 0.5826; Figure 27A) than for libraries prepared using the shortened method described in Example 2 (R² = 0.1534; Figure 27B). Note that the cfDNA samples in this comparison were not the same, because clinical samples were not available for research and development. Nevertheless, these results indicate that the no-repair SS 1-step method consistently gives a greater correlation between cfDNA input and library output than the ABB method. The three methods (ABB, no-repair 2-step, and no-repair SS 1-step) were then compared using serially diluted amounts of the same purified cfDNA for all three. As shown in Figure 28, the best correlation was obtained when libraries were prepared according to the SS 1-step method (R² = 0.9457; Δ), followed by the 2-step method (R² = 0.7666; □), with the ABB method giving a markedly lower correlation (R² = 0.0386; ◇).
These data show that, compared with methods that end-modify the cfDNA (DNA repair and phosphorylation), the no-repair methods, whether performed in solution or on a solid surface, provide consistent and predictable yields, whether or not purification of the repaired DNA and dA-tailed products is included.
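The R² values quoted above summarize how well library yield tracks cfDNA input. As a minimal sketch of how such a value can be computed for a simple linear fit, assuming ordinary least squares and using made-up input and yield values (the function name `r_squared` and all numbers are illustrative, not taken from the figures):

```python
from statistics import mean

def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x.

    For simple linear regression this equals the squared Pearson correlation.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

# Hypothetical data: cfDNA input (ng) and library yield (arbitrary units).
cfdna_input = [1.0, 2.0, 4.0, 8.0]
library_yield = [11.0, 19.0, 42.0, 79.0]
print(round(r_squared(cfdna_input, library_yield), 4))
```

A value near 1 means that yield is largely predictable from input, which is the property the text attributes to the no-repair methods.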
根据该实例中所述的固体表面法制备文库所花的时间比当根据缩短法制备测序文库时所花的时间少数倍。例如,在约4小时内使用ABB法可人工制备10到14个样品,而当使用SS 1步法时,在4和5小时内对应地可人工制备96或192个文库。还有,可容易地使SS 1步法自动化,以便使用NGS技术在多次96多重测序时制备文库。因此,SS法将适合于商业自动化高输送量样品分析。The time taken to prepare libraries using the solid surface method described in this example is several times shorter than the time taken to prepare sequencing libraries using the shortened method. For example, 10 to 14 samples can be manually prepared using the ABB method in approximately 4 hours, while 96 or 192 libraries can be manually prepared using the SS 1-step method in 4 and 5 hours, respectively. Furthermore, the SS 1-step method can be easily automated to prepare libraries for multiple 96-plex sequencing runs using NGS technology. Therefore, the SS method is suitable for commercial automated, high-throughput sample analysis.
对DNA文库的分析显示未修复的cfDNA的固相测序文库制备提供了高产率和高品质测序文库,这些测序文库可经过配置而用于自动化工艺以便进一步加快需要使用NGS技术进行大规模平行测序的样品分析。固体表面法适用于修复的DNA。Analysis of DNA libraries showed that solid-phase sequencing library preparation of unrepaired cfDNA provides high-yield and high-quality sequencing libraries that can be configured for automated processes to further accelerate the analysis of samples requiring massively parallel sequencing using NGS technology. Solid surface methods are suitable for repaired DNA.
实例6Example 6
对根据1步SS法制备的文库进行多重测序Multiplex sequencing of libraries prepared using the 1-step SS method
Library samples prepared on 96-well plates by the SS 1-step method (Example 20) were sequenced in multiplex, with six differently indexed samples per flow cell lane of an Illumina HiSeq sequencer. The prepared libraries were sequenced as described in Example 2. The data in Figure 29 compare indexing efficiency, as assessed by multiplex sequencing, between the 2-step method (filled bars) and the SS 1-step method (open bars). These data show that preparing libraries on a solid surface does not compromise indexing efficiency. Figures 30A and 30B show the percentage of the total number of sequence tags mapped to each human chromosome when the sequencing library was prepared according to the 1-step solid surface method (% chromosome N; Figure 30A), and Figure 30B (R² = 0.9807) shows the percentage of sequence tags as a function of chromosome size. Figures 30A and 30B show that the GC bias of the SS 1-step method is the same as that of the 2-step method, because both processes use sample preparation enzymatics without DNA repair.
Figure 31 shows the percentage of sequence tags mapped to chromosome Y relative to the tags mapped to chromosome X, obtained by sequencing 42 libraries that were prepared with indexed adaptors using the SS 1-step method and sequenced in multiplex using Illumina sequencing-by-synthesis with reversible terminator technology. The data clearly distinguish samples obtained from pregnant women carrying male fetuses from samples obtained from pregnant women carrying female fetuses.
实例7Example 7
样品处理和DNA提取Sample processing and DNA extraction
从处于妊娠期的第一个三月期或第二个三月期并且被认为存在胎儿非整倍性风险的孕妇体内收集外周血样品。在抽血前从各参与者处获得同意书。在羊膜穿刺或绒膜绒毛采样前收集血液。使用绒膜绒毛或羊膜穿刺样品进行核型分析以确定胎儿核型。Peripheral blood samples were collected from pregnant women in the first or second trimester who were considered at risk for fetal aneuploidy. Written consent was obtained from each participant before blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotyping was performed on the chorionic villus or amniocentesis samples to determine the fetal karyotype.
将从各受试者抽取的外周血收集在ACD管中。将一管血样(约6到9毫升/管)转移到一个15毫升低速离心机管中。使用贝克曼Allegra 6R离心机和 GA3.8型转子在2640rpm、4℃下将血液离心10分钟。Peripheral blood was drawn from each subject and collected in an ACD tube. One blood sample (approximately 6 to 9 ml/tube) was transferred to a 15 ml low-speed centrifuge tube. The blood was centrifuged at 2640 rpm and 4°C for 10 minutes using a Beckman Allegra 6R centrifuge and a GA3.8 rotor.
对于无细胞血浆提取,将上部血浆层转移到15毫升高速离心管中,并且使用贝克曼库尔特Avanti J-E离心机和JA-14转子,在16000x g、4℃下离心 10分钟。在血液收集后,在72小时内进行两个离心步骤。将无细胞血浆存储在-80℃下,并且在DNA提取前只解冻一次。For cell-free plasma extraction, the upper plasma layer was transferred to a 15 ml high-speed centrifuge tube and centrifuged at 16,000 x g at 4°C for 10 minutes using a Beckman Coulter Avanti J-E centrifuge and a JA-14 rotor. Two centrifugation steps were performed within 72 hours of blood collection. Cell-free plasma was stored at -80°C and thawed only once before DNA extraction.
通过使用QIAamp DNA血液小型试剂盒(凯杰),根据制造商说明书从无细胞血浆中提取无细胞DNA。将五毫升缓冲液AL和500μl凯杰蛋白酶添加到4.5ml到5ml的无细胞血浆中。用磷酸盐缓冲生理盐水(PBS)将体积调节到10ml,并且在56℃下将混合物孵育12分钟。使用多个柱通过在贝克曼微量离心机中在8,000RPM下离心从溶液中分离沈淀的cfDNA。用AW1和AW2 缓冲液对柱进行洗涤,并且用55μl无核酸酶水洗提cfDNA。从血浆样品中提取约3.5到7ng cfDNA。Cell-free DNA was extracted from cell-free plasma using the QIAamp DNA Blood Mini Kit (Qiagen) according to the manufacturer's instructions. Five milliliters of buffer AL and 500 μl of Qiagen protease were added to 4.5 ml to 5 ml of cell-free plasma. The volume was adjusted to 10 ml with phosphate-buffered saline (PBS), and the mixture was incubated at 56°C for 12 minutes. The precipitated cfDNA was separated from the solution using multiple columns by centrifugation at 8,000 RPM in a Beckman microcentrifuge. The columns were washed with AW1 and AW2 buffers, and the cfDNA was eluted with 55 μl of nuclease-free water. Approximately 3.5 to 7 ng of cfDNA was extracted from the plasma sample.
All sequencing libraries were prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma. Library preparation was performed using reagents of the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA), as follows. Because cell-free plasma DNA is fragmented in nature, the plasma DNA samples were not fragmented further by nebulization or sonication. The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the End Repair Module by incubating the cfDNA in a 1.5 ml microcentrifuge tube with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase, and 1 μl of T4 Polynucleotide Kinase, all provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, for 15 minutes at 20°C. The enzymes were then heat-inactivated by incubating the reaction mixture at 75°C for 5 minutes. The mixture was cooled to 4°C, and dA-tailing of the blunt-ended DNA was accomplished using 10 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo-) (NEBNext™ DNA Sample Prep DNA Reagent Set 1) and incubating for 15 minutes at 37°C. The Klenow fragment was subsequently heat-inactivated by incubating the reaction mixture at 75°C for 5 minutes.
Following inactivation of the Klenow fragment, 1 μl of a 1:5 dilution of Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, CA) was used to ligate the Illumina adaptors (Non-Index Y-Adaptors) to the dA-tailed DNA using 4 μl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, by incubating the reaction mixture for 15 minutes at 25°C. The mixture was cooled to 4°C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers, and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA). Eighteen cycles of PCR were performed to selectively enrich adaptor-ligated cfDNA using High-Fidelity Master Mix (Finnzymes, Woburn, MA) and Illumina's PCR primers complementary to the adaptors (Part Nos. 1000537 and 1000538). The adaptor-ligated DNA was subjected to PCR (98°C for 30 seconds; 18 cycles of 98°C for 10 seconds, 65°C for 30 seconds, and 72°C for 30 seconds; final extension at 72°C for 5 minutes; hold at 4°C) using Illumina Genomic PCR Primers (Part Nos. 1000537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions.
The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions, available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The purified amplified product was eluted in 40 μl of Qiagen EB buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA).
The amplified DNA was sequenced using Illumina's Genome Analyzer II to obtain single-end reads of 36 bp. Only about 30 bp of random sequence information are needed to identify a sequence as belonging to a specific human chromosome. Longer sequences can uniquely identify more particular targets. In the present case, a large number of 36 bp reads were obtained, covering approximately 10% of the genome. Upon completion of sequencing of the sample, the Illumina "Sequencer Control Software" transferred image and base call files to a Unix server running the Illumina "Genome Analyzer Pipeline" software version 1.51. The Illumina "Gerald" program was run to align sequences to the reference human genome, which is derived from the hg18 genome provided by the National Center for Biotechnology Information (NCBI36/hg18, available on the world wide web at http://genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105). The sequence data generated by the above procedure that aligned uniquely to the genome were read from the Gerald output (export.txt files) by a program (c2c.pl) running on a computer with the Linux operating system. Sequence alignments with base mismatches were allowed and were included in alignment counts only if they aligned uniquely to the genome. Sequence alignments with identical start and end coordinates (duplicates) were excluded.
Between about 5 and 15 million 36 bp tags with 2 or fewer mismatches were mapped uniquely to the human genome. All mapped tags were counted and included in the calculation of chromosome doses in both test and qualified samples. The regions of chromosome Y extending from base 0 to base 2 × 10⁶, from base 10 × 10⁶ to base 13 × 10⁶, and from base 23 × 10⁶ to the end were specifically excluded from the analysis, because tags derived from either male or female fetuses map to these regions of the Y chromosome.
It was noted that some variation in the total number of sequence tags mapped to individual chromosomes occurred across samples sequenced in the same run (inter-chromosomal variation), but substantially greater variation occurred among different sequencing runs (inter-sequencing run variation).
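The counting rules described above (unique alignment, at most 2 mismatches, duplicates with identical start and end coordinates removed, and three chromosome Y regions excluded) can be sketched as follows. This is an illustration only, not the c2c.pl program: the tuple layout of an alignment record, the half-open treatment of region boundaries, and the choice to test only the start coordinate against the excluded regions are all assumptions made for the example.

```python
from collections import Counter

# Chromosome Y regions excluded from analysis, as (start, end) intervals.
# end=None means "to the end of the chromosome". Boundaries are treated
# as half-open here, which is an assumption.
CHRY_EXCLUDED = [(0, 2_000_000), (10_000_000, 13_000_000), (23_000_000, None)]

def in_excluded_chry_region(pos):
    """True if a chromosome Y position falls in one of the excluded regions."""
    return any(start <= pos and (end is None or pos < end)
               for start, end in CHRY_EXCLUDED)

def count_tags(alignments, max_mismatches=2):
    """Count tags per chromosome.

    alignments: iterable of (chrom, start, end, mismatches, n_hits) tuples,
    a hypothetical record layout where n_hits is the number of genomic
    locations the read aligned to.
    """
    seen = set()
    counts = Counter()
    for chrom, start, end, mismatches, n_hits in alignments:
        if n_hits != 1 or mismatches > max_mismatches:
            continue  # keep only unique alignments with few mismatches
        if chrom == "chrY" and in_excluded_chry_region(start):
            continue  # skip regions where male and female tags both map
        key = (chrom, start, end)
        if key in seen:
            continue  # duplicate: identical start and end coordinates
        seen.add(key)
        counts[chrom] += 1
    return counts
```

The resulting per-chromosome counts are the quantities that the later chromosome dose calculations take as input.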
实例8Example 8
针对染色体13、18、21、X、和Y的剂量及变化Dosages and changes for chromosomes 13, 18, 21, X, and Y
为了检查在对于所有染色体而言映射的序列标签的数目上染色体间变异性和序列测定间变异性的程度,提取了从48名志愿者怀孕的受试者的外周血获得的血浆cfDNA并且如实例7中所说明而进行测序,并且进行如下分析。To examine the extent of inter-chromosomal and inter-sequencing variability in the number of sequence tags mapped for all chromosomes, plasma cfDNA obtained from peripheral blood of 48 volunteer pregnant subjects was extracted and sequenced as described in Example 7 and analyzed as follows.
The total number of sequence tags mapped to each chromosome (the sequence tag density) was determined. Alternatively, the number of mapped sequence tags may be normalized to the length of the chromosome to generate a sequence tag density ratio. Normalization to chromosome length is not a required step; it can be performed solely to reduce the number of digits in a number and thereby simplify it for human interpretation. Chromosome lengths that can be used to normalize the sequence tag counts can be the lengths provided on the world wide web at genome.ucsc.edu/goldenPath/stats.html#hg18.
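The optional length normalization can be sketched as below. The function name, the "tags per megabase" scaling, and the counts and lengths in the usage lines are illustrative assumptions; real lengths would come from the hg18 statistics page cited above.

```python
def tag_density_ratio(tag_counts, chrom_lengths, per_bases=1_000_000):
    """Normalize tag counts to chromosome length (tags per `per_bases` bases)."""
    return {chrom: tag_counts[chrom] * per_bases / chrom_lengths[chrom]
            for chrom in tag_counts}

# Hypothetical counts and round-number lengths, for illustration only.
counts = {"chrA": 500_000, "chrB": 120_000}
lengths = {"chrA": 250_000_000, "chrB": 50_000_000}
print(tag_density_ratio(counts, lengths))
```

Because the same scaling applies to every sample, it changes the magnitude of the numbers but not the sample-to-sample comparison of doses for a fixed pair of chromosomes, which is consistent with the text calling the step optional.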
The sequence tag density obtained for each chromosome was related to the sequence tag density of each of the remaining chromosomes to derive a qualified chromosome dose, which was calculated as the ratio of the sequence tag density for the chromosome of interest (e.g., chromosome 21) to the sequence tag density of each of the remaining chromosomes (i.e., chromosomes 1-20, 22, and X). Table 9 provides an example of the qualified chromosome doses calculated for chromosomes of interest 13, 18, 21, X, and Y, as determined in one of the qualified samples. Chromosome doses were determined for all chromosomes in all samples, and the average doses for chromosomes of interest 13, 18, 21, X, and Y in the qualified samples are provided in Tables 10 and 11 and depicted in Figures 32-36. Figures 32-36 also depict the chromosome doses for the test samples. The chromosome doses for each chromosome of interest in the qualified samples provide a measure of the variation in the total number of mapped sequence tags for each chromosome of interest relative to that of each of the remaining chromosomes.
Thus, qualified chromosome doses can identify the chromosome or group of chromosomes, i.e., the normalizing chromosome, whose variation among samples is closest to the variation of the chromosome of interest, and which would serve as the ideal sequence for normalizing values for further statistical evaluation. Figures 37 and 38 depict the calculated average chromosome doses determined in a population of qualified samples for chromosomes 13, 18, and 21, and for chromosomes X and Y.
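As a rough sketch of the procedure described above, the following computes the dose of a chromosome of interest against every candidate normalizing chromosome across a set of qualified samples and ranks the candidates by coefficient of variation (CV). Ranking by CV alone is an assumption for illustration; the text also allows choosing by distinguishability. Function names and the toy counts are hypothetical.

```python
from statistics import mean, stdev

def chromosome_dose(tag_counts, chrom_of_interest, normalizer):
    """Dose = tags on the chromosome of interest / tags on the normalizer."""
    return tag_counts[chrom_of_interest] / tag_counts[normalizer]

def rank_normalizers(qualified_samples, chrom_of_interest):
    """Return [(cv, normalizer, mean_dose), ...] sorted by increasing CV.

    qualified_samples: list of dicts mapping chromosome name -> tag count.
    """
    candidates = [c for c in qualified_samples[0] if c != chrom_of_interest]
    ranked = []
    for cand in candidates:
        doses = [chromosome_dose(s, chrom_of_interest, cand)
                 for s in qualified_samples]
        m = mean(doses)
        ranked.append((stdev(doses) / m, cand, m))
    return sorted(ranked)

# Toy qualified samples (made-up counts).
qualified = [
    {"chr21": 100, "chr14": 200, "chr1": 900},
    {"chr21": 110, "chr14": 220, "chr1": 800},
    {"chr21": 90, "chr14": 180, "chr1": 1000},
]
best_cv, best_normalizer, best_mean = rank_normalizers(qualified, "chr21")[0]
print(best_normalizer, best_mean, best_cv)
```

In this toy data the chr21/chr14 ratio is constant across samples, so chr14 ranks first; with real data the ranking depends on how closely each candidate tracks the run-to-run variation of the chromosome of interest.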
在一些情况下,这种最好的归一化染色体也许不具有最小的变异性,但是可能具有合格剂量的一种分布,这种分布最好地将一个或多个测试样品与这些合格样品相区分,即:最好的归一化染色体也许并不具有最低的变异性,但是可能具有最大的可分辨性。因此,可分辨性将染色体剂量的变化以及在合格样品中的剂量的分布考虑在内。In some cases, the best normalizing chromosome may not have the lowest variability, but may have a distribution of qualified doses that best distinguishes one or more test samples from the qualified samples, i.e., the best normalizing chromosome may not have the lowest variability, but may have the greatest distinguishability. Thus, distinguishability takes into account the variation in chromosome doses as well as the distribution of doses in the qualified samples.
Tables 10 and 11 provide the coefficient of variation as the measure of variability, and t-test values as the measure of distinguishability for chromosomes 18, 21, X, and Y, where the smaller the t-test value, the greater the distinguishability. The distinguishability for chromosome 13 was determined as the ratio of the difference between the mean chromosome dose in the qualified samples and the dose for chromosome 13 in the only T13 test sample, to the standard deviation of the mean of the qualified doses.
当如以下所说明在测试样品中识别非整倍性时,合格的染色体剂量还作为测定阈值的基础。Qualified chromosome doses also serve as the basis for determining thresholds when identifying aneuploidy in a test sample as described below.
Table 9. Qualified chromosome doses for chromosomes 13, 18, 21, X, and Y (n=1; sample #11342, 46 XY)
表10.针对染色体21、18和13的合格的染色体剂量、变化和可分辨性Table 10. Qualified chromosome doses, variations, and resolution for chromosomes 21, 18, and 13
表11.针对染色体13、X和Y的合格的染色体剂量、变化和可分辨性Table 11. Qualified chromosome doses, variations, and resolvability for chromosomes 13, X, and Y
Examples of diagnoses of T21, T13, T18, and a case of Turner syndrome obtained using the normalizing chromosomes, chromosome doses, and distinguishability for each chromosome of interest are described in Example 9.
Example 9
Diagnosis of fetal aneuploidy using normalizing chromosomes
为了使染色体剂量的用途适用于评估生物测试样品中的非整倍性,从怀孕的志愿者获得了母体血液测试样品并且制备了cfDNA,并且如实例1和2 所说明进行测序和分析。To adapt the use of chromosome dosage to assess aneuploidy in biological test samples, maternal blood test samples were obtained from pregnant volunteers and cfDNA was prepared and sequenced and analyzed as described in Examples 1 and 2.
三体性21Trisomy 21
Table 12 provides the dose calculated for chromosome 21 in an exemplary test sample (#11403). The threshold calculated for a positive diagnosis of T21 was set at more than 2 standard deviations from the mean of the qualified (normal) samples. The diagnosis of T21 was given because the chromosome dose in the test sample was greater than the set threshold. Chromosomes 14 and 15 were used as normalizing chromosomes in separate calculations to show that either the chromosome having the lowest variability (e.g., chromosome 14) or the chromosome having the greatest distinguishability (e.g., chromosome 15) can be used to identify the aneuploidy. Thirteen T21 samples were identified using the calculated chromosome doses, and the aneuploid samples were confirmed to be T21 by karyotype.
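The threshold rule for a trisomy call can be sketched as follows: the threshold is the mean of the qualified doses plus 2 standard deviations, and a test dose above it is flagged. The use of the sample standard deviation and all dose values below are assumptions for illustration; they are not the values in Table 12.

```python
from statistics import mean, stdev

def upper_threshold(qualified_doses, n_sd=2.0):
    """Mean of qualified doses plus n_sd sample standard deviations."""
    return mean(qualified_doses) + n_sd * stdev(qualified_doses)

def is_overrepresented(test_dose, qualified_doses, n_sd=2.0):
    """True if the test dose exceeds the upper threshold."""
    return test_dose > upper_threshold(qualified_doses, n_sd)

# Hypothetical chromosome 21 doses in qualified samples and one test dose.
qualified_chr21_doses = [0.9, 1.0, 1.1]
print(is_overrepresented(1.5, qualified_chr21_doses))
```

The same function applies to chromosomes 18 and 13 by supplying the doses computed with the normalizing chromosome chosen for each.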
表12.针对T21非整倍性的染色体剂量(样品#11403,47XY+21)Table 12. Chromosome Dosage for T21 Aneuploidy (Sample #11403, 47XY+21)
三体性18Trisomy 18
Table 13 provides the dose calculated for chromosome 18 in a test sample (#11390). The threshold calculated for a positive diagnosis of T18 was set at more than 2 standard deviations from the mean of the qualified (normal) samples. The diagnosis of T18 was given because the chromosome dose in the test sample was greater than the set threshold. Chromosome 8 was used as the normalizing chromosome. In this instance, chromosome 8 had both the lowest variability and the greatest distinguishability. Eighteen T18 samples were identified using chromosome doses and were confirmed to be T18 by karyotype.
These data show that a normalizing chromosome can have both the lowest variability and the greatest distinguishability.
表13.针对T18非整倍性的染色体剂量(样品#11390,47XY+18)Table 13. Chromosome Doses for T18 Aneuploidy (Sample #11390, 47XY+18)
三体性13Trisomy 13
Table 14 provides the dose calculated for chromosome 13 in a test sample (#51236). The threshold calculated for a positive diagnosis of T13 was set at more than 2 standard deviations from the mean of the qualified samples. The diagnosis of T13 was given because the chromosome dose in the test sample was greater than the set threshold. The chromosome dose for chromosome 13 was calculated using either chromosome 5 or the group of chromosomes 3, 4, 5, and 6 as the normalizing chromosome. One T13 sample was identified.
表14.针对T13非整倍性的染色体剂量(样品#51236,47XY+13)Table 14. Chromosome Dosage for T13 Aneuploidy (Sample #51236, 47XY+13)
染色体3至6的序列标签密度是染色体3至6的平均标签计数。The sequence tag density of chromosomes 3 to 6 is the average tag count of chromosomes 3 to 6.
该数据表明,染色体3、4、5和6的组合提供了低于染色体5的一个变异性,以及大于其他染色体中任何一个的最大的可分辨性。The data indicate that the combination of chromosomes 3, 4, 5, and 6 provides a variability lower than that of chromosome 5, and a maximum resolvability greater than that of any of the other chromosomes.
因此,可以使用一组染色体作为归一化染色体来确定染色体剂量并且识别非整倍性。Therefore, one set of chromosomes can be used as normalizing chromosomes to determine chromosome dosage and identify aneuploidy.
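Using a group of chromosomes as the normalizer changes only the denominator of the dose. A minimal sketch, following the statement above that the tag density for chromosomes 3 to 6 is their average tag count (the function name and counts are hypothetical):

```python
from statistics import mean

def group_dose(tag_counts, chrom_of_interest, group):
    """Dose using the average tag count of a chromosome group as denominator."""
    denominator = mean([tag_counts[c] for c in group])
    return tag_counts[chrom_of_interest] / denominator

# Made-up tag counts for one sample.
sample = {"chr13": 300, "chr3": 600, "chr4": 500, "chr5": 500, "chr6": 400}
print(group_dose(sample, "chr13", ["chr3", "chr4", "chr5", "chr6"]))
```

Averaging over several chromosomes tends to dampen the influence of any single chromosome's run-to-run fluctuation, which is one plausible reason the group shows lower variability than chromosome 5 alone.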
特纳综合征(单体性X)Turner syndrome (monosomy X)
表15提供了在一个测试样品(#51238)中对于染色体X和Y计算出的剂量。对于特纳综合征(单体性X)的阳性诊断计算出的阈值被设定为针对X染色体是在距离合格的(正常的)样品的平均值<-2个标准偏差处,并且针于不存在Y染色体是在距离合格的(正常的)样品平均值<-2个标准离均差处。Table 15 provides the calculated doses for chromosomes X and Y in one test sample (#51238). The calculated thresholds for a positive diagnosis of Turner syndrome (monosomy X) were set at <-2 standard deviations from the mean of qualified (normal) samples for the X chromosome and <-2 standard deviations from the mean of qualified (normal) samples for the absence of the Y chromosome.
表15.针对特纳(XO)非整倍性(样品#51238,45X)的染色体剂量Table 15. Chromosome Doses for Turner (XO) Aneuploidy (Sample #51238, 45X)
具有的X染色体剂量小于设定阈值的样品被识别为具有少于一个X染色体。同一个样品被确定为具有小于设定阈值的一个Y染色体剂量,这表明该样品不具有Y染色体。因此,使用X和Y的剂量的组合来识别特纳综合征(单体性X)样品。A sample having an X chromosome dose less than a set threshold is identified as having less than one chromosome X. The same sample is determined to have one Y chromosome dose less than a set threshold, indicating that the sample does not have a Y chromosome. Thus, a combination of X and Y doses is used to identify Turner syndrome (monosomy X) samples.
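The combined X and Y rule can be sketched as two lower-threshold tests that must both pass. Which qualified population supplies each mean and standard deviation is not specified in the text above, so the reference dose lists here are placeholders, and the dose values are made up.

```python
from statistics import mean, stdev

def below_lower_threshold(test_dose, qualified_doses, n_sd=2.0):
    """True if the test dose is under mean - n_sd * SD of the qualified doses."""
    return test_dose < mean(qualified_doses) - n_sd * stdev(qualified_doses)

def consistent_with_monosomy_x(x_dose, y_dose, qualified_x, qualified_y):
    """Both the X dose and the Y dose must fall below their thresholds."""
    return (below_lower_threshold(x_dose, qualified_x)
            and below_lower_threshold(y_dose, qualified_y))

# Hypothetical reference doses and one test sample.
qualified_x = [1.0, 1.1, 0.9]
qualified_y = [0.02, 0.03, 0.04]
print(consistent_with_monosomy_x(0.5, 0.001, qualified_x, qualified_y))
```

Requiring both conditions is what separates a monosomy X call from a sample that is low on only one of the two sex chromosomes.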
因此,所提供的方法使得能够确定染色体的CNV。具体而言,该方法通过对母体血浆cfDNA进行大规模平行测序以及对归一化染色体进行识别用于对测序数据进行统计分析使得能够确定过度代表和代表不足的染色体非整倍性。该方法的灵敏度和可靠性允许精确测定第一和第二个三月期的非整倍性。Therefore, the provided method enables determination of chromosomal CNV. Specifically, the method enables determination of over-represented and under-represented chromosomal aneuploidies by performing massively parallel sequencing of maternal plasma cfDNA and identifying normalized chromosomes for statistical analysis of sequencing data. The sensitivity and reliability of the method allow accurate determination of aneuploidies in the first and second trimesters.
实例10Example 10
部分非整倍性的确定Determination of partial aneuploidy
序列剂量的用途被应用于评估由从血浆制备的cfDNA生物学测试样品的部分非整倍性,并且如实例7中所说明进行测序。通过核型分析证实该样品是从具有染色体11部分缺失的一位受试者得到的。The use of sequence dosing was applied to assess partial aneuploidy in a cfDNA biological test sample prepared from plasma and sequenced as described in Example 7. The sample was confirmed by karyotyping to be from a subject with a partial deletion of chromosome 11.
The sequencing data were analyzed for the partial aneuploidy (partial deletion of chromosome 11, i.e., q21-q23) as described for the chromosomal aneuploidies in the preceding examples. Mapping of the sequence tags to chromosome 11 in the test sample showed a marked loss of tag counts between base pairs 81000082 and 103000103 in the long arm of the chromosome, relative to the tag counts obtained for the corresponding sequence on chromosome 11 in the qualified samples (data not shown). The sequence tags mapped to the sequence of interest on chromosome 11 (81000082-103000103 bp) in each qualified sample, and the sequence tags mapped to all 20-megabase segments across the entire genome in the qualified samples (i.e., the qualified sequence tag densities), were used to determine qualified sequence doses as ratios of tag densities in all qualified samples.
The mean sequence dose, standard deviation, and coefficient of variation were calculated for all 20-megabase segments in the entire genome, and the 20-megabase sequence having the least variability was identified as the normalizing sequence on chromosome 5 (13000014-33000033 bp) (see Table 16), which was used to calculate the dose for the sequence of interest in the test sample (see Table 17). Table 17 provides the sequence dose for the sequence of interest on chromosome 11 (81000082-103000103 bp) in the test sample, calculated as the ratio of the sequence tags mapped to the sequence of interest to the sequence tags mapped to the identified normalizing sequence. Figure 40 shows the sequence doses for the sequence of interest in the 7 qualified samples (○) and the sequence dose for the corresponding sequence in the test sample (◇). The mean is shown by the solid line, and the threshold calculated for a positive diagnosis of partial aneuploidy, set at 5 standard deviations from the mean, is shown by the dashed line. The diagnosis of partial aneuploidy was given because the sequence dose in the test sample was less than the set threshold. The test sample was confirmed by karyotype to have the deletion q21-q23 on chromosome 11.
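The segment-level analysis can be sketched in two parts: counting tags per 20-megabase segment, and testing a sequence dose against a lower threshold set 5 standard deviations below the qualified mean. Fixed, non-overlapping bins starting at position 0 are an assumption made for simplicity; the coordinates quoted in the text (for example 13000014-33000033) suggest the actual segments were defined differently. All doses are made up.

```python
from collections import Counter
from statistics import mean, stdev

BIN_SIZE = 20_000_000  # 20 megabases

def bin_tags(tag_positions, bin_size=BIN_SIZE):
    """Count tags per (chromosome, bin index) from (chrom, position) pairs."""
    counts = Counter()
    for chrom, pos in tag_positions:
        counts[(chrom, pos // bin_size)] += 1
    return counts

def below_deletion_threshold(test_dose, qualified_doses, n_sd=5.0):
    """True if the sequence dose is under mean - n_sd * SD of qualified doses."""
    return test_dose < mean(qualified_doses) - n_sd * stdev(qualified_doses)

# Hypothetical sequence doses for 7 qualified samples and one test sample.
qualified_doses = [1.0, 1.02, 0.98, 1.01, 0.99, 1.0, 1.0]
print(below_deletion_threshold(0.6, qualified_doses))
```

The normalizing segment itself would be chosen as in the whole-chromosome case, by computing doses against every candidate segment and keeping the one with the lowest coefficient of variation.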
Therefore, in addition to identifying chromosomal aneuploidies, the method of the invention can also be used to identify partial aneuploidies.
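The thresholding step described for the chromosome 11 deletion can be sketched as follows. This is a minimal illustration, not the procedure as implemented in the study: all tag counts are hypothetical, the function names are invented for the example, and it assumes the threshold lies 5 sample standard deviations below the qualified mean because the event sought is a deletion.

```python
from statistics import mean, stdev

def sequence_dose(tags_interest: int, tags_normalizing: int) -> float:
    """Ratio of tags mapped to the sequence of interest to tags mapped to the normalizing sequence."""
    return tags_interest / tags_normalizing

def deletion_threshold(qualified_doses, n_sd: float = 5.0) -> float:
    """Lower threshold: n_sd sample standard deviations below the mean of the qualified doses."""
    return mean(qualified_doses) - n_sd * stdev(qualified_doses)

# Hypothetical (sequence of interest, normalizing sequence) tag counts for 7 qualified samples.
qualified_counts = [
    (22000, 20000), (22110, 20050), (21890, 19950), (22050, 20010),
    (21950, 19990), (22080, 20030), (21920, 19970),
]
qualified_doses = [sequence_dose(a, b) for a, b in qualified_counts]
threshold = deletion_threshold(qualified_doses)

# Hypothetical test sample with fewer tags in the sequence of interest.
test_dose = sequence_dose(20900, 20000)
partial_deletion_called = test_dose < threshold
```

With these made-up counts the test dose falls below the threshold, so the sample would be flagged for a partial deletion; a qualified-like dose would not be.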
Table 16. Qualified normalizing sequence, dose, and variance for sequence Chr11:81000082-103000103 (qualified samples, n=7)
Table 17. Sequence dose for the sequence of interest on chromosome 11 (81000082-103000103) (test sample 11206)
Example 11
Demonstration of aneuploidy detection
The sequence data obtained for the samples described in Examples 2 and 3 and shown in Figures 32 to 36 were analyzed further to demonstrate the sensitivity of the method in successfully identifying aneuploidies in maternal samples. Normalized chromosome doses for chromosomes 21, 18, 13, X, and Y were analyzed as a distribution relative to the standard deviation from the mean (Y axis) and are shown in Figures 41A-41E. The normalizing chromosome used is shown as the denominator (X axis).
Figure 41(A) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome 21 dose in unaffected samples (o) and trisomy 21 samples (T21; Δ) when chromosome 14 is used as the normalizing chromosome for chromosome 21. Figure 41(B) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome 18 dose in unaffected samples (o) and trisomy 18 samples (T18; Δ) when chromosome 8 is used as the normalizing chromosome for chromosome 18. Figure 41(C) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome 13 dose in unaffected samples (o) and trisomy 13 samples (T13; Δ), using the average sequence tag density of the group of chromosomes 3, 4, 5, and 6 as the normalizing chromosome sequence to determine the chromosome dose for chromosome 13. Figure 41(D) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome X dose in unaffected female samples (o), unaffected male samples (Δ), and monosomy X samples (XO; +) when chromosome 4 is used as the normalizing chromosome for chromosome X. Figure 41(E) shows the distribution of chromosome doses relative to the standard deviation from the mean for the chromosome Y dose in unaffected male samples (o), unaffected female samples (Δ), and monosomy X samples (+) when the average sequence tag density of the group of chromosomes 1 to 22 and X is used as the normalizing chromosome sequence to determine the chromosome dose for chromosome Y.
The data show that trisomy 21, trisomy 18, and trisomy 13 samples are clearly distinguishable from unaffected (normal) samples. The monosomy X samples are readily identified as having a chromosome X dose clearly lower than that of unaffected female samples (Figure 41(D)) and a chromosome Y dose clearly lower than that of unaffected male samples (Figure 41(E)).
Therefore, the method provided is sensitive and specific for determining the presence or absence of chromosomal aneuploidies in a maternal blood sample.
Example 12
Determination of fetal chromosomal aneuploidy by massively parallel DNA sequencing of cell-free fetal DNA from maternal blood: test set 1 independent of training set 1
The study was conducted by qualified site clinical research personnel at 13 US clinical sites between April 2009 and October 2010 under a human subjects protocol approved by the institutional review board (IRB) of each institution. Written informed consent was obtained from each subject before participation in the study. The protocol was designed to provide blood samples and clinical data to support the development of noninvasive prenatal genetic diagnostic methods. Pregnant women aged 18 years or older were eligible to participate. For patients undergoing clinically indicated chorionic villus sampling (CVS) or amniocentesis, blood was collected before the procedure, and the fetal karyotype results were also collected. Peripheral blood samples (two tubes, or approximately 20 mL in total) were drawn from all subjects into acid citrate dextrose (ACD) tubes (Becton Dickinson). All samples were de-identified and assigned an anonymous patient ID number. Blood samples were shipped overnight to the laboratory in temperature-controlled shipping containers provided for the study. The time elapsed between blood draw and sample receipt was recorded as part of sample accessioning.
Site research coordinators entered clinical data relevant to each patient's current pregnancy and history into study case report forms (CRFs) using the anonymous patient ID number. Cytogenetic analysis of the fetal karyotype from samples obtained by invasive prenatal procedures was performed at local laboratories, and those results were also recorded in the study CRF. All data obtained on CRFs were entered into a clinical database at the laboratory. Cell-free plasma was obtained from individual blood tubes by a two-step centrifugation process 24 to 48 hours after venipuncture. Plasma from a single blood tube was sufficient for sequencing analysis. Cell-free DNA was extracted from cell-free plasma using the QIAamp DNA Blood Mini Kit (Qiagen) according to the manufacturer's instructions. Because these cell-free DNA fragments are known to be approximately 170 base pairs (bp) in length (Fan et al., Clin Chem 56:1279-1286 [2010]), no fragmentation of the DNA was required before sequencing.
For the training set samples, cfDNA was sent to Prognosys Biosciences, Inc. (La Jolla, CA) for sequencing library preparation (cfDNA blunt-ended and ligated to universal adapters) and sequenced on an Illumina Genome Analyzer IIx instrument (http://www.illumina.com/) using the manufacturer's standard protocols. Single-end reads of 36 base pairs were obtained. Upon completion of sequencing, all base call files were collected and analyzed. For the test set samples, sequencing libraries were prepared and sequenced on the Illumina Genome Analyzer IIx instrument. Sequencing libraries were prepared as follows. The full-length protocol described is essentially the standard protocol provided by Illumina and differs from it only in the purification of the amplified library: the Illumina protocol instructs that the amplified library be purified by gel electrophoresis, whereas the protocol described here uses magnetic beads for the same purification step. Approximately 2 ng of purified cfDNA extracted from maternal plasma was used to prepare a primary sequencing library, essentially according to the manufacturer's instructions, using the NEBNext™ DNA Sample Prep DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA). All steps, except the final purification of the adapter-ligated products, which was performed using Agencourt magnetic beads and reagents instead of a purification column, were carried out according to the protocol accompanying the NEBNext™ reagents for sample preparation of a genomic DNA library to be sequenced on the GAII. The NEBNext™ protocol essentially follows that provided by Illumina, which is available at grcf.jhml.edu/hts/protocols/11257047_ChIP_Sample_Prep.pdf.
The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the End Repair Module by incubating the cfDNA in a 1.5 ml microfuge tube for 15 minutes at 20°C with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA Polymerase I, 1 μl of T4 DNA Polymerase, and 1 μl of T4 Polynucleotide Kinase, all provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1. The sample was cooled to 4°C and purified using a QIAQuick column provided in the QIAQuick PCR Purification Kit (QIAGEN Inc., Valencia, CA). The 50 μl reaction was transferred to a 1.5 ml microfuge tube, and 250 μl of Qiagen Buffer PB was added. The resulting 300 μl was transferred to a QIAQuick column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 39 μl of Qiagen Buffer EB by centrifugation.
dA tailing of 34 μl of the blunt-ended DNA was accomplished using 16 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo minus) (NEBNext™ DNA Sample Prep DNA Reagent Set 1), with incubation for 30 minutes at 37°C according to the manufacturer's dA-Tailing Module. The sample was cooled to 4°C and purified using a column provided in the MinElute PCR Purification Kit (QIAGEN Inc., Valencia, CA). The 50 μl reaction was transferred to a 1.5 ml microfuge tube, and 250 μl of Qiagen Buffer PB was added. The 300 μl was transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 15 μl of Qiagen Buffer EB by centrifugation.
Ten microliters of the DNA eluate were incubated for 15 minutes at 25°C with 1 μl of a 1:5 dilution of the Illumina Genomic Adapter Oligo Mix (Part No. 1000521), 15 μl of 2X Quick Ligation Reaction Buffer, and 4 μl of Quick T4 DNA Ligase, according to the Quick Ligation Module. The sample was cooled to 4°C and purified using a MinElute column as follows. One hundred and fifty microliters of Qiagen Buffer PE were added to the 30 μl reaction, and the entire volume was transferred to a MinElute column, which was centrifuged at 13,000 RPM for 1 minute in a microfuge. The column was washed with 750 μl of Qiagen Buffer PE and centrifuged again. Residual ethanol was removed by an additional centrifugation for 5 minutes at 13,000 RPM. The DNA was eluted in 28 μl of Qiagen Buffer EB by centrifugation.
Twenty-three microliters of the adapter-ligated DNA eluate were subjected to 18 cycles of PCR (98°C for 30 seconds; 18 cycles of 98°C for 10 seconds, 65°C for 30 seconds, and 72°C for 30 seconds; final extension at 72°C for 5 minutes; hold at 4°C) using Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Prep DNA Reagent Set 1, according to the manufacturer's instructions. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions, available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The Agencourt AMPure XP PCR purification system removes unincorporated dNTPs, primers, primer dimers, salts, and other contaminants, and recovers amplicons larger than 100 bp. The purified amplified product was eluted from the Agencourt beads in 40 μl of Qiagen EB Buffer, and the size distribution of the library was analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA). For both the training and test sample sets, single-end reads of 36 base pairs were sequenced.
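As a bookkeeping aid, the reaction volumes stated in the library preparation can be tallied to confirm that they match the volumes carried into each column purification (50 μl, 50 μl, and 30 μl). The sketch below only restates numbers from the text; the dictionary layout and function name are illustrative assumptions.

```python
# Reaction components in microliters, as stated in the library preparation protocol.
REACTIONS = {
    "end_repair": {
        "cfDNA": 40,
        "10X phosphorylation buffer": 5,
        "dNTP mix": 2,
        "DNA Polymerase I (1:5 dilution)": 1,
        "T4 DNA Polymerase": 1,
        "T4 Polynucleotide Kinase": 1,
    },
    "dA_tailing": {
        "blunt-ended DNA": 34,
        "dA-tailing master mix": 16,
    },
    "ligation": {
        "DNA eluate": 10,
        "adapter oligo mix (1:5 dilution)": 1,
        "2X Quick Ligation Reaction Buffer": 15,
        "Quick T4 DNA Ligase": 4,
    },
}

def total_volume(step: str) -> int:
    """Sum of component volumes (in microliters) for one reaction step."""
    return sum(REACTIONS[step].values())

totals = {step: total_volume(step) for step in REACTIONS}

# The 2X ligation buffer should make up half of the ligation reaction to reach 1X.
ligation_buffer_is_1x = 2 * REACTIONS["ligation"]["2X Quick Ligation Reaction Buffer"] == totals["ligation"]
```

The totals agree with the text: each 50 μl reaction plus 250 μl of Buffer PB gives the 300 μl loaded onto the column, and the ligation reaction comes to 30 μl.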
Data analysis and sample classification
Sequence reads 36 bases in length were aligned to the human genome assembly hg18 obtained from the UCSC database (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/). Alignments were performed with the Bowtie short read aligner (version 0.12.5), allowing up to two base mismatches during alignment (Langmead et al., Genome Biol 10:R25 [2009]). Only reads that mapped unambiguously to a single genomic location were included. The genomic sites to which reads mapped were counted and included in the calculation of chromosome doses (see below). Regions of the Y chromosome where sequence tags from male and female fetuses map without any discrimination were excluded from the analysis (specifically, from base 0 to base 2 × 10^6; from base 10 × 10^6 to base 13 × 10^6; and from base 23 × 10^6 to the end of the Y chromosome).
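The Y chromosome exclusion and site counting described above can be sketched as follows. This is an illustration under stated assumptions: the intervals are treated as half-open, alignments are represented as hypothetical `(chromosome, position)` pairs for reads that already passed the unique-mapping filter, and sites are counted as distinct positions. None of these representation choices is specified in the text.

```python
from collections import Counter

# Excluded chrY intervals from the text, as half-open [start, end); None means "to the chromosome end".
CHR_Y_EXCLUDED = [(0, 2_000_000), (10_000_000, 13_000_000), (23_000_000, None)]

def is_excluded(chrom: str, pos: int) -> bool:
    """True if the position lies in a chrY region where male and female tags map indistinguishably."""
    if chrom != "chrY":
        return False
    return any(start <= pos and (end is None or pos < end) for start, end in CHR_Y_EXCLUDED)

def count_sites(alignments) -> Counter:
    """Count distinct mapped sites per chromosome, skipping the excluded chrY regions."""
    sites = {(chrom, pos) for chrom, pos in alignments if not is_excluded(chrom, pos)}
    return Counter(chrom for chrom, _ in sites)

# Hypothetical uniquely mapped reads.
example_alignments = [
    ("chr21", 100), ("chr21", 100), ("chr21", 200),
    ("chrY", 1_500_000), ("chrY", 5_000_000),
    ("chrY", 12_000_000), ("chrY", 30_000_000),
]
site_counts = count_sites(example_alignments)
```

In this toy input only one chrY site (position 5,000,000) survives the mask, and the duplicated chr21 position is counted once.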
Intra-run and inter-run sequencing variation in the chromosomal distribution of sequence reads can obscure the effect of fetal aneuploidy on the distribution of mapped sequence sites. To correct for this variation, a chromosome dose was calculated as the count of mapped sites for a given chromosome of interest normalized to the count observed on a predetermined normalizing chromosome sequence. As described previously, a normalizing chromosome sequence can consist of a single chromosome or of a group of chromosomes. Normalizing chromosome sequences were first identified in a subset of the training set consisting of unaffected (i.e., qualified) samples with diploid karyotypes for the chromosomes of interest 21, 18, 13, and X, by considering each autosome as a potential denominator in a ratio of counts with each chromosome of interest. The denominator chromosome, i.e., the normalizing chromosome sequence, was selected to minimize the variation of the chromosome dose between sequencing runs. Each chromosome of interest was determined to have its own distinct normalizing chromosome sequence (denominator) (Table 18). No single chromosome could be identified as a normalizing chromosome sequence for chromosome 13, because no single chromosome reduced the variation in the chromosome 13 dose among samples sufficiently; that is, the spread of the NCV values for chromosome 13 was not reduced enough to allow correct identification of T13 aneuploidy. Chromosomes 2 to 6 were randomly selected and tested as a group for their ability to mimic the behavior of chromosome 13. The group of chromosomes 2 to 6 was found to substantially reduce the variation in the chromosome 13 dose in the training set samples and was therefore selected as the normalizing chromosome sequence for chromosome 13. As described above, the variation in the chromosome dose for chromosome Y is greater than 30 regardless of which single chromosome is used as the normalizing chromosome sequence in determining the chromosome Y dose. The group of chromosomes 2 to 6 was found to substantially reduce the variation in the chromosome Y dose in the training set samples and was therefore selected as the normalizing chromosome sequence for chromosome Y.
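The denominator selection described above can be sketched as picking, among candidate single chromosomes and chromosome groups, the one whose resulting dose varies least across qualified samples. The sketch assumes the coefficient of variation (sample standard deviation divided by the mean) as the variability measure; the candidate list and tag counts are toy values chosen so that a group outperforms either single chromosome, and the function names are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values) -> float:
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

def best_denominator(qualified_counts, chrom_of_interest, candidates):
    """Return the candidate (tuple of chromosome names) giving the least variable dose."""
    def dose_cv(group):
        doses = [
            sample[chrom_of_interest] / sum(sample[c] for c in group)
            for sample in qualified_counts
        ]
        return coefficient_of_variation(doses)
    return min(candidates, key=dose_cv)

# Toy per-sample tag counts (hypothetical).
qualified_counts = [
    {"chr13": 100, "chr2": 60, "chr3": 40},
    {"chr13": 110, "chr2": 70, "chr3": 40},
    {"chr13": 120, "chr2": 65, "chr3": 55},
]
candidates = [("chr2",), ("chr3",), ("chr2", "chr3")]
chosen = best_denominator(qualified_counts, "chr13", candidates)
```

Here the summed group tracks chr13 exactly while each single chromosome does not, which mirrors the situation the text reports for chromosome 13 and the group of chromosomes 2 to 6.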
The chromosome dose for each chromosome of interest in the qualified samples provides a measure of the variation in the total number of sequence tags mapped to that chromosome relative to the total number of sequence tags mapped to each of the remaining chromosomes. Thus, the qualified chromosome doses can identify the chromosome or group of chromosomes, i.e., the normalizing chromosome sequence, whose variation among samples is closest to that of the chromosome of interest and which would therefore serve as the ideal sequence for normalizing values for further statistical evaluation.
The chromosome doses for all samples in the training set (i.e., qualified and affected) also serve as the basis for determining the thresholds used to identify aneuploidies in the test samples, as described below.
Table 18. Normalizing chromosome sequences used to determine chromosome doses
For each chromosome of interest in each sample of the test set, a normalizing value was determined and used to determine the presence or absence of aneuploidy. The normalizing value was calculated as a chromosome dose, which can be further computed to provide a normalized chromosome value (NCV).
Chromosome dose
For the test set, a chromosome dose was calculated for each of the chromosomes of interest, 21, 18, 13, X, and Y, in every sample. As provided in Table 18 above, each chromosome dose was calculated as the ratio of the number of tags in the test sample that mapped to the chromosome of interest to the number of tags in the same test sample that mapped to its normalizing chromosome sequence: chromosome 9 for chromosome 21; chromosome 8 for chromosome 18; chromosomes 2 to 6 for chromosome 13; chromosome 6 for chromosome X; and chromosomes 2 to 6 for chromosome Y.
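The dose calculation for a test sample can be sketched as below. The denominators are taken from the sentence above (the body of Table 18 is not reproduced here); the tag counts, the `chrN` naming, and the function name are hypothetical.

```python
# Normalizing chromosome sequences as stated in the text for each chromosome of interest.
CHR_2_TO_6 = ("chr2", "chr3", "chr4", "chr5", "chr6")
NORMALIZING = {
    "chr21": ("chr9",),
    "chr18": ("chr8",),
    "chr13": CHR_2_TO_6,
    "chrX": ("chr6",),
    "chrY": CHR_2_TO_6,
}

def chromosome_doses(tag_counts: dict) -> dict:
    """Dose = tags on the chromosome of interest / tags on its normalizing chromosome sequence."""
    return {
        chrom: tag_counts[chrom] / sum(tag_counts[d] for d in denominators)
        for chrom, denominators in NORMALIZING.items()
    }

# Hypothetical tag counts for one test sample.
example_counts = {
    "chr2": 1000, "chr3": 900, "chr4": 800, "chr5": 700, "chr6": 600,
    "chr8": 1000, "chr9": 1000,
    "chr13": 500, "chr18": 528, "chr21": 314, "chrX": 300, "chrY": 5,
}
doses = chromosome_doses(example_counts)
```

For a group denominator the tags of all member chromosomes are summed before taking the ratio, so chromosomes 13 and Y share the same denominator in this sketch.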
Normalized chromosome value
Using the chromosome dose for each chromosome of interest in each test sample, and the corresponding chromosome doses determined in the qualified samples of the training set, a normalized chromosome value (NCV) was calculated using the following equation:
$$\mathrm{NCV}_{ij} = \frac{x_{ij} - \hat{\mu}_j}{\hat{\sigma}_j}$$
where $\hat{\mu}_j$ and $\hat{\sigma}_j$ are the estimated training set mean and standard deviation, respectively, for the j-th chromosome dose, and $x_{ij}$ is the observed j-th chromosome dose for test sample i. When chromosome doses are normally distributed, the NCV is equivalent to a statistical z-score for the doses. No significant departure from linearity was observed in a quantile-quantile plot of the NCVs from unaffected samples. In addition, standard tests of normality for the NCVs failed to reject the null hypothesis of normality.
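The NCV is a z-score of the test dose against the training set doses, which can be sketched as follows. The sketch assumes the sample standard deviation (n − 1 denominator) as the estimate of spread, which the text does not specify, and uses three made-up qualified doses.

```python
from statistics import mean, stdev

def normalized_chromosome_value(test_dose: float, qualified_doses) -> float:
    """(test dose - training mean) / training standard deviation for one chromosome."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)

# Hypothetical qualified (training) doses for one chromosome: mean 0.314, sample SD 0.002.
training_doses = [0.312, 0.314, 0.316]
example_ncv = normalized_chromosome_value(0.324, training_doses)
```

With these values the test dose sits 0.010 above the training mean, i.e., about 5 standard deviations, so `example_ncv` is approximately 5.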
For the test set, an NCV was calculated for each of the chromosomes of interest, 21, 18, 13, X, and Y, in every sample. To ensure a safe and effective classification scheme, conservative boundaries were chosen for aneuploidy classification. To classify the aneuploidy status of autosomes, an NCV > 4.0 was required to classify the chromosome as affected (i.e., aneuploid for that chromosome), and an NCV < 2.5 was required to classify the chromosome as unaffected. Samples with autosomes having an NCV between 2.5 and 4.0 were classified as "no call".
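The autosome rule can be written as a three-way decision. This sketch assumes the affected boundary is NCV > 4.0, as implied by the 2.5 to 4.0 no-call zone, and treats values exactly equal to 2.5 or 4.0 as "no call"; the function name and labels are illustrative.

```python
def classify_autosome(ncv: float) -> str:
    """Classify one autosome from its NCV using the conservative boundaries in the text."""
    if ncv > 4.0:
        return "affected"
    if ncv < 2.5:
        return "unaffected"
    return "no call"

# NCVs of 6.0 and 0.4 are clear calls; 3.3 and 2.6 fall in the no-call zone.
example_calls = [classify_autosome(v) for v in (6.0, 3.3, 2.6, 0.4)]
```

Keeping a no-call zone between the two boundaries trades some call rate for fewer misclassifications near the decision boundary.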
In the test, sex chromosome classification was performed by sequential application of the NCVs for both X and Y, as follows:
If NCV Y > -2.0 standard deviations from the mean of male samples, the sample was classified as male (XY).
If NCV Y < -2.0 standard deviations from the mean of male samples, and NCV X > -2.0 standard deviations from the mean of female samples, the sample was classified as female (XX).
If NCV Y < -2.0 standard deviations from the mean of male samples, and NCV X < -3.0 standard deviations from the mean of female samples, the sample was classified as monosomy X, i.e., Turner syndrome.
If the NCVs did not fit any of the above criteria, the sample was classified as "no call" for sex.
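The sequential sex chromosome rules can be sketched as below. The sketch assumes that the first statistic is the chromosome Y NCV computed against male training samples and that the second statistic in the female and monosomy X rules is the chromosome X NCV computed against female training samples, consistent with the statement that NCVs for both X and Y are applied; parameter names and labels are hypothetical.

```python
def classify_sex(ncv_y_vs_male: float, ncv_x_vs_female: float) -> str:
    """Apply the sex chromosome rules in order; anything unmatched is a no call."""
    if ncv_y_vs_male > -2.0:
        return "male (XY)"
    if ncv_y_vs_male < -2.0 and ncv_x_vs_female > -2.0:
        return "female (XX)"
    if ncv_y_vs_male < -2.0 and ncv_x_vs_female < -3.0:
        return "monosomy X"
    return "no call"

# Hypothetical NCV pairs (Y vs. male mean, X vs. female mean).
example_sex_calls = [
    classify_sex(0.3, -5.0),    # Y dose looks male
    classify_sex(-10.0, 0.1),   # no Y signal, X dose looks female
    classify_sex(-10.0, -4.2),  # no Y signal, X dose well below female
    classify_sex(-10.0, -2.5),  # X dose between -3.0 and -2.0
]
```

Under these assumptions an X NCV between -3.0 and -2.0 with no Y signal matches no rule and is left as a no call, as are values falling exactly on a boundary.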
Results
Study demographics
A total of 1,014 patients were enrolled between April 2009 and July 2010. Patient demographics, invasive procedure types, and karyotype results are summarized in Table 19. The mean age of the study population was 35.6 years (range, 17 to 47 years), and gestational age ranged from 6 weeks 1 day to 38 weeks 1 day (mean, 15 weeks 4 days). The overall incidence of abnormal fetal karyotypes was 6.8%, with a T21 incidence of 2.5%. Of the 946 subjects with singleton pregnancies and a karyotype, 906 (96%) presented with at least one clinically recognized risk factor for fetal aneuploidy as an indication for the prenatal procedure. Even after excluding subjects whose sole indication was advanced maternal age, the data demonstrate a very high false positive rate for current screening modalities. Ultrasound findings of increased nuchal translucency, cystic hygroma, or other structural congenital abnormalities were the most predictive of an abnormal karyotype in this age group.
Table 19. Patient demographics
*Includes results for fetuses from multiple gestations; **as assessed and reported by clinicians
Abbreviations: AMA = advanced maternal age; NT = nuchal translucency
The distribution of the diverse ethnic backgrounds represented in the study population is also shown in Table 19. Overall, 63% of the patients in the study were Caucasian, 17% were Hispanic, 6% were Asian, 5% were multiethnic, and 4% were African American. The ethnic mix varied significantly from site to site. For example, one site enrolled 60% Hispanic and 26% Caucasian subjects, whereas three clinics located in the same state enrolled no Hispanic subjects. As expected, no discernible differences were observed in our results across ethnicities.
Training data set 1
The training set study selected 71 samples from the initial sequential accumulation of 435 samples collected between April 2009 and December 2009. All subjects with affected fetuses (abnormal karyotypes) in this first series of subjects were included for sequencing, together with a randomly selected, random number of unaffected subjects with adequate samples and data. The clinical characteristics of the training set patients were consistent with the overall study demographics shown in Table 19. The gestational ages of the samples in the training set ranged from 10 weeks 0 days to 23 weeks 1 day. Thirty-eight subjects underwent CVS, 32 underwent amniocentesis, and for 1 patient the type of invasive procedure was not specified (unaffected karyotype 46,XY). Seventy percent of the patients were Caucasian, 8.5% were Hispanic, 8.5% were Asian, and 8.5% were multiethnic. Six sequenced samples were removed from this set for training purposes: 4 samples from subjects with twin gestations (discussed in detail below), 1 sample with T18 that was contaminated during preparation, and 1 sample with a fetal karyotype of 69,XXX. This left 65 samples for the training set.
The number of unique sequence sites (i.e., tags identified with unique sites in the genome) varied from 2.2M in the early phases of the training set study to 13.7M in the later phases, owing to improvements in sequencing technology over time. To monitor for any potential shifts in chromosome dose over this 6-fold range in unique sites, different unaffected samples were run at the beginning and at the end of the study. For the runs of the first 15 unaffected samples, the average number of unique sites was 3.8M, and the average chromosome doses for chromosome 21 and chromosome 18 were 0.314 and 0.528, respectively. For the runs of the last 15 unaffected samples, the average number of unique sites was 10.7M, and the average chromosome doses for chromosome 21 and chromosome 18 were 0.316 and 0.529, respectively. There was no statistical difference in the chromosome doses for chromosome 21 or chromosome 18 over the time of the training set study.
The training set NCVs for chromosomes 21, 18, and 13 are shown in Figure 42. The results shown in Figure 42 are consistent with an assumption of normality, under which approximately 99% of the diploid NCVs would fall within ±2.5 standard deviations of the mean. Of the 65 samples in this set, the 8 samples with clinical karyotypes indicating T21 had NCVs ranging from 6 to 20. The four samples with clinical karyotypes indicating fetal T18 had NCVs ranging from 3.3 to 12, and the two samples with clinical karyotypes indicating fetal trisomy 13 (T13) had NCVs of 2.6 and 4. The spread of the NCVs in affected samples is due to their dependence on the percentage of fetal cfDNA in the individual samples.
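The "approximately 99%" figure can be checked directly from the standard normal distribution. The sketch below uses only the error function from the Python standard library; the helper names are illustrative, and the 4.0 cutoff is included only to show how small the upper tail is for a diploid chromosome under the same normality assumption.

```python
import math

def fraction_within(k: float) -> float:
    """P(|Z| <= k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))

def upper_tail(k: float) -> float:
    """P(Z > k) for a standard normal variable Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))

within_2_5_sd = fraction_within(2.5)  # about 0.9876
above_4_sd = upper_tail(4.0)          # about 3.2e-5
```

About 98.8% of diploid NCVs are expected within ±2.5 standard deviations, which rounds to the 99% quoted above.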
As with the autosomes, the means and standard deviations for the sex chromosomes were established in the training set. The sex chromosome thresholds allowed 100% identification of male and female fetuses in the training set.
Test data set 1
After the chromosome dose means and standard deviations had been established from the training set, a test set of 48 samples was selected from samples collected between January 2010 and June 2010 from a total of 575 samples. One sample from a twin pregnancy was removed from the final analysis, leaving 47 samples in the test set. The personnel preparing the samples for sequencing and operating the equipment were blinded to the clinical karyotype information. The gestational age range was similar to that seen in the training set (Table 19). 58% of the invasive procedures were CVS, which is higher than in the overall procedural demographics but similar to the training set. 50% of the subjects were Caucasian, 27% were Hispanic, 10.4% were Asian, and 6.3% were African American.
In the test set, the number of unique sequence tags varied from approximately 13M to 26M. For unaffected samples, the chromosome doses for chromosome 21 and chromosome 18 were 0.313 and 0.527, respectively. The test set NCVs for chromosome 21, chromosome 18, and chromosome 13 are shown in Figure 43, and the classifications are given in Table 20.
Table 20. Test set classification data
*MX is monosomy of the X chromosome with no evidence of a Y chromosome
In the test set, 13/13 subjects with karyotypes indicating fetal T21 were correctly identified, with NCVs ranging from 5 to 14. Eight/eight subjects with karyotypes indicating fetal T18 were correctly identified, with NCVs ranging from 8.5 to 22. The single sample in this test set with a karyotype classified as T13 was classified as a no call, with an NCV of approximately 3.
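The three outcomes reported here (affected, unaffected, no call) can be illustrated with a simple two-threshold rule. This is only a sketch: the upper bound of 4.0 is the aneuploidy threshold quoted later in this example, while using 2.5 as the lower bound of the no-call zone is an assumption drawn from the ±2.5 standard deviation range discussed for the training set. The function name is hypothetical.

```python
def classify_ncv(ncv, unaffected_below=2.5, affected_above=4.0):
    """Classify a sample from its NCV using two assumed thresholds.

    Values between the two thresholds are left unclassified ("no call").
    """
    if ncv > affected_above:
        return "affected"
    if ncv < unaffected_below:
        return "unaffected"
    return "no call"


# NCVs of the kind reported in the text (illustrative values only).
for value in (0.4, 3.0, 5.0, 14.0):
    print(value, classify_ncv(value))
```

Under these assumed bounds, an NCV of about 3 falls in the intermediate zone, matching the no-call outcome described for the T13 sample.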
For the test data set, all male samples were correctly identified, including a sample with the complex karyotype 46,XY + marker chromosome (unidentifiable by cytogenetics) (Table 11). Nineteen of the twenty female samples were correctly identified, and one female sample was classified as a no call. Of the three samples in the test set with a 45,X karyotype, two were correctly identified as monosomy X and one was classified as a no call (Table 20).
Twins
Four of the samples initially selected for the training set and one in the test set were from twin pregnancies. The thresholds used here may be confounded by the different amounts of cfDNA expected in the setting of a twin pregnancy. In the training set, the karyotype from one of the twin samples was monochorionic 47,XY+21. A second twin sample was fraternal, and amniocentesis was performed on each fetus individually. In this twin pregnancy, one fetus had a karyotype of 47,XY+21 while the other had a normal karyotype of 46,XX. In both cases, cell-free classification based on the method discussed above classified the sample as T21. The other two twin pregnancies in the training set were correctly classified as unaffected for T21 (all twins showed a diploid karyotype for chromosome 21). For the twin pregnancy in the test set, a karyotype was established only for twin B (46,XX), and the algorithm correctly classified the sample as unaffected for T21.
Conclusion
The data show that massively parallel sequencing can be used to determine multiple abnormal fetal karyotypes from the blood of pregnant women. They show that 100% correct classification of samples with trisomy 21 and trisomy 18 can be achieved using independent test set data. Even in the case of fetuses with abnormal sex chromosome karyotypes, none of the samples was incorrectly classified by the algorithm of the method. Importantly, the algorithm also performed well in determining the presence or absence of T21 in the two sets of twin pregnancies. Furthermore, this study examined a number of sequential samples from multiple centers, representing not only the range of abnormal karyotypes that one is likely to see in a commercial clinical setting, but also showing the importance of accurately classifying pregnancies unaffected by the common trisomies, which addresses the unacceptably high false positive rates in prenatal screening today. The data provide valuable insight into the considerable potential for using this method in the future. Analysis of subsets of the unique genomic sites showed an increase in variance consistent with Poisson counting statistics.
These data build on the findings of Fan and Quake, who demonstrated that the sensitivity of noninvasive determination of fetal aneuploidy from maternal plasma using massively parallel sequencing is limited only by counting statistics (Fan and Quake, PLoS One 5, e10439 [2010]). Because sequencing information is collected across the entire genome, this method is capable of determining any aneuploidy or other copy number variation, including insertions and deletions. The karyotype from one of the samples had a small deletion in chromosome 11 between q21 and q23; when the sequencing data were analyzed in 500 kb bins, a decrease of approximately 10% in the relative number of tags was observed over a 25 Mb region starting at q21. In addition, three of the samples in the training set had mosaic sex karyotypes by cytogenetic analysis. These karyotypes were: i) 47,XXX[9]/45,X[6], ii) 45,X[3]/46,XY[17], and iii) 47,XXX[13]/45,X[7]. Sample ii, which showed some XY-containing cells, was correctly classified as XY. Samples i (from a CVS procedure) and iii (from amniocentesis), which both showed a mixture of XXX and X cells by cytogenetic analysis (consistent with mosaic Turner syndrome), were classified as a no call and as monosomy X, respectively.
In testing the algorithm, another interesting data point was observed for chromosome 21 in one sample from the test set (Figure 43), which had an NCV between -5 and -6. Although this sample was diploid for chromosome 21 by cytogenetics, its karyotype showed mosaicism with partial triploidy for chromosome 9: 47,XX+9[9]/46,XX[6]. Because chromosome 9 is used in the denominator to determine the chromosome dose for chromosome 21 (Table 18), the extra chromosome 9 material lowered the overall NCV value. The results provided in Example 13 below demonstrate the ability to determine fetal trisomy 9 in this sample using normalizing chromosomes.
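The direction of this effect can be checked with a short calculation. The sketch below assumes that an extra fetal copy of the normalizing chromosome raises its relative tag count by half the fetal fraction, scaled by the fraction of fetal cells carrying the extra copy. The baseline dose of 0.313 is the unaffected chromosome 21 dose reported for the test set; the fetal fraction, the standard deviation, and the use of the 9/15 cell count as the mosaic fraction in cfDNA are hypothetical assumptions for illustration.

```python
def dose_after_denominator_gain(baseline_dose, fetal_fraction, mosaic_fraction=1.0):
    """Dose of the chromosome of interest when the fetus carries one extra
    copy of the normalizing (denominator) chromosome."""
    # One extra copy on top of two adds fetal_fraction * mosaic_fraction / 2
    # to the relative tag count of the normalizing chromosome.
    denominator_inflation = 1.0 + fetal_fraction * mosaic_fraction / 2.0
    return baseline_dose / denominator_inflation


baseline = 0.313     # unaffected chromosome 21 dose reported for the test set
assumed_sd = 0.0015  # hypothetical standard deviation of the unaffected doses

shifted = dose_after_denominator_gain(
    baseline, fetal_fraction=0.10, mosaic_fraction=9 / 15
)
ncv_shift = (shifted - baseline) / assumed_sd
print(f"shifted dose: {shifted:.4f}, NCV shift: {ncv_shift:.1f}")
```

Whatever values are assumed, a gain in the denominator chromosome always pushes the dose, and therefore the NCV, of the chromosome of interest downward.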
The conclusion of Fan et al. regarding the sensitivity of these methods is correct only if the algorithms used are able to account for any random or systematic biases introduced by the sequencing method. If the sequencing data are not properly normalized, the resulting analysis will be inferior to counting statistics. Chiu et al. noted in their recent paper that their measurements of chromosomes 18 and 13 using the massively parallel sequencing method were imprecise, and concluded that more research is needed to apply the method to the determination of T18 and T13 (Chiu et al., BMJ 342:c7401 [2011]). The method used in the Chiu et al. paper simply takes the number of sequence tags on the chromosome of interest (in their case, chromosome 21) and normalizes it by the total number of tags in the sequencing run. The challenge with this approach is that the distribution of tags on each chromosome can vary from sequencing run to sequencing run, which increases the overall variation of the aneuploidy determination metric. To compare the results of the Chiu algorithm with the chromosome doses used in this example, the test data for chromosomes 21 and 18 were reanalyzed using the method recommended by Chiu et al., as shown in Figure 44. Overall, a compression in the range of NCVs was observed for each of chromosomes 21 and 18, together with a reduction in the determination rate: using an NCV threshold of 4.0 for aneuploidy classification, 10/13 T21 samples and 5/8 T18 samples from our test set were correctly identified.
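The difference between the two normalizations can be illustrated with toy counts. The sketch below is not the Chiu et al. implementation or the method of this example; it only contrasts "fraction of all tags" with "ratio to a normalizing chromosome" under the assumption that a run-specific bias affects the chromosome of interest and its normalizing chromosome by the same factor. All names and counts are hypothetical.

```python
def fraction_of_total(counts, chrom):
    """Tags on `chrom` divided by all tags in the run."""
    return counts[chrom] / sum(counts.values())


def ratio_to_normalizer(counts, chrom, normalizers):
    """Tags on `chrom` divided by tags on the normalizing chromosome(s)."""
    return counts[chrom] / sum(counts[c] for c in normalizers)


def apply_run_bias(counts, biased, factor):
    """Scale the tag counts of the chromosomes in `biased` by `factor`."""
    return {c: n * factor if c in biased else n for c, n in counts.items()}


# Hypothetical tag counts; "other" lumps together the rest of the genome.
run_a = {"chr21": 150_000, "chr9": 480_000, "other": 9_370_000}
# Assumed run-to-run bias that affects chr21 and chr9 alike.
run_b = apply_run_bias(run_a, {"chr21", "chr9"}, 1.05)

frac_a, frac_b = fraction_of_total(run_a, "chr21"), fraction_of_total(run_b, "chr21")
ratio_a = ratio_to_normalizer(run_a, "chr21", ["chr9"])
ratio_b = ratio_to_normalizer(run_b, "chr21", ["chr9"])
print(f"fraction of total: {frac_a:.5f} vs {frac_b:.5f}")
print(f"ratio to chr9:     {ratio_a:.5f} vs {ratio_b:.5f}")
```

When the normalizing chromosome tracks the chromosome of interest across runs, the bias cancels in the ratio but not in the fraction of total tags, which is the motivation for choosing the denominator carefully.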
Ehrich et al. also focused only on T21 and used the same algorithm as Chiu et al. (Ehrich et al., Am J Obstet Gynecol 204:205e1-e11 [2011]). In addition, after observing a shift in their test set z-score metric relative to the external reference data (i.e., the training set), they retrained on the test set to establish the classification boundaries. Although this approach is feasible in principle, in practice it would be challenging to decide how many samples are required for training and how often retraining is needed to ensure that the classification data remain correct. One way to mitigate this problem is to include controls in every sequencing run that measure the baseline and calibrate for quantitative behavior.
The data obtained using the present method show that massively parallel sequencing is capable of determining multiple fetal chromosomal abnormalities from the plasma of pregnant women when the algorithm for normalizing the chromosome count data is optimized. The present method for quantification not only minimizes random and systematic variation between sequencing runs, but also allows aneuploidies to be classified across the entire genome, most notably T21 and T18. A larger sample collection is required to test the algorithm for T13 determination. To this end, a prospective, blinded, multi-site clinical study is under way to further demonstrate the diagnostic accuracy of the method.
Example 13
Determining the presence or absence of at least 5 different chromosomal aneuploidies in all chromosomes of individual test samples
To demonstrate the ability of the method to determine the presence or absence of any chromosomal aneuploidy in each of a set of maternal test samples (test set 1; Example 12), systematically determined normalizing chromosome sequences were identified in the unaffected samples of the training set (training set 1; Example 12) and were used to calculate chromosome doses for all chromosomes in each test sample. Determination of the presence or absence of any one or more different complete fetal chromosomal aneuploidies in each of the test set and training set samples was accomplished from the sequencing information obtained from a single sequencing run performed on each individual sample.
Using the chromosome densities, i.e., the number of sequence tags identified for each chromosome in each of the samples described in Example 12, a systematically determined normalizing chromosome sequence, consisting of a single chromosome or a group of chromosomes, was determined for each of chromosomes 1-22, X, and Y. The systematically determined normalizing chromosome sequence for each chromosome was found by systematically calculating chromosome doses for that chromosome using every possible combination of the other chromosomes as the denominator.
For example, with chromosome 21 as the chromosome of interest, chromosome doses were calculated as the ratio of (i) the number of sequence tags obtained for chromosome 21 (the chromosome of interest) to (ii) the number of sequence tags obtained for each of the remaining chromosomes individually, and the summed number of tags obtained for every possible combination of the remaining chromosomes (excluding chromosome 21), i.e.: 1, 2, 3, 4, 5, etc. up to 20, 22, X, and Y; 1+2, 1+3, 1+4, 1+5, etc. up to 1+20, 1+22, 1+X, and 1+Y; 1+2+3, 1+2+4, 1+2+5, etc. up to 1+2+20, 1+2+22, 1+2+X, and 1+2+Y; 1+3+4, 1+3+5, 1+3+6, etc. up to 1+3+20, 1+3+22, 1+3+X, and 1+3+Y; 1+2+3+4, 1+2+3+5, 1+2+3+6, etc. up to 1+2+3+20, 1+2+3+22, 1+2+3+X, and 1+2+3+Y; and so on, such that all possible combinations of chromosomes 1-20, 22, X, and Y were used as the normalizing chromosome sequence (denominator) to determine all possible chromosome doses for each chromosome of interest in each of the qualified (unaffected) samples in the training set. Chromosome doses for chromosome 21 were determined in the same way in all training set samples, and the systematically determined normalizing chromosome sequence for chromosome 21 was identified as the single chromosome or group of chromosomes that gave the chromosome 21 dose with the smallest variability across all training samples.
The same analysis was repeated to determine the single chromosome or combination of chromosomes that would serve as the systematically determined normalizing chromosome sequence for each of the remaining chromosomes, including chromosomes 13, 18, X, and Y; i.e., all possible chromosome combinations were used to determine the normalizing sequence (single chromosome or group of chromosomes) for all other chromosomes of interest 1-12, 14-17, 19-20, 22, X, and Y in all training samples. Thus, every chromosome was treated as a chromosome of interest, and a systematically determined normalizing sequence was determined for each chromosome across the unaffected samples of the training set. Table 21 provides the single chromosome or group of chromosomes identified as the systematically determined normalizing sequence for each chromosome of interest 1-22, X, and Y. As highlighted in Table 21, for some chromosomes of interest the systematically determined normalizing chromosome sequence was a single chromosome (e.g., when chromosome 4 is the chromosome of interest), whereas for others it was a group of chromosomes (e.g., when chromosome 21 is the chromosome of interest).
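The search described above can be sketched as an exhaustive loop over candidate denominators that keeps the group giving the lowest coefficient of variation of the dose across qualified samples. This is a minimal illustration, not the implementation used in the study: the function names, the `max_group_size` cap, and the toy tag counts are all hypothetical. With 23 candidate chromosomes there are roughly 8.4 million non-empty combinations, so the cap is included only to keep the sketch fast.

```python
from itertools import combinations
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def best_normalizing_set(samples, chrom_of_interest, max_group_size=3):
    """Return the chromosome group whose summed tags give the least variable dose.

    `samples` is a list of dicts mapping chromosome name -> tag count,
    one dict per qualified (unaffected) sample.
    """
    candidates = [c for c in samples[0] if c != chrom_of_interest]
    best_group, best_cv = None, float("inf")
    for size in range(1, max_group_size + 1):
        for group in combinations(candidates, size):
            doses = [s[chrom_of_interest] / sum(s[c] for c in group) for s in samples]
            cv = coefficient_of_variation(doses)
            if cv < best_cv:
                best_group, best_cv = group, cv
    return best_group, best_cv


# Toy counts in which chr21 tracks the sum chr4 + chr14 exactly (illustration only).
qualified = [
    {"chr21": 100, "chr4": 300, "chr14": 200, "chr1": 900},
    {"chr21": 110, "chr4": 350, "chr14": 200, "chr1": 950},
    {"chr21": 90, "chr4": 250, "chr14": 200, "chr1": 1000},
    {"chr21": 120, "chr4": 380, "chr14": 220, "chr1": 880},
]
group, cv = best_normalizing_set(qualified, "chr21")
print(group, cv)
```

Calling the same function once per chromosome of interest would yield one normalizing group per chromosome, in the spirit of Table 21.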
Table 21. Systematically determined normalizing chromosome sequences for all chromosomes
The mean, standard deviation (SD), and coefficient of variation (CV) obtained with the systematically determined normalizing chromosome sequence for each of the chromosomes are given in Table 22.
Table 22. Mean, standard deviation (SD), and coefficient of variation (CV) for the systematically determined normalizing chromosome sequences
a Does not include trisomies
b Female fetuses
The variation in chromosome doses across all training samples, as reflected by the CV values, confirms that the systematically determined normalizing chromosome sequences provide a large signal-to-noise ratio and dynamic range, allowing aneuploidies to be determined with high sensitivity and high specificity, as shown below.
To demonstrate the sensitivity and specificity of the method, chromosome doses for all chromosomes of interest 1-22, X, and Y were determined in each sample of the training set and in each of the samples of the test set described in Example 11, using the corresponding systematically determined normalizing chromosome sequences provided in Table 21 above.
Using the systematically determined normalizing chromosome sequence for each chromosome of interest, the presence or absence of any fetal aneuploidy was determined in each training set sample and in each test sample, i.e., it was determined whether each sample contained a complete fetal chromosomal aneuploidy of chromosome 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, or Y. Sequence information, i.e., the number of sequence tags, was obtained for all chromosomes in each training set sample and each test sample, and a single chromosome dose was calculated as described above for each chromosome in each training and test sample, using the number of sequence tags obtained for the corresponding systematically determined normalizing chromosome sequence (Table 21). The number of sequence tags obtained for the systematically determined normalizing chromosome sequence in each training sample was used to determine the chromosome dose for each chromosome in that training sample, and the number of sequence tags obtained for the systematically determined normalizing chromosome sequence in each test sample was used to determine the chromosome dose for each chromosome in that test sample. To ensure safe and effective classification of aneuploidies, the same conservative boundaries were chosen as described in Example 12.
Training set results
Figure 45 shows a plot of the chromosome doses for chromosomes 21, 18, and 13 in the training set samples, determined using the systematically determined normalizing chromosome sequences. When the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 4+14+16+20+22) was used, the 8 samples whose clinical karyotype indicated T21 had NCVs between 5.4 and 21.5. When the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 2+3+5+7) was used, the 4 samples whose clinical karyotype indicated T18 had NCVs between 3.3 and 15.3. The T21 samples of the training set are shown as the last 8 samples of the chromosome 21 data (O); the T18 samples of the training set are shown as the last 4 samples of the chromosome 18 data (Δ); and the T13 samples of the training set are shown as the last 2 samples of the chromosome 13 data (□).
These data show that normalizing chromosome sequences can be used to determine and correctly classify different complete fetal chromosomal aneuploidies with high confidence. Because all samples with affected karyotypes had NCVs greater than 3, there is only an approximately 0.1% probability that these samples belong to the unaffected distribution.
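The probabilities quoted for NCV cutoffs follow from the standard normal distribution, assuming that unaffected NCVs are normally distributed as stated for the training set. A short check using the complementary error function (the function name is illustrative):

```python
import math


def upper_tail_probability(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


p_above_3 = upper_tail_probability(3.0)
share_within_2_5 = 1.0 - 2.0 * upper_tail_probability(2.5)
print(f"P(NCV > 3) for an unaffected sample: {p_above_3:.4%}")
print(f"share of unaffected NCVs within ±2.5 SD: {share_within_2_5:.2%}")
```

The first value corresponds to the roughly 0.1% probability cited for NCVs above 3, and the second to the approximately 99% of diploid NCVs expected within ±2.5 standard deviations.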
As with the autosomes, all female and male fetuses in the training set were correctly identified when the systematically determined normalizing chromosome sequence for chromosome X (i.e., the group of chromosomes 4+8) and the systematically determined normalizing chromosome sequence for chromosome Y (i.e., the group of chromosomes 4+6) were used. In addition, all 5 monosomy X samples were identified. Figure 46A shows a plot of the NCV determined for the X chromosome (X-axis) against the NCV determined for the Y chromosome (Y-axis) for each sample in the training set. All samples whose karyotype was monosomy X had NCV values of less than -4.83. Those monosomy X samples with karyotypes consistent with a 45,X karyotype (complete or mosaic) had Y NCV values close to zero, as expected. Female samples clustered around NCV = 0 for both X and Y.
Test set results
Figure 47 shows a plot of the chromosome doses for chromosomes 21, 18, and 13 in the test samples, determined using the corresponding systematically determined normalizing chromosome sequences. When the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 4+14+16+20+22) was used, 13 of the 13 samples whose clinical karyotype indicated T21 were correctly identified, with NCVs between 7.2 and 16.3. When the systematically determined normalizing chromosome sequence (i.e., the group of chromosomes 2+3+5+7) was used, all 8 samples whose clinical karyotype indicated T18 were identified, with NCVs between 12.7 and 30.7. The T21 samples of the test set are shown as the last 13 samples of the chromosome 21 data (O); the T18 samples of the test set are shown as the last 8 samples of the chromosome 18 data (Δ); and the T13 sample of the test set is shown as the last sample of the chromosome 13 data (□).
These data show that systematically determined normalizing chromosome sequences can be used to determine and correctly classify different complete fetal chromosomal aneuploidies with high confidence. As in the training set, all samples with affected karyotypes had high NCVs, in this case greater than 7, indicating an extremely small probability that these samples belong to the unaffected distribution (Figure 47).
As with the autosomes, all female and male fetuses in the test set were correctly identified when the systematically determined normalizing chromosome sequence for chromosome X (i.e., the group of chromosomes 4+8) and the systematically determined normalizing chromosome sequence for chromosome Y (i.e., the group of chromosomes 4+6) were used. In addition, all 3 monosomy X samples were identified. Figure 46B shows a plot of the NCV determined for the X chromosome (X-axis) against the NCV determined for the Y chromosome (Y-axis) for each sample in the test set.
As explained above, the method allows the presence or absence of a complete or partial chromosomal aneuploidy of each of chromosomes 1-22, X, and Y to be determined in each sample. In addition to determining the complete chromosomal aneuploidies T13, T18, T21, and monosomy X, the method also determined the presence of trisomy 9 in one of the test samples. When the systematically determined normalizing chromosome sequence for chromosome 9 (i.e., the group of chromosomes 3+4+8+10+17+19+20+22) was used, one sample with an NCV of 14.4 was identified (Figure 48). This sample corresponds to the test sample in Example 12 that was suspected of being aneuploid for chromosome 9 on the basis of its aberrantly low dose for chromosome 21 (chromosome 9 was used as the normalizing chromosome sequence in Example 12).
The data show that 100% of the samples with clinical karyotypes indicating T21, T13, T18, T9, and monosomy X were correctly identified. Figure 49 shows a plot of the NCVs for each of chromosomes 1-22 in each of the 47 test samples. The NCV medians were normalized to zero. The data show that the method of the invention, including the use of systematically determined normalizing chromosome sequences, determined the presence of all 5 types of chromosomal aneuploidy present in this test set with 100% sensitivity and 100% specificity, and clearly indicate that the method can identify any chromosomal aneuploidy of any of chromosomes 1-22, X, and Y in any sample.
Example 14
Determining the presence or absence of a partial fetal chromosomal aneuploidy: determination of cat eye syndrome
DiGeorge syndrome (22q11.2 deletion syndrome), a disorder caused by a defect in chromosome 22, results in the poor development of several body systems. Medical problems commonly associated with DiGeorge syndrome include heart defects, poor immune system function, cleft palate, parathyroid gland abnormalities, and behavioral disorders. The number and severity of the problems associated with DiGeorge syndrome vary greatly. Almost everyone with DiGeorge syndrome needs treatment from specialists in several fields.
To determine the presence or absence of a partial deletion of fetal chromosome 22, a blood sample is obtained from the mother by venipuncture, and cfDNA is prepared as described in the preceding examples. The purified cfDNA is ligated to adapters and subjected to cluster amplification using an Illumina cBot cluster station. Massively parallel sequencing is performed using reversible dye terminators to generate millions of 36 bp reads. The sequence reads are aligned to the human hg19 reference genome, and the reads that map uniquely to the reference genome are counted as tags.
A set of qualified samples that are all known to be diploid for chromosome 22 (i.e., chromosome 22 and every part of it is known to be present only in the diploid state) is first sequenced and analyzed to obtain the number of sequence tags for each of 1000 segments of 3 megabases (Mb), excluding the 22q11.2 region. Given that the human genome comprises approximately 3 billion bases (3 Gb), 1000 segments of 3 Mb each make up approximately the remainder of the genome. Each of the 1000 segments can serve, individually or as part of a group of segment sequences, in determining the normalizing segment sequence for the segment of interest, i.e., the 3 Mb region of 22q11.2. The number of sequence tags mapped to each single one of the 1000 segments is used individually to calculate segment doses for the 3 Mb region of 22q11.2. In addition, all possible combinations of two or more segments are used to determine segment doses for the segment of interest in all qualified samples. The single 3 Mb segment, or the combination of two or more 3 Mb segments, that gives the segment dose with the lowest variability across samples is selected as the normalizing segment sequence.
The number of sequence tags mapped to the segment of interest in each qualified sample is used to determine the segment dose in that sample. The mean and standard deviation of the segment doses across all qualified samples are calculated and used to set thresholds against which a segment dose determined in a test sample can be compared. Preferably, normalized segment values (NSVs) are calculated for all segments of interest in all qualified samples and are used to set the thresholds.
Subsequently, the number of tags mapped to the normalizing segment sequence in the corresponding test sample is used to determine the dose of the segment of interest in the test sample. A normalized segment value (NSV) is calculated for the segment in the test sample as described previously, and the NSV of the segment of interest in the test sample is compared with the thresholds determined using the qualified samples to determine the presence or absence of a 22q11.2 deletion in the test sample.
A test NSV < -3 indicates a loss in the segment of interest, i.e., a partial deletion of chromosome 22 (22q11.2) is present in the test sample.
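The comparison of a test sample against the qualified set can be sketched as below. This is an illustration under stated assumptions: the normalized segment value is taken to be a z-score of the test dose against the mean and standard deviation of the qualified doses, by analogy with the normalized chromosome value used elsewhere in this document. The function names and the dose values are hypothetical; only the -3 threshold comes from the text.

```python
from statistics import mean, stdev


def normalized_segment_value(test_dose, qualified_doses):
    """z-score of the test segment dose relative to the qualified (unaffected) samples."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)


def indicates_deletion(nsv, threshold=-3.0):
    """A value below the threshold indicates loss of the segment of interest."""
    return nsv < threshold


# Illustrative doses for the 22q11.2 segment in five qualified samples.
qualified_doses = [0.500, 0.502, 0.498, 0.501, 0.499]
low_nsv = normalized_segment_value(0.490, qualified_doses)      # well below the qualified mean
normal_nsv = normalized_segment_value(0.4995, qualified_doses)  # close to the qualified mean
```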
Example 15
Stool DNA test for predicting outcome in patients with stage II colon cancer
Approximately 30% of all stage II colon cancer patients will relapse and die of their disease. Stage II colon cancer patients who developed disease recurrence showed significantly more losses on chromosomes 4, 5, 15q, 17q, and 18q. In particular, losses on 4q22.1-4q35.2 have been shown to be associated with worse outcome in stage II colon cancer patients. Determining the presence or absence of such genomic alterations can aid in selecting patients for adjuvant therapy (Brosens et al., Analytical Cellular Pathology/Cellular Oncology 33:95–104 [2010]).
To determine the presence or absence of one or more chromosomal deletions in the 4q22.1 to 4q35.2 region in patients with stage II colon cancer, stool and/or plasma samples are obtained from the patient(s). Stool DNA is prepared according to the method described by Chen et al., J Natl Cancer Inst 97:1124-1132 [2005], and plasma DNA is prepared according to the method described in the examples above. The DNA is sequenced according to an NGS method described herein, and the sequence information for the patient sample(s) is used to calculate segment doses for one or more segments spanning the 4q22.1 to 4q35.2 region. The segment doses are determined using normalizing segment sequences determined beforehand in a set of qualified stool and/or plasma samples, respectively. The segment doses in the test sample (patient sample) are calculated, and the presence or absence of one or more partial chromosomal deletions in the 4q22.1 to 4q35.2 region is determined by comparing the NSV for each segment of interest with the threshold set from the NSVs in the set of qualified samples.
Example 16
Genome-wide fetal aneuploidy detection by sequencing of maternal plasma DNA: diagnostic accuracy in a prospective, blinded, multicenter study
The method for determining the presence or absence of aneuploidy in a maternal test sample was used in a prospective study, and its diagnostic accuracy is shown as described below. The prospective study further demonstrates the efficacy of the method of the invention for detecting fetal aneuploidy for multiple chromosomes across the genome. The blinded study emulates an actual population of pregnant women in which the fetal karyotype is unknown, and all samples with any abnormal karyotype were selected for sequencing. The classifications made according to the method of the invention were compared with fetal karyotypes obtained from invasive procedures to determine the diagnostic performance of the method for multiple chromosome aneuploidies.
Overview of this example
In a prospective, blinded study, blood samples were collected from 2,882 women undergoing prenatal diagnostic procedures at 60 US sites (clinicaltrials.gov NCT01122524).
An independent biostatistician selected all singleton pregnancies with any abnormal karyotype and an equal number of randomly selected pregnancies with a euploid karyotype. Chromosome classifications were made for each sample according to the method of the invention and compared with the fetal karyotype.
Within an analysis cohort of 532 samples, the following were correctly classified: 89/89 trisomy 21 cases (sensitivity 100%, 95% CI 95.9-100), 35/36 trisomy 18 cases (sensitivity 97.2%, 95% CI 85.5-99.9), 11/14 trisomy 13 cases (sensitivity 78.6%, 95% CI 49.2-99.9), 232/233 females (sensitivity 99.6%, 95% CI 97.6->99.9), 184/184 males (sensitivity 100%, 95% CI 98.0-100), and 15/16 monosomy X cases (sensitivity 93.8%, 95% CI 69.8-99.8). There were no false positives for autosomal aneuploidies among unaffected subjects (100% specificity, 95% CI >98.5-100). In addition, fetuses with mosaicism for trisomy 21 (3/3), trisomy 18 (1/1), and monosomy X (2/7), three cases of translocation trisomy, two cases of other autosomal trisomies (20 and 16), and other sex chromosome aneuploidies (XXX, XXY, and XYY) were classified correctly.
These results further demonstrate the efficacy of the present method for detecting fetal aneuploidy for multiple chromosomes across the genome using maternal plasma DNA. The high sensitivity and specificity for the detection of trisomies 21, 18, and 13 and monosomy X suggest that the present method can be incorporated into existing aneuploidy screening algorithms to reduce unnecessary invasive procedures.
Materials and methods
The MELISSA (MatErnal BLood IS Source to Accurately diagnose fetal aneuploidy) study was conducted as a prospective, multicenter, observational study with a blinded, nested case:control analysis. Pregnant women aged 18 years and older who were undergoing an invasive prenatal procedure to determine the fetal karyotype were enrolled (Clinicaltrials.gov NCT01122524). Eligibility criteria included pregnant women between 8 weeks 0 days and 22 weeks 0 days of gestation who met at least one of the following additional criteria: age ≥38 years; a positive screening test result (serum analytes and/or nuchal translucency (NT) measurement); the presence of ultrasound markers associated with an increased risk of fetal aneuploidy; or a prior aneuploid fetus. Written informed consent was obtained from all women who agreed to participate.
Enrollment took place at 60 geographically dispersed medical centers in 25 states under protocols approved by the institutional review board (IRB) of each institution. Two clinical research organizations (CROs) (Quintiles, Durham, North Carolina; and Emphusion, San Francisco, California) were retained to keep the study blinded and to provide clinical data management, data monitoring, biostatistics, and data analysis services.
Before any invasive procedure, a peripheral venous blood sample (17 mL) was collected in two acid citrate dextrose (ACD) tubes (Becton Dickinson), which were de-identified and labeled with a unique study number. Site research personnel entered the study number, date, and time of the blood draw into a secure electronic case report form (eCRF). Whole blood samples were shipped overnight in temperature-controlled containers from the sites to the laboratory (Verinata Health, Inc., California). Upon receipt and sample inspection, cell-free plasma was prepared according to previously described methods (see Example 13) and stored frozen at -80°C in 2 to 4 aliquots until the time of sequencing. The date and time of sample receipt at the laboratory were recorded. A sample was determined to be eligible for analysis if it was received overnight, was cool to the touch, and contained at least 7 mL of blood. Samples that were eligible at receipt were reported to the CRO weekly and used for selection on a random sampling list (see below and Figure 50). Clinical data from the woman's current pregnancy and the fetal karyotype were entered into the eCRF by site research personnel and verified by the CRO.
Sample size determination was based on the precision of the estimates for a targeted range of performance characteristics (sensitivity and specificity) of the index test. Specifically, the numbers of affected cases (T21, T18, T13, male, female, or monosomy X) and unaffected controls (non-T21, non-T18, non-T13, not male, not female, or not monosomy X) were determined so as to estimate sensitivity and specificity, respectively, to within a prespecified small margin of error based on the normal approximation ($N = (1.96\sqrt{p(1-p)}/\text{margin of error})^2$, where $p$ is the estimate of sensitivity or specificity). Assuming a true sensitivity of 95% or greater, a sample size of between 73 and 114 cases ensured that the precision of the sensitivity estimate would be such that the lower bound of the 95% confidence interval (CI) would be 90% or greater (margin of error ≤5%). For smaller sample sizes, a larger estimated margin of error of the 95% CI for sensitivity was projected (from 6% to 13.5%). To estimate the specificity with greater precision, a larger number of unaffected controls (approximately a 4:1 ratio to cases) was planned at the sampling stage. This ensured that the precision of the specificity estimate would be at least 3%. Accordingly, as the sensitivity and/or specificity increased, the precision of the confidence interval would also increase.
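The normal-approximation sample size formula quoted above can be checked numerically. The sketch below is illustrative only; the function name is hypothetical. With an assumed sensitivity of 95%, a 5% margin of error gives about 73 cases, and a 4% margin gives about 114, which is consistent with the stated range, although the text does not say which margin the upper figure corresponds to.

```python
import math


def normal_approx_sample_size(p, margin, z=1.96):
    """N = (z * sqrt(p * (1 - p)) / margin)^2, returned unrounded."""
    return (z * math.sqrt(p * (1 - p)) / margin) ** 2


n_margin_5pct = normal_approx_sample_size(0.95, 0.05)  # about 72.99
n_margin_4pct = normal_approx_sample_size(0.95, 0.04)  # about 114.05
```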
Based on the sample size determination, the CRO designed a random sampling plan to generate lists of selected samples for sequencing (a minimum of 110 cases affected by T21, T18, or T13 and 400 cases unaffected with respect to trisomy, allowing up to half of the latter to have karyotypes other than 46,XX or 46,XY). Subjects with a singleton pregnancy and an eligible blood sample were eligible for selection. Subjects with ineligible samples, no recorded karyotype, or a multiple gestation were excluded (Figure 50). Lists were generated on a regular basis throughout the study and sent to the Verinata Health laboratory.
Each eligible blood sample was analyzed for six independent categories. The categories were aneuploidy status for chromosomes 21, 18, and 13, and sex status for male, female, and monosomy X. While still blinded, one of three classifications (affected, unaffected, or unclassified) was generated prospectively for each of the six independent categories for each plasma DNA sample. Under this scheme, the same sample could be classified as affected in one analysis (e.g., aneuploid for chromosome 21) and unaffected in another (e.g., euploid for chromosome 18).
Conventional metaphase cytogenetic analysis of cells obtained by chorionic villus sampling (CVS) or amniocentesis was used as the reference standard in this study. Fetal karyotyping was performed in the diagnostic laboratories routinely used by the participating sites. If a patient underwent both CVS and amniocentesis after enrollment, the karyotype from the amniocentesis was used for the study analysis. Fluorescence in situ hybridization (FISH) results targeting chromosomes 21, 18, 13, X, and Y were allowed if a metaphase karyotype was not available (Table 24). All abnormal karyotype reports (i.e., other than 46,XX and 46,XY) were reviewed by a board-certified cytogeneticist and classified as affected or unaffected with respect to chromosomes 21, 18, and 13 and sex status XX, XY, and monosomy X.
Prespecified protocol conventions provided that the following abnormal karyotypes would be assigned a 'censored' status by the cytogeneticist: triploidy, tetraploidy, complex karyotypes other than trisomy (e.g., mosaicism) involving chromosomes 21, 18, or 13, mosaicism with mixed sex chromosomes, sex chromosome aneuploidy, or karyotypes that could not be fully interpreted from the source document (e.g., marker chromosomes of unknown origin). Because the cytogenetic diagnosis was not known to the sequencing laboratory, all cytogenetically censored samples were independently analyzed and assigned a classification determined using sequencing information according to the method of the invention (sequencing classification), but were not included in the statistical analysis. Censored status pertained only to the relevant one or more of the six analyses (e.g., a mosaic T18 would be censored from the chromosome 18 analysis but considered 'unaffected' in the other analyses, such as chromosomes 21, 13, X, and Y) (Table 25). Other abnormal and rare complex karyotypes that could not be fully anticipated at the time of protocol design were not censored from the analysis (Table 26).
Data contained in the eCRF and the clinical database were restricted to authorized users (study sites, the CROs, and contracted clinical personnel). They were not accessible to any Verinata Health employee until the time of unblinding.
After the random sample list was received from the CRO, total cell-free DNA (a mixture of maternal and fetal DNA) was extracted from the thawed selected plasma samples as described in Example 13. Sequencing libraries were prepared using the Illumina TruSeq kit v2.5. Sequencing was performed (6-plex, i.e., 6 samples per lane) on Illumina HiSeq 2000 instruments in the Verinata Health laboratory. Single-end reads of 36 base pairs were obtained. The reads were mapped across the genome, and the sequence tags on each chromosome of interest were counted and used to classify the samples for the independent categories as described above.
The clinical protocol required evidence of the presence of fetal DNA in order to report a classification result. A classification of male or aneuploid was considered sufficient evidence of fetal DNA. In addition, each sample was tested for the presence of fetal DNA using two allele-specific methods. In the first method, the AmpflSTR Minifiler kit (Life Technologies, San Diego, California) was used to interrogate the presence of a fetal component in the cell-free DNA. Electrophoresis of short tandem repeat (STR) amplicons was carried out on an ABI 3130 Genetic Analyzer following the manufacturer's protocol. All nine STR loci in the kit were analyzed by comparing the intensity of each peak, reported as a percentage of the sum of the intensities of all peaks, and the presence of minor peaks was used to provide evidence of fetal DNA. In cases in which no minor STR could be identified, an aliquot of the sample was examined with a single nucleotide polymorphism (SNP) panel of 15 SNPs with average heterozygosity ≥0.4, selected from the Kidd et al. panel (Kidd et al., Forensic Sci Int 164(1):20-32 [2006]). Allele-specific methods that can be used to detect and/or quantify fetal DNA in maternal samples are described in U.S. Patent Publications 20120010085, 20110224087, and 20110201507, which are incorporated herein by reference.
Normalized chromosome values (NCVs) were determined by calculating all possible permutations of denominators for all autosomes and sex chromosomes as described in Example 13. However, because the sequencing in this study was carried out on a different instrument from our previous work, with multiple samples per lane, new normalizing chromosome denominators had to be determined. The normalizing chromosome denominators in the current study were determined from a training set of 110 independent (i.e., not from MELISSA eligible samples) unaffected samples (i.e., qualified samples) that was sequenced before the study samples were analyzed. The new normalizing chromosome denominators were determined by calculating all possible permutations of denominators for all autosomes and sex chromosomes and choosing those that minimized the variation in the unaffected training set for each chromosome across the genome (Table 23).
The NCV rule applied to provide the autosome classification of each test sample was that described in Example 12: for classification of autosomal aneuploidies, an NCV > 4.0 was required to classify the chromosome as affected (i.e., aneuploid for that chromosome), and an NCV < 2.5 to classify the chromosome as unaffected. Samples with autosomes having an NCV between 2.5 and 4.0 were called "unclassified."
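The autosome rule above is a three-way threshold on the NCV, which can be sketched as follows. The function name is hypothetical, and treating values exactly equal to 2.5 or 4.0 as unclassified is an assumption, since the text only specifies strict inequalities.

```python
def classify_autosome(ncv):
    """Three-way call for one autosome from its normalized chromosome value."""
    if ncv > 4.0:
        return "affected"
    if ncv < 2.5:
        return "unaffected"
    return "unclassified"  # 2.5 <= NCV <= 4.0 (boundary handling is an assumption)


calls = [classify_autosome(v) for v in (12.3, 0.4, 3.6)]
```

The value 3.6 is the chromosome 6 NCV reported later in this example for a sample with a 6q duplication; under this rule it falls in the unclassified zone.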
Sex chromosome classification in the present test was performed by sequential application of the NCVs for X and Y as follows:
1. If NCV X < -4.0 and NCV Y < 2.5, the sample is classified as monosomy X.
2. If NCV X > -2.5 and NCV X < 2.5 and NCV Y < 2.5, the sample is classified as female (XX).
3. If NCV X > 4.0 and NCV Y < 2.5, the sample is classified as XXX.
4. If NCV X > -2.5 and NCV X < 2.5 and NCV Y > 33, the sample is classified as XXY.
5. If NCV X < -4.0 and NCV Y > 4.0, the sample is classified as male (XY).
6. If condition 5 is met, but NCV Y is approximately twice the value expected for the measured NCV X, the sample is classified as XYY.
7. If the NCVs for chromosomes X and Y do not fit any of the above criteria, the sample is classified as unclassified for sex.
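The sequential rules above can be sketched as a single function. This is a hedged illustration rather than the classifier used in the study. The thresholds are those listed in rules 1 to 5. Rule 6 is not fully specified in the text, so the sketch assumes a caller-supplied `expected_ncv_y` (the NCV Y expected for a single Y at the measured NCV X) and a hypothetical relative tolerance `rel_tol` for "approximately twice"; both names are inventions for this example.

```python
def classify_sex(ncv_x, ncv_y, expected_ncv_y=None, rel_tol=0.25):
    """Apply rules 1-7 in order and return the sex classification."""
    x_low = ncv_x < -4.0
    x_mid = -2.5 < ncv_x < 2.5
    x_high = ncv_x > 4.0
    y_absent = ncv_y < 2.5

    if x_low and y_absent:
        return "monosomy X"              # rule 1
    if x_mid and y_absent:
        return "XX"                      # rule 2
    if x_high and y_absent:
        return "XXX"                     # rule 3
    if x_mid and ncv_y > 33:
        return "XXY"                     # rule 4
    if x_low and ncv_y > 4.0:            # rule 5
        if expected_ncv_y is not None:
            doubled = 2 * expected_ncv_y
            # Rule 6 (assumed form): NCV Y within rel_tol of twice the expected value.
            if abs(ncv_y - doubled) <= rel_tol * doubled:
                return "XYY"
        return "XY"
    return "unclassified"                # rule 7


examples = {
    "monosomy X": classify_sex(-5.0, 0.0),
    "XX": classify_sex(0.0, 0.0),
    "XXX": classify_sex(5.0, 0.0),
    "XXY": classify_sex(0.0, 40.0),
    "XY": classify_sex(-5.0, 30.0),
    "XYY": classify_sex(-5.0, 60.0, expected_ncv_y=30.0),
}
```

Because the X and Y bands leave gaps (for example NCV X between 2.5 and 4.0), some samples fall through to rule 7, which is consistent with the larger number of unclassified results reported for the sex analysis.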
Because the laboratory was blinded to the clinical information, the sequencing results were not adjusted for any of the following demographic variables: maternal body mass index, smoking status, presence of diabetes, type of conception (spontaneous or assisted), prior pregnancies, prior aneuploidy, or gestational age. Neither maternal nor paternal samples were used for classification, and classification according to the present method does not depend on the measurement of specific loci or alleles.
The sequencing results were returned to an independent contracted biostatistician before unblinding and analysis. Personnel at the study sites, the CROs (including the biostatistician who generated the random sampling lists), and the contracted cytogeneticist were blinded to the sequencing results.
Table 23. Systematically determined normalizing chromosome sequences for all chromosomes
Statistical methods were documented in a detailed statistical analysis plan for the study. Point estimates for sensitivity and specificity, along with exact 95% confidence intervals using the Clopper-Pearson method, were computed for each of the six analysis categories. For all statistical estimation procedures performed, samples with no fetal DNA detected, 'censored' complex karyotypes (per the protocol-defined convention), or an 'unclassified' result from the sequencing test were removed.
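The exact (Clopper-Pearson) interval named above can be reproduced with the standard library alone. The sketch below is an illustration, not the study's statistical code: it finds each bound by bisection on the binomial tail probability instead of using a beta quantile function, and the function names are hypothetical. Applied to the trisomy 21 and trisomy 18 counts given in the overview of this example (89/89 and 35/36), it returns bounds that agree with the reported 95.9-100 and 85.5-99.9 after rounding.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""

    def solve(f, target):
        # f is increasing in p on [0, 1]; bisect for f(p) = target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


t21_lower, t21_upper = clopper_pearson(89, 89)
t18_lower, t18_upper = clopper_pearson(35, 36)
```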
Results
Between June 2010 and August 2011, 2,882 pregnant women were enrolled in the study. The characteristics of the eligible subjects and of the selected cohort are given in Table 24. Subjects who enrolled and provided blood, but were later found during data monitoring to have exceeded the inclusion criterion, with an actual gestational age at enrollment beyond 22 weeks 0 days, were allowed to remain in the study (n=22). Three of these samples were in the selected set. Figure 50 shows the flow of samples from enrollment to analysis. There were 2,625 samples eligible for selection.
Table 24. Patient demographics
* Gestational age (GA) at the time of the invasive procedure.
** The penetrance of ultrasound abnormalities is higher in fetuses with abnormal karyotypes.
Abbreviations: BMI, body mass index; IUGR, intrauterine growth retardation
According to the random sampling plan, all eligible subjects with an abnormal karyotype were selected for analysis, together with a set of subjects carrying euploid fetuses (Figure 50B), such that the total sequenced study population had an approximately 4:1 ratio of unaffected to affected subjects for trisomy 21. From this process, 534 subjects were selected. Two samples were subsequently removed from the analysis because of sample tracking issues in which the full chain of custody between the sample tube and data acquisition did not pass a quality audit (Figure 50). This resulted in 532 subjects for analysis, contributed by 53 of the 60 study sites. The demographics of the selected cohort were similar to those of the overall cohort.
Test performance
Figures 51A-51C show flow diagrams for the aneuploidy analyses of chromosomes 21, 18, and 13, and Figures 51D-51F show the flow for the sex analyses. Table 27 shows the sensitivity, specificity, and confidence intervals for each of the six analyses, and Figures 52, 53, and 54 show graphical sample distributions according to NCV after sequencing. In all six analysis categories, 16 samples (3.0%) were removed because no fetal DNA was detected. After unblinding, these samples had no distinguishing clinical features. The number of censored karyotypes in each category depended on the condition being analyzed (fully detailed in Figure 52).
The sensitivity and specificity of the method for detecting T21 in the analysis population (n=493) were 100% (95% CI = 95.9, 100.0) and 100% (95% CI = 99.1, 100.0), respectively (Table 27 and Figure 51A). This included correct classification of one complex T21 karyotype, 47,XX,inv(7)(p22q32),+21, and two translocation T21 cases arising from Robertsonian translocations, one of which was also mosaic for monosomy X (45,X,+21,der(14;21)(q10;q10)[4]/46,XY,+21,der(14;21)(q10;q10)[17] and 46,XY,+21,der(21;21)(q10;q10)).
The sensitivity and specificity for detecting T18 in the analysis population (n=496) were 97.2% (95% CI = 85.5, 99.9) and 100% (95% CI = 99.2, 100.0), respectively (Table 27 and Figure 51B). Although censored from the primary analysis (per protocol), all four samples with mosaic karyotypes for T21 and T18 were correctly classified by the method of the invention as 'affected' for aneuploidy (Table 25). Because they were correctly detected, they are indicated on the left side of Figures 51A and 51B. All remaining censored samples were correctly classified as unaffected for trisomies 21, 18, and 13 (Table 25). The sensitivity and specificity for detecting T13 in the analysis population were 78.6% (95% CI = 49.2, 99.9) and 100% (95% CI = 99.2, 100.0), respectively (Figure 51C). One detected T13 case arose from a Robertsonian translocation (46,XY,+13,der(13;13)(q10;q10)). There were seven unclassified samples in the chromosome 21 analysis (1.4%), five in the chromosome 18 analysis (1.0%), and two in the chromosome 13 analysis (0.4%) (Figures 51A-51C). Across all categories, there was an overlap of three samples that had both a censored karyotype (69,XXX) and no fetal DNA detected. One unclassified sample in the chromosome 21 analysis was correctly identified as T13 in the chromosome 13 analysis, and one unclassified sample in the chromosome 18 analysis was correctly identified as T21 in the chromosome 21 analysis.
Table 25. Censored karyotypes
* Subject excluded from all analysis categories because of a marker chromosome in one cell line.
** Subject with karyotype 48,XXY,+18 who was unclassified in the chromosome 18 analysis and in whom the sex chromosome aneuploidy was not detected.
Table 26. Abnormal and complex karyotypes that were not censored
* After unblinding, an increased normalized chromosome value (NCV) of 3.6 was noted from the sequencing tags on chromosome 6.
The sex chromosome analysis population used to determine the performance of the method (female, male, or monosomy X) comprised 433 samples. Our refined algorithm for classifying sex status, which allows accurate determination of sex chromosome aneuploidies, resulted in a higher number of unclassified results. Sensitivity and specificity for detecting the diploid female state (XX) were 99.6% (95% CI = 97.6, >99.9) and 99.5% (95% CI = 97.2, >99.9), respectively; sensitivity and specificity for detecting male (XY) were both 100% (95% CI = 98.0, 100.0); and sensitivity and specificity for detecting monosomy X (45,X) were 93.8% (95% CI = 69.8, 99.8) and 99.8% (95% CI = 98.7, >99.9), respectively. Although censored from the analysis (per protocol), the mosaic monosomy X karyotypes were classified by sequencing as follows (Table 25): 2/7 were classified as monosomy X, 3/7 with a Y chromosome component were classified as XY, and 2/7 with an XX chromosome component were classified as female. Two samples classified by the method as monosomy X had karyotypes 47,XXX and 46,XX. Eight of ten sex chromosome aneuploidies with karyotypes 47,XXX, 47,XXY, and 47,XYY were correctly classified (Table 25). If sex chromosome classification had been limited to monosomy X, XY, and XX, most of the unclassified samples would have been correctly classified as male, but the XXY and XYY sex aneuploidies would not have been identified.
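The confidence intervals quoted above can be reproduced approximately with an exact binomial (Clopper-Pearson) interval. The sketch below is illustrative only: the text does not state which interval method was used, and the count of 15 detected cases out of 16 is an assumption chosen because it matches the reported 93.8% sensitivity for monosomy X.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, lo=0.0, hi=1.0, iters=100):
    """Find the root of a decreasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid  # root lies to the right
        else:
            hi = mid  # root lies to the left
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x successes in n trials."""
    # Lower bound: p such that P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else _bisect_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # Upper bound: p such that P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else _bisect_decreasing(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


# Assumed counts (not given in the text): 15 of 16 monosomy X cases detected.
sensitivity = 15 / 16
ci_low, ci_high = clopper_pearson(15, 16)
print(f"{100 * sensitivity:.2f}% (95% CI {100 * ci_low:.1f}, {100 * ci_high:.1f})")
```

With small denominators such as this one, the exact interval is wide, which is why a single missed case moves the lower bound so far below the point estimate.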
In addition to accurately classifying trisomies 21, 18, and 13 and sex, the sequencing results correctly classified aneuploidies of chromosomes 16 and 20 in two samples (47,XX,+16 and 47,XX,+20) (Table 26). Interestingly, one sample with a clinically complex alteration of the long arm of chromosome 6 (6q) and two duplications, one of which was 37.5 Mb in size, showed an increased NCV from the sequencing tags for chromosome 6 (NCV = 3.6). In another sample, aneuploidy of chromosome 2 was detected by the method but was not observed in the fetal karyotype at amniocentesis (46,XX). The other complex karyotype variants shown in Tables 25 and 26 include samples from fetuses with chromosome inversions, deletions, translocations, triploidy, and other abnormalities that were not detected here but could potentially be classified by the method at higher sequencing density and/or with further algorithm optimization. In these cases, the method correctly classified the samples as unaffected for trisomy 21, 18, or 13 and as male or female.
In this study, 38 of the 532 analyzed samples were from women who had undergone assisted reproduction. Of these, 17/38 samples had chromosomal abnormalities; no false positives or false negatives were detected in this subpopulation.
Table 27. Sensitivity and specificity of the method
Discussion
This prospective study to determine whole-chromosome fetal aneuploidy from maternal plasma was designed to emulate the real-world scenario of sample collection, processing, and analysis. Whole blood samples were obtained at the enrollment sites, did not require immediate processing, and were shipped overnight to the sequencing laboratory. In contrast to a prior prospective study that involved only chromosome 21 (Palomaki et al., Genetics in Medicine 2011:1), in this study all eligible samples with any abnormal karyotype were sequenced and analyzed. The sequencing laboratory had no prior knowledge of which fetal chromosomes might be affected or of the ratio of aneuploid to euploid samples. The study design recruited a high-risk population of pregnant women to ensure a statistically significant prevalence of aneuploidy, and Tables 25 and 26 indicate the complexity of the karyotypes analyzed. The results demonstrate that: i) fetal aneuploidies (including those resulting from translocation trisomy, mosaicism, and complex variations) can be detected with high sensitivity and specificity; and ii) aneuploidy in one chromosome does not affect the ability of the method to correctly identify the euploid status of other chromosomes. The algorithms used in previous studies appear unable to effectively determine other aneuploidies that will inevitably be present in a general clinical population (Ehrich et al., Am J Obstet Gynecol 2011 Mar;204(3):205e1-11; Chiu et al., BMJ 2011;342:c7401).
With respect to mosaicism, the analysis of the sequencing information in this study correctly classified 4/4 affected samples with mosaic karyotypes for chromosomes 21 and 18. These results demonstrate the sensitivity of the analysis for detecting specific characteristics of cell-free DNA in a complex mixture. In one case, the sequencing data for chromosome 2 indicated a complete or partial chromosomal aneuploidy, whereas the amniocentesis karyotype result for chromosome 2 was diploid. In two other cases, one sample with a 47,XXX karyotype and another with a 46,XX karyotype, the method classified the samples as monosomy X. It is possible that these are mosaic cases, or that the pregnant woman herself is mosaic. (It is important to remember that sequencing is performed on total DNA, which is a combination of maternal and fetal DNA.) Although cytogenetic analysis of amniocytes or chorionic villi obtained by invasive procedures is currently the reference standard for aneuploidy classification, a karyotype performed on a limited number of cells cannot exclude low-level mosaicism. The current clinical study design did not include long-term infant follow-up or access to placental tissue at delivery, so we cannot determine whether these are true or false positive results. We speculate that the specificity of the sequencing process, combined with the algorithms optimized according to the present method for whole-genome detection, will ultimately provide more sensitive identification of fetal DNA abnormalities than standard karyotyping, particularly in cases of mosaicism.
The International Society for Prenatal Diagnosis has published a rapid response statement commenting on the commercial availability of massively parallel sequencing (MPS) for prenatal detection of Down syndrome (Benn et al., Prenat Diagn 2012 doi:10.1002/pd.2919). The statement notes that, before routine MPS-based population screening for fetal Down syndrome is introduced, evidence of testing in certain subpopulations is needed, such as women who conceived by in vitro fertilization. The results reported here indicate that the present method is accurate in this group of pregnant women, many of whom are at high risk of aneuploidy.
Although these results demonstrate the excellent performance of the present method with optimized algorithms for whole-genome aneuploidy detection in singleton pregnancies of women at increased risk of aneuploidy, more experience is needed, particularly in low-risk populations, to build confidence in the diagnostic ability of the method when prevalence is low and in multiple gestation. In the early stages of clinical implementation, classification of chromosomes 21, 18, and 13 from sequencing information according to the present method should be used after a positive first or second trimester screening result. This will reduce unnecessary invasive procedures caused by false-positive screening results, with a concomitant reduction in procedure-related adverse events. Invasive procedures could be limited to confirmation of a positive sequencing result. However, there are clinical scenarios (e.g., advanced maternal age and infertility) in which pregnant women want to avoid an invasive procedure; they may request this test as an alternative to primary screening and/or an invasive procedure. All patients should receive thorough pre-test counseling to ensure that they understand the limitations of the test and the implications of the results. As experience accumulates with more samples, the test may replace current screening protocols and become a primary screen and, ultimately, a noninvasive diagnostic test for fetal aneuploidy.
Example 17
Determination of fetal fraction from NCV to discriminate the presence of complete or partial fetal chromosomal aneuploidy in a test sample
Assuming that the chromosome dose of the relevant fetal chromosome in a maternal sample increases in proportion to increasing fetal fraction, it is expected that, for a complete chromosome of interest, the fetal fraction (ff) value based on the NCV will determine the presence or absence of a complete fetal chromosomal aneuploidy. To demonstrate that the ff determined from NCV can be used to discriminate a complete chromosomal aneuploidy from a partial chromosomal aneuploidy or from the contribution of a mosaic sample, genomic DNA from mothers and their children was used to create artificial samples that simulate the mixture of fetal and maternal cfDNA found in the circulation of pregnant women. The NCV-based value of fetal fraction is a form of the putative fetal fraction described above.
DNA from the mothers and children was purchased from the Coriell Institute for Medical Research (Camden, NJ). DNA identification and sample karyotypes are provided in Table 27.
Table 27. Example 17
Samples containing complete or partial chromosomal aneuploidies were analyzed as follows.
In all cases, the genomic DNA from the mother and the genomic DNA from the child were sheared by sonication to a peak size of 200 bp. Artificial samples containing maternal DNA spiked with 0%, 5%, or 10% w/w child DNA were processed to prepare sequencing libraries, which were sequenced by massively parallel sequencing-by-synthesis as described in Example 12. Each artificial DNA sample was sequenced four times on the sequencer using separate flow cells, providing four sets of sequence information for each of the samples containing 0%, 5%, and 10% child DNA. The 36 bp reads were aligned to the human reference genome hg19, and uniquely mapped tags were counted. Approximately 125 × 10^6 sequence tags were obtained for each of the 4 flow cell lanes used per sample.
Normalizing chromosomes (single chromosomes or groups of chromosomes) were identified in a qualified sample set comprising 20 male and 20 female gDNA libraries, as described elsewhere herein. The normalizing chromosome for chromosome 21 was identified as chromosome 4 + chromosome 16 + chromosome 22; for chromosome 7, as chromosome 4 + chromosome 6 + chromosome 8 + chromosome 12 + chromosome 19 + chromosome 20; for chromosome 15, as chromosome 9 + chromosome 12 + chromosome 14 + chromosome 19 + chromosome 20; for chromosome 22, as chromosome 19; and for chromosome X, as chromosome 4 + chromosome 6 + chromosome 7 + chromosome 8. The sequence tags obtained by sequencing the artificial samples were counted for each chromosome of interest and its corresponding normalizing chromosome (single chromosome or group of chromosomes) and used to calculate chromosome doses and NCVs.
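The dose and NCV calculation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a chromosome dose is the tag count of the chromosome of interest divided by the summed tag counts of its normalizing chromosomes, and that the NCV is the dose expressed as a z-score against the mean and standard deviation of doses in the qualified samples. The `chrN` labels and all counts are hypothetical.

```python
from statistics import mean, stdev

# Normalizing chromosome groups as listed in this example.
NORMALIZERS = {
    "chr21": ["chr4", "chr16", "chr22"],
    "chr7": ["chr4", "chr6", "chr8", "chr12", "chr19", "chr20"],
    "chr15": ["chr9", "chr12", "chr14", "chr19", "chr20"],
    "chr22": ["chr19"],
    "chrX": ["chr4", "chr6", "chr7", "chr8"],
}


def chromosome_dose(tag_counts, chrom, normalizers=NORMALIZERS):
    """Tags on the chromosome of interest divided by tags on its normalizers."""
    denominator = sum(tag_counts[c] for c in normalizers[chrom])
    return tag_counts[chrom] / denominator


def normalized_chromosome_value(test_dose, qualified_doses):
    """Assumed NCV form: z-score of the test dose against qualified samples."""
    return (test_dose - mean(qualified_doses)) / stdev(qualified_doses)


# Made-up tag counts and qualified doses, for illustration only.
tag_counts = {"chr21": 1100, "chr4": 5000, "chr16": 3000, "chr22": 2000}
dose_21 = chromosome_dose(tag_counts, "chr21")
ncv_21 = normalized_chromosome_value(dose_21, [0.098, 0.100, 0.102])
print(dose_21, ncv_21)
```

Summing the normalizing chromosomes into a single denominator, rather than averaging per-chromosome ratios, matches the "chromosome 4 + chromosome 16 + chromosome 22" notation used in the text.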
In this example, ff was determined from the NCVs for sample mixture (1) as $ff_{21} = 2 \times \left| NCV_{21A} \cdot CV_{21U} \right|$ and $ff_{X} = 2 \times \left| NCV_{XA} \cdot CV_{XU} \right|$, where $NCV_{21A}$ is the NCV determined for chromosome 21 in test sample (1), which contains a triploid chromosome 21, and $CV_{21U}$ is the coefficient of variation of the chromosome 21 doses determined in the qualified samples (which contain a diploid chromosome 21); and where $NCV_{XA}$ is the NCV determined for chromosome X in test sample (1), and $CV_{XU}$ is the coefficient of variation of the chromosome X doses determined in the qualified samples (which contain unaffected female fetal chromosomes).
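A short sketch may help show why the factor of 2 appears in the relation ff = 2 × |NCV × CV| (given later in the text as Equation 28). It assumes a simple mixing model that is not stated in the source: the dose of a chromosome scales with its average copy number in the mixture, so a fetal trisomy raises the dose by a factor of 1 + ff/2, and a single fetal X against a female baseline lowers it by 1 − ff/2. The function names and all numbers are hypothetical.

```python
def fetal_fraction_from_ncv(ncv, cv):
    """Return ff = 2 * |NCV * CV|, with CV given as a fraction (e.g. 0.005)."""
    return 2.0 * abs(ncv * cv)


def expected_ncv(ff, fetal_copies, qualified_mean, qualified_sd):
    """NCV expected under the assumed mixing model for a chromosome present in
    `fetal_copies` copies in the fetus and two copies in the mother."""
    relative_dose = ((1.0 - ff) * 2 + ff * fetal_copies) / 2.0
    test_dose = qualified_mean * relative_dose
    return (test_dose - qualified_mean) / qualified_sd


# Illustrative numbers only: qualified mean dose 0.1 with a CV of 0.5%.
mean_dose, sd_dose = 0.1, 0.0005
cv = sd_dose / mean_dose

# Trisomic autosome (3 fetal copies) and chromosome X in a male fetus (1 copy).
ncv_21 = expected_ncv(0.10, fetal_copies=3, qualified_mean=mean_dose, qualified_sd=sd_dose)
ncv_x = expected_ncv(0.10, fetal_copies=1, qualified_mean=mean_dose, qualified_sd=sd_dose)

ff_21 = fetal_fraction_from_ncv(ncv_21, cv)
ff_x = fetal_fraction_from_ncv(ncv_x, cv)
print(ncv_21, ncv_x, ff_21, ff_x)
```

Under this model the NCV for chromosome X is negative, so the absolute value is what makes the two estimates directly comparable.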
Figure 56 shows a plot of the percent ff determined using the dose of chromosome 21 (ff21) as a function of the percent ff determined using the dose of chromosome X (ffX) in synthetic maternal sample (1), which contains DNA from a child with trisomy 21.
The data show that the chromosome doses and the NCVs derived from them increase in proportion to increasing ff, and that there is a 1:1 relationship between the percent ff determined using the dose of the triploid chromosome (i.e., chromosome 21) and the percent ff determined using the dose of a chromosome known to be present as a single copy (i.e., chromosome X).
Figure 57 shows a plot of the percent ff determined using the dose of chromosome 7 (ff7) as a function of the percent ff determined using the dose of chromosome X (ffX) in synthetic maternal sample (2), which contains DNA from a euploid mother and her child, who carries a partial deletion in chromosome 7.
As shown for samples (1) and (2), the chromosome doses and the NCVs derived from them increase in proportion to increasing ff. However, where the aneuploidy is a partial chromosomal aneuploidy, the percent ff determined using the dose of the partially aneuploid chromosome (ff7) does not correspond to the percent ff determined using the dose of chromosome X (ffX). A deviation from the 1:1 relationship shown for the complete trisomy sample therefore indicates the presence of a partial aneuploidy.
Figure 58 shows a plot of the percent ff determined using the dose of chromosome 15 (ff15) as a function of the percent ff determined using the dose of chromosome X (ffX) in synthetic maternal sample (3), which contains DNA from a euploid mother and her child, who is 25% mosaic for a partial duplication of chromosome 15.
As shown for samples (1) and (2), the ff determined using the doses, and the NCVs derived from them, increase in proportion to increasing ff. As in sample (2), sample (3) contains a partial chromosomal aneuploidy, and the percent ff determined using the dose of the partially aneuploid chromosome (ff15) does not correspond to the percent ff determined using the dose of chromosome X (ffX). The lack of correspondence between the two ff values indicates the presence of a partial aneuploidy rather than a complete chromosomal aneuploidy.
Figure 59 shows a plot of the percent ff determined using the dose of chromosome 22 (ff22) and the NCVs derived from it in artificial samples (4) containing (i) 0% child DNA; (ii) 10% DNA from an unaffected twin son known not to have a partial chromosomal aneuploidy of chromosome 22; and (iii) 10% DNA from an affected twin son known to have a partial chromosomal aneuploidy of chromosome 22. The data show that, for the sample containing DNA from the unaffected twin, the ff determined from the four NCVs calculated from the doses of chromosome 22 was close to zero, indicating the absence of a chromosome 22 aneuploidy in the unaffected child, whereas the ff calculated from the doses of chromosome X confirmed that the ff of the unaffected twin sample was approximately 10%. The data also show that, for the sample containing DNA from the affected twin, the ff determined from the four NCVs calculated from the doses of chromosome 22 (ff22) was approximately 3%, indicating the presence of an aneuploidy in chromosome 22, whereas the ff calculated from the doses of chromosome X (ffX) confirmed that the ff of the affected twin sample was approximately 10%. The lack of correspondence between ff22 and ffX indicates that the chromosome 22 aneuploidy in the affected twin is a partial chromosomal aneuploidy.
Thus, the data show that, in maternal samples containing cfDNA from a male fetus, the chromosome doses and the NCVs derived from them can be used to distinguish the presence of a complete trisomy from a partial aneuploidy and/or from a complete or partial aneuploidy present in a mosaic sample. The partial aneuploidy can be a gain or a loss of part of a chromosome. Optionally, partial aneuploidy and/or mosaicism can be resolved by using the chromosome doses and the estimated fetal fraction as described in Example 12.
The fetal fraction approach described above can also be used to determine the likelihood that one or more fetuses in a multiple pregnancy are aneuploid. For example, in one case of fraternal twins, the fetal fraction determined from the NCVX value was 8.3%, whereas the fraction determined from the NCV21 value was 5.0%. This indicated that only one of the pair of male fetuses had a T21 aneuploidy, a result that was confirmed by karyotype. In another maternal sample from a twin pregnancy, the fetal fraction determined from chromosome X was 7.3%, whereas the fetal fraction determined from chromosome 18 was 8.9%. In this case, both twins were determined by karyotype to be T18 males.
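The comparisons made throughout this example come down to checking whether the fetal fraction estimated from the chromosome of interest agrees with the one estimated from chromosome X. A minimal sketch of such a check follows. The function name and the relative `tolerance` of 0.25 are hypothetical: the text does not specify a numerical cutoff, and this value was chosen only so that the reported cases fall on the expected side.

```python
def compare_fetal_fractions(ff_interest, ff_reference, tolerance=0.25):
    """Return (ratio, label) for ff of the chromosome of interest versus the
    reference chromosome (e.g. X in a male pregnancy).

    `tolerance` is a hypothetical relative cutoff, not a value from the text.
    """
    if ff_reference <= 0:
        raise ValueError("reference fetal fraction must be positive")
    ratio = ff_interest / ff_reference
    label = "concordant" if abs(ratio - 1.0) <= tolerance else "discordant"
    return ratio, label


# Values quoted in this example (percent ff).
twins_t21 = compare_fetal_fractions(5.0, 8.3)     # one of two male twins with T21
twins_t18 = compare_fetal_fractions(8.9, 7.3)     # both twins T18 males
partial_22 = compare_fetal_fractions(3.0, 10.0)   # partial chromosome 22 aneuploidy

for name, (ratio, label) in [("T21 twins", twins_t21),
                             ("T18 twins", twins_t18),
                             ("partial chr22", partial_22)]:
    print(f"{name}: ratio {ratio:.2f} -> {label}")
```

A concordant result is consistent with a complete aneuploidy carried by all of the fetal DNA; a discordant one points to a partial aneuploidy, mosaicism, or only one affected fetus, and needs further analysis to tell these apart.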
Example 18
Determination of fetal fraction from NCV to identify the presence of complete fetal chromosomal aneuploidy in clinical samples
To demonstrate that the ff determined from NCV (CNff) can be used to discriminate between the presence of complete and partial chromosomal aneuploidies in clinical samples, chromosomes of interest 21, 13, and 18 were quantified in clinical samples using cfDNA obtained from the blood of pregnant women. The presence of trisomy was verified by karyotype.
cfDNA was obtained from 46 maternal samples from pregnant women each carrying a male fetus with trisomy 21 (T21); 13 maternal samples from pregnant women each carrying a fetus with trisomy 18 (T18); and 3 maternal samples from pregnant women each carrying a male fetus with trisomy 13 (T13). The clinical samples were from the clinical study described in Example 16. cfDNA was isolated and sequencing libraries were prepared as described in Example 16, except that the newer Illumina v3 chemistry was used.
Sequencing libraries prepared from cfDNA of qualified samples known to be unaffected for chromosomes 21, 18, and 13 were also sequenced using the newer Illumina v3 chemistry. The sequence reads obtained for the qualified samples were mapped to the human reference genome hg19; reads that mapped uniquely to each chromosome sequence of hg19 (non-repeat-masked) were counted and used to systematically determine which chromosome or group of chromosomes would serve as the normalizing chromosome for each of the chromosomes of interest 21, 18, and 13 in the test samples.
Table 28 below shows the normalizing chromosomes (denominator chromosomes) identified for determining chromosome doses (ratios) for chromosomes 1-22, X, and Y in each test sample.
Table 28. Example 18: systematically identified normalizing chromosomes for the T21, T18, and T13 test samples
Once the normalizing chromosomes had been identified in the qualified samples, the test samples were sequenced, and the sequence tags mapping to each of chromosomes 21, 18, and 13 and to the corresponding normalizing chromosomes in the test samples were counted and used to calculate chromosome doses (ratios). NCV values were then calculated as previously described.
For each test sample, the fetal fraction was determined for chromosome X and for the chromosome of interest according to the following equation, described elsewhere in this specification:
$$ff = 2 \times \left| NCV_{iA} \cdot CV_{iU} \right| \qquad \text{(Equation 28)}$$
Figure 60 shows a plot of CNffX versus CNff21 determined in samples containing fetal trisomy 21 (T21). As expected for a complete chromosomal aneuploidy, CNffX matches the fetal fraction determined using the NCV of chromosome 21 (CNff21).
Similarly, in the T18 test samples, CNffX matched the fetal fraction determined using the NCV of chromosome 18 (CNff18) (Figure 61), and in the T13 test samples, CNffX matched the fetal fraction determined using the NCV of chromosome 13 (CNff13) (Figure 62).
Figure 60 also shows the fetal fractions obtained for samples in which a female fetus was affected by T21. As expected, CNff21 in these "female" samples cannot be verified by comparison with chromosome X. To verify CNff21 in female samples, the CNff of a chromosome known not to be aneuploid in the fetus (e.g., chromosome 1) can be determined. Alternatively, CNff21 of a "female" sample can be verified by comparing it with the NCNff, determined, for example, by counting tags of polymorphic sequences as described elsewhere herein.
Thus, the numbers of sequence tags and the resulting NCV values that identify copy number variations of complete chromosomes can be used to determine the corresponding fetal fraction in aneuploid (affected) samples. Correspondence between the CNff of the chromosome of interest and the CNff of a chromosome known not to be aneuploid can be used to confirm the presence of a complete chromosomal trisomy.
Example 19
Determination of fetal fraction from NCV to identify the presence of partial fetal chromosomal aneuploidy in clinical samples
To demonstrate that the ff determined from NCV (CNff) can be used to identify and localize partial chromosomal aneuploidies in clinical samples, cfDNA from a clinical sample identified as having a chromosome 17 aneuploidy was sequenced and analyzed as described in Example 18.
NCV values for each chromosome in the test sample were calculated using the sequence tags mapping to the chromosome in the test sample and to the normalizing chromosomes identified in the qualified sample set (Table 28 above); for chromosome 17, the normalizing chromosomes were chromosome 16 + chromosome 20 + chromosome 22.
Figure 63 shows a plot of the NCV values for chromosomes 1-22 and X in the test sample. As shown, chromosome 17 had an NCV > 4, the threshold chosen for identifying aneuploid chromosomes. The plot also shows the NCV for chromosome X which, as expected, was negative.
The CNff for chromosome 17 and for chromosome X was calculated according to the following equation:
$$ff_{(i)} = 2 \times NCV_{jA} \cdot CV_{jU} \qquad \text{(Equation 25)}$$
CNff17 was determined to be 3.9% and CNffX to be 13.5%.
The difference between the CNff values indicates the presence of a partial aneuploidy or, possibly, mosaicism.
To distinguish a partial aneuploidy from possible mosaicism, the tags were counted for each consecutive 100 kbp block (bin) of chromosome 17, and a normalized bin value (NBV) was calculated for each bin. The tag count in each individual bin was normalized by determining the ratio of the tags in the bin to the sum of the tags in the 20 bins of the same size whose GC content was closest to that of the analyzed bin. In this case, the normalization is therefore GC-dependent. Optionally, bin normalization can also take into account the variability of the bin doses as determined in qualified samples, as described for chromosome doses (ratios). In this example, the GCC Z-score is equal to the NBV and is determined from the following quantities:
Mj and MADj are, respectively, the estimated median and the median absolute deviation of the j-th chromosome dose in the set of qualified samples, and xij is the observed j-th chromosome dose for test sample i.
The NBV for each 100 kbp bin along the length of chromosome 17 is shown on the Y-axis of Figure 64 as a GCC Z-score, indicating GC normalization. The plot in Figure 64 clearly shows a copy number increase for the bins corresponding to approximately the last 200,000 bp of chromosome 17. This finding is consistent with the karyotype provided for the sample, which indicated a duplication at the q terminus (qter) of chromosome 17.
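The bin-level normalization described above can be sketched as follows. Two points are assumptions rather than statements from the text: the 20 GC-matched bins are taken to exclude the analyzed bin itself, and the Z-score is taken to have the usual robust form (x − M) / MAD built from the median and median absolute deviation of qualified-sample values, because the equation itself is not reproduced here. All counts and GC values are made up.

```python
from statistics import median


def gc_normalized_bin_values(tag_counts, gc_content, n_neighbors=20):
    """For each bin, divide its tag count by the summed tag counts of the
    n_neighbors other bins whose GC content is closest to its own."""
    n = len(tag_counts)
    values = []
    for i in range(n):
        # Rank all other bins by GC distance to bin i (O(n^2), fine for a sketch).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: abs(gc_content[j] - gc_content[i]))
        denominator = sum(tag_counts[j] for j in others[:n_neighbors])
        values.append(tag_counts[i] / denominator if denominator else float("nan"))
    return values


def robust_z(x, qualified_values):
    """Assumed Z-score form: (x - median) / median absolute deviation."""
    m = median(qualified_values)
    mad = median([abs(v - m) for v in qualified_values])
    return (x - m) / mad


# Illustration: 20 ordinary bins plus one bin with 50% more tags.
counts = [100] * 20 + [150]
gc = [0.40 + 0.001 * i for i in range(21)]
bin_values = gc_normalized_bin_values(counts, gc)
z_last = robust_z(bin_values[20], [0.048, 0.049, 0.050, 0.051, 0.052])
print(bin_values[20], z_last)
```

Plotting such Z-scores along the chromosome, as in Figure 64, shows a run of consecutive elevated bins for a localized gain, which is what separates a partial aneuploidy from a uniform shift across the whole chromosome.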
Thus, CNff can be used to identify and localize partial aneuploidies within a chromosome.
______________________________________________________
Example 20
Verifying sample integrity in multiplexed bioassays of maternal cfDNA
Marker molecules having sequences known not to be contained in any known genome were synthesized and used to verify the integrity of whole blood and plasma maternal source samples, which were processed to extract and sequence the mixture of fetal and maternal cfDNA they contained.
Current and prior experimental data have shown that the average length of cfDNA is approximately 170 bp. Using BLAST searches against all genome entries, 170 bp antigenomic sequences that are not present in any known genome were identified. Six marker molecules (MM1-MM6) were synthesized based on the identified antigenomic sequences (SEQ ID NOs: 1-6; Table 29) and used to verify sample integrity as follows.
Table 29
Marker molecules
从一个孕妇体内收集外周血到4个血液收集管(内布拉斯加州奥马哈市施特雷克公司(Streck,Inc.Omaha NE)的Cell-Free DNATM BCT)中并且连夜运送至实验室进行分析。两个全血来源样品如下外加标记物分子。一个血液来源样品外加720pg标记物分子1(MM1),并且第二血液来源样品外加720pg 标记物分子2。所有4个管都在4℃下在1600g下离心10分钟。从四个管中的每一个中移出血浆上清液,并且将其放入5mL高速离心管中并且在4℃下在16000g下离心10分钟。已经外加标记物分子的全血的血浆部分等分到分开的管中并且在-80℃下存储。来自将两个剩余血液管(未进行外加)的血浆部分接着分成1.1mL等分试样。血浆来源样品如下制备。将一百皮克MM1加入一个血浆等分试样中,100pg MM2加入血浆等分试样2中,等等,以获得6 个经过标记的血浆来源样品,每一个血浆来源样品包含在-80℃下存储不同的标记物分子(MM1-MM6)。Peripheral blood was collected from a pregnant woman into four blood collection tubes (Cell-Free DNA ™ BCT from Streck, Inc. Omaha, Nebraska) and shipped overnight to the laboratory for analysis. Two whole blood-derived samples were spiked with marker molecules as follows. One blood-derived sample was spiked with 720 pg marker molecule 1 (MM1), and the second blood-derived sample was spiked with 720 pg marker molecule 2. All four tubes were centrifuged at 1600 g for 10 minutes at 4°C. The plasma supernatant was removed from each of the four tubes and placed in a 5 mL high-speed centrifuge tube and centrifuged at 16,000 g for 10 minutes at 4°C. The plasma portion of the whole blood to which the marker molecules had been spiked was aliquoted into separate tubes and stored at -80°C. The plasma portion from the two remaining blood tubes (not spiked) was then divided into 1.1 mL aliquots. The plasma-derived samples were prepared as follows. One hundred picograms of MM1 were added to one plasma aliquot, 100 pg of MM2 was added to plasma aliquot 2, and so on, to obtain six labeled plasma-derived samples, each containing a different marker molecule (MM1-MM6) stored at -80°C.
One tube of each marked plasma source sample and one tube of each marked blood source sample were thawed, and DNA was extracted using the Qiagen Blood Mini Kit according to the method described in Example 1. Thirty microliters of each sample DNA was used to prepare a library with the TruSeq™ DNA Sample Preparation Kit (San Diego, California), which includes indexes 1-6. The sequencing libraries were prepared so that the sample containing MM1 was indexed with index 1, the sample containing MM2 with index 2, and so on. Libraries were quantified using the Agilent Bioanalyzer DNA 1000 Kit (Agilent Technologies, Santa Clara, California) and diluted to 4 nM with Qiagen Buffer EB. The indexed, marked samples were pooled, further diluted to 2 nM, and sequenced in four lanes of an Illumina HiSeq flow cell using the Illumina TruSeq SBS Kit v3, according to the layout in Table 30.
Table 30
Layout of the multiplexed sequencing flow cell
Sequence reads were aligned to the human reference genome hg19 and to a synthetic reference genome containing the antigenomic marker molecule sequences. Reads that mapped uniquely (i.e., only once) to either the hg19 reference genome or the synthetic reference genome of marker molecule sequences were counted (Table 31).
Table 31
Correspondence between MM sequences and cfDNA sequences of the source samples
*I = index
**L = lane
The data show that, for each sample, the sequences identified as the MM added to a source sample corresponded only to the cfDNA sequences of the source sample to which that MM had been added. For example, the data for sample 1 show that reads mapping to MM1 were found only with the cfDNA obtained from the source sample spiked with MM1 (plasma sample 1). In addition, the absence of a different marker sequence (e.g., MM2) among the reads obtained from the sequenced cfDNA of source sample 1 indicates that source sample 1 was not cross-contaminated by another sample (e.g., source sample 2).
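The integrity check described in this example can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure actually used: the function name, the dictionary layout, and the read counts in the usage note are hypothetical, and the sketch assumes that uniquely mapped read counts per marker molecule are already available for each sample.

```python
from typing import Dict, List


def check_marker_integrity(
    expected: Dict[str, str],
    marker_counts: Dict[str, Dict[str, int]],
) -> Dict[str, List[str]]:
    """Flag samples whose marker-molecule reads do not match expectation.

    expected:      sample name -> marker spiked into that sample (e.g. "MM1")
    marker_counts: sample name -> {marker name: uniquely mapped read count}
    Returns a dict of sample name -> list of issues; empty if all samples pass.
    """
    problems: Dict[str, List[str]] = {}
    for sample, counts in marker_counts.items():
        want = expected[sample]
        # Any reads from a marker that was NOT spiked in suggest a mix-up
        # or cross-contamination.
        issues = sorted(m for m, n in counts.items() if m != want and n > 0)
        # No reads from the expected marker suggest a sample swap or loss.
        if counts.get(want, 0) == 0:
            issues.append("missing:" + want)
        if issues:
            problems[sample] = issues
    return problems
```

For instance, with made-up counts `{"plasma1": {"MM1": 100, "MM2": 0}, "plasma2": {"MM1": 3, "MM2": 50}}` and expectations `{"plasma1": "MM1", "plasma2": "MM2"}`, only `plasma2` is flagged, because it contains reads from MM1.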
Example 21
Internal positive control
An in-process positive control was developed for massively parallel sequencing of maternal cfDNA, providing qualitatively positive chromosome doses and NCV values for trisomy 13, trisomy 18, and trisomy 21.
Fragmented genomic DNA from three male patients with known trisomies of Chr13, Chr18, and Chr21, respectively, was spiked into a background of fragmented female DNA. The fragmented genomic DNA was size-selected by PAGE to contain fragments ranging from about 150 bp to about 250 bp in length, mimicking the size of fetal cfDNA. The size-selected DNA for the T13, T18, and T21 controls was purified and end-repaired, and its concentration was measured using a Nanodrop (Wilmington, DE). The prepared DNA was verified on a Bioanalyzer High Sensitivity DNA chip (Agilent, Santa Clara, CA). The trisomy 13, trisomy 18, and trisomy 21 DNA was obtained from the Coriell Institute for Medical Research (Camden, NJ). Female genomic DNA was obtained from The Biochain Institute (Hayward, CA). A small amount of trisomic DNA was spiked into the predominantly female DNA background to mimic the "male fetal" DNA fraction in a female "maternal" DNA background. The composition of this DNA mixture was optimized so that, when used in a sequencing assay for determining copy number variation, the mixture always reports qualitatively positive for trisomy 13, trisomy 18, and trisomy 21, with NCV values for chromosomes 13, 18, and 21 greater than 4.
Maternal cfDNA was extracted from plasma samples obtained from pregnant women, and sequencing libraries of the maternal sample cfDNA and of the T13, T18, and T21 control DNA were prepared for multiplexed sequencing on the Illumina platform. Four positive controls and 56 samples were sequenced in each flow cell of the sequencer. As described elsewhere in this application, 36 bp reads were obtained, tags for each chromosome were identified, and NCV values were calculated.
Figures 69A, 69B, and 69C show the NCV values of the maternal test samples (◇) and the internal positive controls (□). Samples with NCV values above 4 were determined to have a copy number variation of the chromosome of interest: chromosome 13 (A), 18 (B), or 21 (C), respectively. The figures show that the NCVs of the positive controls correlated with the NCVs of the maternal test samples identified as having a copy number variation, i.e., an extra copy of chromosome 13, 18, or 21.
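The NCV > 4 decision rule used for both the test samples and the internal positive controls can be illustrated as follows. This is a sketch under stated assumptions: NCV is calculated "as described elsewhere in this application", and the code assumes the usual form, in which a chromosome dose (tags on the chromosome of interest divided by tags on a normalizing chromosome) is standardized against the mean and standard deviation of doses from unaffected samples. The function names and the use of the sample standard deviation are illustrative choices, not taken from the text.

```python
from statistics import mean, stdev
from typing import Sequence


def chromosome_dose(tags_interest: int, tags_normalizing: int) -> float:
    """Ratio of tag counts: chromosome of interest / normalizing chromosome."""
    return tags_interest / tags_normalizing


def ncv(dose: float, unaffected_doses: Sequence[float]) -> float:
    """Normalized chromosome value: dose standardized against unaffected samples.

    Assumes at least two unaffected doses (needed for a standard deviation).
    """
    return (dose - mean(unaffected_doses)) / stdev(unaffected_doses)


def is_positive(ncv_value: float, threshold: float = 4.0) -> bool:
    """Apply the NCV > 4 rule stated in this example."""
    return ncv_value > threshold
```

An in-process control passes if `is_positive` returns `True` for each of chromosomes 13, 18, and 21 in the spiked mixture.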
Internal positive controls can be designed to mimic both complete and partial chromosomal variations, and can be used in prenatal diagnostic assays and in related assays such as determining fetal fraction by massively parallel sequencing, as described throughout this specification.
Example 22
Determination of fetal fraction using massively parallel sequencing: sample processing and cfDNA extraction
Peripheral blood samples were collected from pregnant women in the first or second trimester of pregnancy who were deemed at risk for fetal aneuploidy. A consent form was obtained from each participant before the blood draw. Blood was collected before amniocentesis or chorionic villus sampling. Karyotype analysis was performed on the chorionic villus or amniocentesis samples to determine the fetal karyotype.
Peripheral blood drawn from each subject was collected in ACD tubes. One tube of blood (approximately 6-9 mL per tube) was transferred to a 15 mL low-speed centrifuge tube. The blood was centrifuged at 2640 rpm for 10 minutes at 4°C using a Beckman Allegra 6R centrifuge with a GA 3.8 rotor.
For cell-free plasma extraction, the upper plasma layer was transferred to a 15 mL high-speed centrifuge tube and centrifuged at 16,000 × g for 10 minutes at 4°C using a Beckman Coulter Avanti J-E centrifuge with a JA-14 rotor. Both centrifugation steps were performed within 72 hours of blood collection. Cell-free plasma containing cfDNA was stored at -80°C and thawed only once before amplification of plasma cfDNA or purification of cfDNA.
Purified cell-free DNA (cfDNA) was extracted from cell-free plasma using the QIAamp Blood DNA Mini Kit (Qiagen), essentially according to the manufacturer's instructions. One milliliter of Buffer AL and 100 μl of protease solution were added to 1 ml of plasma, and the mixture was incubated for 15 minutes at 56°C. One milliliter of 100% ethanol was added to the plasma digest. The resulting mixture was transferred to QIAamp Mini columns assembled with the VacValves and VacConnectors provided in the QIAvac 24 Plus column assembly (Qiagen). Vacuum was applied to the samples, and the cfDNA retained on the column filters was washed under vacuum with 750 μl of Buffer AW1, followed by a second wash with 750 μl of Buffer AW24. The column was centrifuged at 14,000 RPM for 5 minutes to remove any residual buffer from the filter. The cfDNA was eluted with Buffer AE by centrifugation at 14,000 RPM, and its concentration was determined using the Qubit™ quantitation platform (Invitrogen).
Example 23
Determination of fetal fraction using massively parallel sequencing: preparation of sequencing libraries, sequencing, and analysis of sequencing data
a. Preparation of sequencing libraries
All sequencing libraries, i.e., target, primary, and enriched libraries, were prepared from approximately 2 ng of purified cfDNA extracted from maternal plasma. Library preparation was performed using reagents of the NEBNext™ DNA Sample Preparation DNA Reagent Set 1 (Part No. E6000L; New England Biolabs, Ipswich, MA) as follows. Because cell-free plasma DNA is fragmented in nature, no further fragmentation of the plasma DNA samples by nebulization or sonication was performed. The overhangs of approximately 2 ng of purified cfDNA fragments contained in 40 μl were converted into phosphorylated blunt ends according to the End Repair Module by incubating the cfDNA in a 1.5 ml microcentrifuge tube for 15 minutes at 20°C with 5 μl of 10X phosphorylation buffer, 2 μl of deoxynucleotide solution mix (10 mM each dNTP), 1 μl of a 1:5 dilution of DNA polymerase I, 1 μl of T4 DNA polymerase, and 1 μl of T4 polynucleotide kinase, all provided in the NEBNext™ DNA Sample Preparation DNA Reagent Set 1. The enzymes were then heat-inactivated by incubating the reaction mixture at 75°C for 5 minutes.
The mixture was cooled to 4°C, and dA-tailing of the blunt-ended DNA was accomplished using 10 μl of the dA-tailing master mix containing the Klenow fragment (3' to 5' exo-) (NEBNext™ DNA Sample Preparation DNA Reagent Set 1), with incubation for 15 minutes at 37°C. The Klenow fragment was subsequently heat-inactivated by incubating the reaction mixture at 75°C for 5 minutes. Following inactivation of the Klenow fragment, 1 μl of a 1:5 dilution of Illumina Genomic Adaptor Oligo Mix (Part No. 1000521; Illumina Inc., Hayward, CA) was used to ligate the Illumina adaptors (non-index Y-adaptors) to the dA-tailed DNA using 4 μl of the T4 DNA ligase provided in the NEBNext™ DNA Sample Preparation DNA Reagent Set 1, by incubating the reaction mixture for 15 minutes at 25°C. The mixture was cooled to 4°C, and the adaptor-ligated cfDNA was purified from unligated adaptors, adaptor dimers, and other reagents using the magnetic beads provided in the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA). Eighteen cycles of PCR were performed to selectively enrich adaptor-ligated cfDNA using a High-Fidelity Master Mix (Finnzymes, Woburn, MA) and Illumina PCR primers complementary to the adaptors (Part Nos. 1000537 and 1000537).
The adaptor-ligated DNA was subjected to PCR using Illumina Genomic PCR Primers (Part Nos. 100537 and 1000538) and the Phusion HF PCR Master Mix provided in the NEBNext™ DNA Sample Preparation DNA Reagent Set 1, according to the manufacturer's instructions (98°C for 30 seconds; 18 cycles of 98°C for 10 seconds, 65°C for 30 seconds, and 72°C for 30 seconds; final extension at 72°C for 5 minutes; hold at 4°C). The amplified product was purified using the Agencourt AMPure XP PCR purification system (Agencourt Bioscience Corporation, Beverly, MA) according to the manufacturer's instructions, available at www.beckmangenomics.com/products/AMPureXPProtocol_000387v001.pdf. The purified amplified product was eluted in 40 μl of Qiagen EB Buffer, and the concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA 1000 Kit for the 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA).
b. Sequencing
The library DNA was sequenced using the Genome Analyzer II (Illumina Inc., San Diego, CA, USA) according to standard manufacturer protocols. Copies of the protocol for whole-genome sequencing using Illumina/Solexa technology can be found in the BioTechniques® Protocol Guide 2007, published December 2006, p. 29, and on the World Wide Web at biotechniques.com/default.asp?page=protocol&subsection=article_display&id=112378.
The DNA library was diluted to 1 nM and denatured. Library DNA (5 pM) was subjected to cluster amplification according to the procedures described in Illumina's Cluster Station User Guide and Cluster Station Operations Guide, available on the World Wide Web at illumina.com/systems/genome analyzer/cluster_station.ilmn. The amplified DNA was sequenced using Illumina's Genome Analyzer II to obtain single-end reads of 36 bp. Only about 30 bp of random sequence information is needed to identify a sequence as belonging to a specific human chromosome. Longer sequences can uniquely identify more specific targets. In the present case, a large number of 36 bp reads were obtained, covering approximately 10% of the genome.
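To give a sense of scale for "approximately 10% of the genome", the following back-of-the-envelope sketch estimates the minimum number of 36 bp reads required. The genome size of 3.0 × 10^9 bp and the assumption of non-overlapping reads are illustrative assumptions, not values stated in the text, so the result is a lower bound rather than the number of reads actually obtained.

```python
import math


def min_reads_for_coverage(fraction: float, genome_bp: int, read_len: int) -> int:
    """Lower bound on reads needed to cover `fraction` of a genome,
    assuming no two reads overlap."""
    return math.ceil(fraction * genome_bp / read_len)


# Assumed haploid human genome size; not a value given in the text.
ASSUMED_GENOME_BP = 3_000_000_000

reads_needed = min_reads_for_coverage(0.10, ASSUMED_GENOME_BP, 36)
print(reads_needed)  # roughly 8.3 million reads
```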
c. Analysis of sequencing data to determine fetal fraction
Upon completion of sequencing of a sample, the Illumina "Sequence Control Software" transferred the image and base-call files to a Unix server running the Illumina "Genome Analyzer Pipeline" software version 1.51. The 36 bp reads were aligned to an artificial reference genome (e.g., a SNP genome) using the BOWTIE program. The artificial reference genome was defined as the set of polymorphic DNA sequences that encompass the alleles contained in the polymorphic target sequences; for example, the artificial reference genome is a SNP genome comprising SEQ ID NOs: 7-62. Only reads that mapped uniquely to the artificial genome were used for the analysis of fetal fraction. Reads that matched the SNP genome perfectly were counted as tags and filtered. Of the remaining reads, only reads with one or two mismatches were counted as tags and included in the analysis. Tags mapped to each of the polymorphic alleles were counted, and the fetal fraction was determined as the ratio of the number of tags mapped to the minor allele (i.e., the fetal allele) to the number of tags mapped to the major allele (i.e., the maternal allele).
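The tag-counting rule above can be sketched as follows. The wording on perfectly matching reads is ambiguous in the text, so this sketch makes an explicit assumption: uniquely mapped reads with zero, one, or two mismatches are all counted as tags, and reads with more mismatches or non-unique alignments are discarded. The `Alignment` record and its field names are hypothetical stand-ins for parsed aligner output, not part of any aligner's API.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    allele_id: str   # name of the SNP-genome sequence the read aligned to
    mismatches: int  # number of mismatched bases in the alignment
    unique: bool     # True if the read aligned to exactly one sequence


def count_tags(alignments: Iterable[Alignment], max_mismatches: int = 2) -> Counter:
    """Count sequence tags per allele, keeping only unique alignments
    with at most `max_mismatches` mismatches (assumed reading of the text)."""
    tags: Counter = Counter()
    for aln in alignments:
        if aln.unique and aln.mismatches <= max_mismatches:
            tags[aln.allele_id] += 1
    return tags
```

The per-allele counts returned here are the inputs to the fetal fraction ratio: for each informative SNP, the allele with fewer tags is treated as the minor (fetal) allele and the one with more tags as the major (maternal) allele.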
Example 24
Selection of autosomal SNPs for the determination of fetal fraction
A set of 28 autosomal SNPs was selected from a list of 92 SNPs (Pakstis et al., Hum Genet 127:315-324 [2010]) and from Applied Biosystems by Life Technologies™ (Carlsbad, CA) at the World Wide Web address appliedbiosystems.com. Primers were designed to hybridize to a sequence close to the SNP site on the cfDNA, to ensure that the SNP site is included in the 36 bp read generated by massively parallel sequencing on the Illumina Analyzer GII, and to generate amplicons of sufficient length to undergo bridge amplification during cluster formation. Accordingly, the primers were designed to generate amplicons of at least 110 bp which, when combined with the universal adaptors (Illumina Inc., San Diego, CA) used for cluster amplification, yield DNA molecules of at least 200 bp. Primer sequences were identified, and primer sets (i.e., forward and reverse primers) were synthesized by Integrated DNA Technologies (San Diego, CA) and stored as 1 μM solutions to be used for amplifying polymorphic target sequences as described in Examples 25 to 27.
Table 33 provides the RefSNP (rs) accession ID numbers, the primers used for amplifying the target cfDNA sequences, and the sequences of the amplicons, containing the possible SNP alleles, that would be generated using the primers. The SNPs given in Table 33 were used for the simultaneous amplification of 13 target sequences in a multiplexed assay. The panel provided in Table 33 is an exemplary SNP panel. Fewer or more SNPs can be employed to enrich the fetal and maternal DNA for polymorphic target nucleic acids. Additional SNPs that can be used include those given in Table 34. The SNP alleles are shown in bold and underlined. Other additional SNPs that can be used to determine fetal fraction according to the present method include rs315791, rs3780962, rs1410059, rs279844, rs38882, rs9951171, rs214955, rs6444724, rs2503107, rs1019029, rs1413212, rs1031825, rs891700, rs1005533, rs2831700, rs354439, rs1979255, rs1454361, rs8037429, and rs1490413, which have been analyzed for determining fetal fraction by TaqMan PCR and are disclosed in U.S. Provisional Applications 61/296,358 and 61/360,837.
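The primer design constraints stated in this example (SNP inside the 36 bp read, amplicon of at least 110 bp, adaptor-ligated molecule of at least 200 bp) can be expressed as a simple check. This is a sketch with assumptions flagged: it assumes the read starts at the first base of the amplicon, and the combined adaptor length of 90 bp is a hypothetical value implied only by the difference between the two stated minimum lengths, not a figure given in the text.

```python
def amplicon_ok(
    amplicon_len: int,
    snp_offset: int,
    read_len: int = 36,
    min_amplicon: int = 110,
    adaptor_total_bp: int = 90,   # hypothetical: 200 bp minimum minus 110 bp
    min_cluster_len: int = 200,
) -> bool:
    """Check a candidate amplicon against the design rules in the text.

    snp_offset is the 1-based position of the SNP counted from the amplicon
    end where the sequencing read begins (an assumption of this sketch).
    """
    snp_in_read = 1 <= snp_offset <= read_len
    long_enough = amplicon_len >= min_amplicon
    clusters = amplicon_len + adaptor_total_bp >= min_cluster_len
    return snp_in_read and long_enough and clusters
```

A design tool would run this check on every forward/reverse primer pair before synthesis and discard pairs that place the SNP beyond base 36 of the read.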
Table 33
SNP panel for the determination of fetal fraction
Table 34
Additional SNPs for the determination of fetal fraction
Example 25
Determination of fetal fraction by massively parallel sequencing of a target library
To determine the fraction of fetal cfDNA in a maternal sample, target polymorphic nucleic acid sequences, each comprising a SNP, were amplified and used to prepare a target library for sequencing in a massively parallel fashion.
cfDNA was extracted as described above. A target sequencing library was prepared as follows. The cfDNA contained in 5 μl of purified cfDNA was amplified in a 50 μl reaction volume containing 7.5 μl of a 1 μM primer mix (Table 1), 10 μl of NEB 5X master mix, and 27 μl of water. Thermal cycling was performed with the Gene Amp9700 (Applied Biosystems) using the following cycling conditions: incubation at 95°C for 1 minute, followed by 20 to 30 cycles of 95°C for 20 seconds, 68°C for 1 minute, and 68°C for 30 seconds, followed by a final incubation at 68°C for 5 minutes. A final hold at 4°C was maintained until the samples were removed for combining with the unamplified portion of the purified cfDNA sample. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA). A final hold at 4°C was maintained until removal for preparation of the target library. The amplified product was analyzed with a 2100 Bioanalyzer (Agilent Technologies, Sunnyvale, CA), and the concentration of the amplified product was determined.
A sequencing library of the amplified target nucleic acids was prepared as described in Example 23 and sequenced in a massively parallel fashion using sequencing-by-synthesis with reversible dye terminators, according to the Illumina protocol (BioTechniques® Protocol Guide 2007, published December 2006, p. 29, and on the World Wide Web at biotechniques.com/default.asp?page=protocol&subsection=article_display&id=112378). Tags that mapped to a reference genome consisting of the 26 SNP-containing sequences (13 pairs, each pair representing two alleles), i.e., SEQ ID NOs: 7-32, were analyzed and counted as described.
Table 35 provides the tag counts obtained from sequencing the target library and the fetal fraction calculated from the sequencing data.
Table 35
Determination of fetal fraction by massively parallel sequencing of a library of polymorphic nucleic acids
The results show that polymorphic nucleic acid sequences, each comprising at least one SNP, can be amplified from cfDNA derived from a maternal plasma sample to construct a library that can be sequenced in a massively parallel fashion to determine the fraction of fetal nucleic acids in the maternal sample.
Example 26
Determination of fetal fraction following enrichment of fetal and maternal nucleic acids in a cfDNA sequencing library sample
To enrich the fetal and maternal cfDNA contained in a primary sequencing library constructed from purified fetal and maternal cfDNA, a portion of the purified cfDNA sample was used to amplify polymorphic target nucleic acid sequences and to prepare a sequencing library of the amplified polymorphic target nucleic acids; this target library was then used to enrich the fetal and maternal nucleic acid sequences contained in the primary library.
The method corresponds to the workflow illustrated in Figure 10. A target sequencing library was prepared from a portion of the purified cfDNA as described in Example 23. A primary sequencing library was prepared from the remaining portion of the purified cfDNA as described in Example 23. Enrichment of the primary library for the amplified polymorphic nucleic acids contained in the target library was achieved by diluting the primary and target sequencing libraries to 10 nM and combining the target library with the primary library at a ratio of 1:9 to provide an enriched sequencing library. The enriched library was sequenced, and the sequencing data were analyzed, as described in Example 23.
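The dilution and 1:9 combination step reduces to simple volume arithmetic, sketched below. The stock concentration and volumes in the usage note are hypothetical placeholders chosen for illustration; only the 10 nM target concentration and the 1:9 ratio come from the text. Because both libraries are at the same molar concentration, a 1:9 volume ratio is also a 1:9 molar ratio.

```python
from typing import Tuple


def dilution_volumes(stock_nM: float, final_nM: float, final_ul: float) -> Tuple[float, float]:
    """Return (stock volume, diluent volume) in microliters, from C1*V1 = C2*V2."""
    stock_ul = final_nM * final_ul / stock_nM
    return stock_ul, final_ul - stock_ul


def mix_by_ratio(total_ul: float, target_parts: int = 1, primary_parts: int = 9) -> Tuple[float, float]:
    """Return (target library volume, primary library volume) for a given
    total volume, defaulting to the 1:9 ratio described in the text."""
    target_ul = total_ul * target_parts / (target_parts + primary_parts)
    return target_ul, total_ul - target_ul
```

For example, a hypothetical 50 nM library would need 4 μl of stock plus 16 μl of diluent to give 20 μl at 10 nM, and 10 μl of enriched library would combine 1 μl of target library with 9 μl of primary library.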
Table 36 provides the number of sequence tags that mapped to the SNP genome for the informative SNPs identified by sequencing enriched libraries derived from plasma samples of pregnant women carrying fetuses with T21, T13, T18, and monosomy X, respectively. Fetal fraction was calculated as follows:
Fetal fraction for allele x (%) = ((Σ fetal sequence tags for allele x) / (Σ maternal sequence tags for allele x)) × 100
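The formula above translates directly into code. The sketch below is illustrative only: the function name is hypothetical, and the tag counts in the usage note are made up to show the arithmetic rather than taken from Table 36.

```python
from typing import Sequence


def fetal_fraction_percent(fetal_tags: Sequence[int], maternal_tags: Sequence[int]) -> float:
    """Percent fetal fraction: 100 * (sum of fetal tags) / (sum of maternal tags).

    Each sequence holds the tag counts contributing to allele x, as in the
    formula in the text.
    """
    maternal_total = sum(maternal_tags)
    if maternal_total == 0:
        raise ValueError("no maternal tags: fetal fraction is undefined")
    return 100.0 * sum(fetal_tags) / maternal_total
```

With made-up counts of 30 and 20 fetal tags against 600 and 400 maternal tags, the function returns 5.0, i.e., a 5% fetal fraction.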
Table 36 also provides the number of sequence tags that mapped to the human reference genome. Tags mapped to the human reference genome were used to determine the presence or absence of aneuploidy using the same plasma sample that was used for determining the corresponding fetal fraction. Methods for using sequence tag counts to determine aneuploidy are described in U.S. Provisional Applications 61/407,017 and 61/455,849, which are incorporated herein by reference in their entirety.
Table 36
Determination of fetal fraction by massively parallel sequencing of an enriched library of polymorphic nucleic acids
Example 27
Determination of fetal fraction by massively parallel sequencing:
Enrichment of fetal and maternal nucleic acids for polymorphic nucleic acids in a purified cfDNA sample
To enrich the fetal and maternal cfDNA contained in a purified sample of cfDNA extracted from a maternal plasma sample, a portion of the purified cfDNA was used to amplify polymorphic target nucleic acid sequences, each comprising one SNP chosen from the SNP panel given in Table 33.
The method corresponds to the workflow illustrated in Figure 9. Cell-free plasma was obtained from a maternal blood sample, and cfDNA was purified from the plasma sample as described in Example 22. The final concentration was determined to be 92.8 pg/μl. The cfDNA contained in 5 μl of purified cfDNA was amplified in a 50 μl reaction volume containing 7.5 μl of a 1 μM primer mix (Table 1), 10 μl of NEB 5X master mix, and 27 μl of water. Thermal cycling was performed with the Gene Amp9700 (Applied Biosystems) using the following cycling conditions: incubation at 95°C for 1 minute, followed by 30 cycles of 95°C for 20 seconds, 68°C for 1 minute, and 68°C for 30 seconds, followed by a final incubation at 68°C for 5 minutes. A final hold at 4°C was maintained until the samples were removed for combining with the unamplified portion of the purified cfDNA sample. The amplified product was purified using the Agencourt AMPure XP PCR purification system (Part No. A63881; Beckman Coulter Genomics, Danvers, MA), and its concentration was quantified using the Nanodrop 2000 (Thermo Scientific, Wilmington, DE). The purified amplification product was diluted 1:10 in water, and 0.9 μl (371 pg) was added to 40 μl of the purified cfDNA sample to obtain a 10% spike. The enriched fetal and maternal cfDNA present in the purified cfDNA sample was used for preparing a sequencing library and was sequenced as described in Example 22.
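The "10% spike" can be checked against the quantities stated above. The sketch assumes that the percentage is expressed relative to the mass of cfDNA in the 40 μl sample (rather than relative to the combined mass after spiking), since that reading reproduces the stated 10%; the function name is hypothetical.

```python
def spike_percent(spike_pg: float, cfdna_pg_per_ul: float, cfdna_ul: float) -> float:
    """Spiked amplicon mass as a percentage of the cfDNA mass in the sample."""
    cfdna_pg = cfdna_pg_per_ul * cfdna_ul
    return 100.0 * spike_pg / cfdna_pg


# Values from the text: 371 pg spiked into 40 ul of cfDNA at 92.8 pg/ul.
pct = spike_percent(371.0, 92.8, 40.0)
print(round(pct, 1))  # cfDNA mass is 92.8 * 40 = 3712 pg, so about 10
```

The same numbers imply that the 1:10 diluted amplicon stock was at about 371 / 0.9 ≈ 412 pg/μl.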
Table 37 provides the tag counts obtained for each of chromosomes 21, 18, 13, X, and Y, i.e., the sequence tag density, and the tag counts obtained for the informative polymorphic sequences contained in the SNP reference genome, i.e., the SNP tag density. The data show that sequencing information can be obtained from a single library, constructed from a purified maternal cfDNA sample that has been enriched for SNP-containing sequences, to determine simultaneously the presence or absence of aneuploidy and the fetal fraction. The presence or absence of aneuploidy was determined using the number of tags mapped to chromosomes, as described in U.S. Provisional Applications 61/407,017 and 61/455,849. In the example given, the data show that the fraction of fetal DNA in plasma sample AFR105 could be quantified from the sequencing results of five informative SNPs and was determined to be 3.84%. Sequence tag densities are provided for chromosomes 21, 13, 18, X, and Y.
This example shows that the enrichment protocol provides the tag counts needed to determine both aneuploidy and fetal fraction from a single sequencing process.
Table 37
Determination of fetal fraction by massively parallel sequencing:
Enrichment of fetal and maternal nucleic acids for polymorphic nucleic acids in a purified cfDNA sample
Example 28
Determination of fetal fraction by capillary electrophoresis of polymorphic sequences comprising STRs
To determine the fetal fraction in maternal samples comprising fetal and maternal cfDNA, peripheral blood samples were collected from volunteer pregnant women carrying either male or female fetuses. The peripheral blood samples were obtained and processed as described in Example 22 to provide purified cfDNA.
Ten microliters of each cfDNA sample were analyzed using the MiniFiler™ PCR Amplification Kit (Applied Biosystems, Foster City, CA) according to the manufacturer's instructions. Briefly, the cfDNA contained in 10 μl was amplified in a 25 μl reaction volume containing 5 μl of fluorescently labeled primers (MiniFiler™ Primer Set) and the MiniFiler™ Master Mix, which contains AmpliTaq DNA polymerase and associated buffer, salt (1.5 mM MgCl2), and 200 μM deoxynucleoside triphosphates (dNTPs: dATP, dCTP, dGTP and dTTP). The fluorescently labeled primers are forward primers labeled with 6FAM™, VIC™, NED™, and PET™ dyes. Thermal cycling was performed on a GeneAmp 9700 (Applied Biosystems) using the following cycling conditions: incubation at 95°C for 10 minutes, followed by 30 cycles of 94°C for 20 seconds, 59°C for 2 minutes, and 72°C for 1 minute, followed by a final incubation at 60°C for 45 minutes. A final hold at 4°C was maintained until the samples were removed for analysis. The amplified product was prepared by diluting 1 μl of amplified product in 8.7 μl Hi-Di™ formamide (Applied Biosystems) and 0.3 μl GeneScan™-500 LIZ internal size standard (Applied Biosystems), and was analyzed on an ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems) using Data Collection HID_G5_POP4 (Applied Biosystems) and a 36 cm capillary array. All genotyping was performed with GeneMapper_ID v3.2 software (Applied Biosystems) using the allelic ladders, bins and panels provided by the manufacturer.
All genotyping measurements were performed on the Applied Biosystems 3130xl Genetic Analyzer, using a ±0.5-nt "window" around the size obtained for each allele to allow for detection and correct assignment of alleles. Any sample allele whose size fell outside the ±0.5-nt window was determined to be OL, i.e., "Off Ladder". OL alleles are alleles of a size that is not represented in the MiniFiler™ allelic ladder, or alleles that do not correspond to an allelic ladder allele but whose size is just outside a window because of measurement error. The minimum peak height threshold of >50 RFU was set based on validation experiments performed to avoid typing when stochastic effects are likely to interfere with accurate interpretation of mixtures. The calculation of fetal fraction is based on averaging all informative markers. Informative markers are identified by the presence of peaks on the electropherogram that fall within the parameters of the preset bins for the STRs analyzed.
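The sizing rule described above can be sketched as follows. This is only an illustrative sketch, not the GeneMapper implementation: the `Peak` type, the `call_allele` function, and the ladder sizes used in the usage note are hypothetical, and only the ±0.5-nt window and the >50 RFU threshold are taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional

WINDOW_NT = 0.5      # half-width of the sizing window (from the text)
MIN_PEAK_RFU = 50.0  # peaks must exceed this height to be typed (from the text)


@dataclass(frozen=True)
class Peak:
    """One electropherogram peak (hypothetical helper type)."""
    size_nt: float     # measured fragment size in nucleotides
    height_rfu: float  # peak height in relative fluorescence units


def call_allele(peak: Peak, ladder: Dict[str, float]) -> Optional[str]:
    """Return the ladder allele name, "OL" if off ladder, or None if below threshold.

    `ladder` maps allele names to their expected sizes in nucleotides.
    """
    if peak.height_rfu <= MIN_PEAK_RFU:
        return None  # too low to type reliably
    for name, expected_size in ladder.items():
        if abs(peak.size_nt - expected_size) <= WINDOW_NT:
            return name  # inside the +/-0.5-nt window of this ladder allele
    return "OL"  # no ladder allele within the window
```

With a made-up ladder such as `{"12": 100.0, "13": 104.0}`, a 200 RFU peak at 100.3 nt would be assigned allele 12, the same peak at 102.0 nt would be reported as OL, and a 40 RFU peak would not be typed at all.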
Fetal fraction was calculated using the average peak height of the major and minor alleles at every STR locus, determined from triplicate injections. The rules applied to the calculation are:
1. off-ladder (OL) allele data are not included in the calculation;
2. only peak heights greater than 50 RFU (relative fluorescence units) are included in the calculation;
3. if only one bin is present, the marker is deemed non-informative; and
4. if a second bin is called but the peaks of the first and second bins are within 50% to 70% of their relative fluorescence units (RFU) in peak height, the minority fraction is not measured and the marker is deemed non-informative.
The fraction of the minor allele for any given informative marker is calculated by dividing the peak height of the minor component by the sum of the peak heights of the major component, and is expressed as a percentage. It is first calculated for each informative locus as
fetal fraction = (∑ peak height of minor allele / ∑ peak height of major allele(s)) × 100,
and the fetal fraction for a sample comprising two or more informative STRs is then calculated as the average of the fetal fractions calculated for the two or more informative markers.
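The calculation can be illustrated with a short sketch. It assumes that the peak heights passed in are already averaged over the triplicate injections, that OL alleles have already been removed, and that each peak has already been assigned to the major or minor component. The function names and the `BALANCED_RATIO` cutoff are hypothetical; in particular, treating any minor-to-major ratio of 0.5 or more as non-informative is one possible reading of the 50% to 70% rule, not a definition given in the text.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

MIN_PEAK_RFU = 50.0   # only peaks above 50 RFU are used (from the text)
BALANCED_RATIO = 0.5  # ASSUMPTION: minor/major ratios at or above this are skipped


def locus_fetal_fraction(major: Sequence[float],
                         minor: Sequence[float]) -> Optional[float]:
    """Percent fetal fraction at one STR locus, or None if non-informative."""
    major = [h for h in major if h > MIN_PEAK_RFU]
    minor = [h for h in minor if h > MIN_PEAK_RFU]
    if not major or not minor:
        return None  # only one bin present: non-informative
    if max(minor) / max(major) >= BALANCED_RATIO:
        return None  # peaks too similar in height: minority fraction not measured
    return 100.0 * sum(minor) / sum(major)


def sample_fetal_fraction(
        loci: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> Optional[float]:
    """Average the per-locus fractions over all informative loci."""
    fractions: List[float] = []
    for major, minor in loci:
        fraction = locus_fetal_fraction(major, minor)
        if fraction is not None:
            fractions.append(fraction)
    return mean(fractions) if fractions else None
```

For instance, with made-up peak heights, a locus with one major peak of 2000 RFU and a minor peak of 100 RFU gives 5%, a locus with two major peaks of 1000 RFU each and a minor peak of 80 RFU gives 4%, and the sample value over those two loci is their average, 4.5%.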
Table 38 provides the data obtained from analyzing cfDNA from a subject pregnant with a male fetus.
Table 38
Fetal fraction determined in the cfDNA of a pregnant subject by analysis of STRs
The results show that cfDNA can be used to determine the presence or absence of fetal DNA, as indicated by the detection of a minor component at one or more STR alleles; to determine the percent fetal fraction; and to determine fetal gender, as indicated by the presence or absence of the Amelogenin allele.
Sequence Listing
<110> Verinata Health, Inc.
<120> Detection and classification of copy number variation
<130> IP-0722-CN
<140> CN 201210441134.8
<141> 2012-11-07
<150> 13/555,037
<151> 2012-07-20
<150> 13/482,964
<151> 2012-05-29
<150> 13/445,778
<151> 2012-04-12
<160> 118
<170> SIPOSequenceListing 1.0
<210> 1
<211> 170
<212> DNA
<213> Artificial sequence
<400> 1
gcacatcccg ctccgggtga ctattaaaga cgaccctcga tcatagcact cgatcagatt 60
gtgacgtatg atctgtagga catacttctt ggccactaac cagacggtgc gagatatttc 120
gaattgcgcc tacctatctg gaacgactaa tgtcaattct tcgaatgaca 170
<210> 2
<211> 170
<212> DNA
<213> Artificial sequence
<400> 2
cgccaatcgc gctctatgct taacgcacgt cctgtctctt tatagagata ccgtgggtga 60
cggcgtgacc gggagccttg aggagagcat aaagcgtaac cggattatcc cgaatggtat 120
atgacggtcc ctcgcatacc ggaccgggca ttactcagca gcggttctgc 170
<210> 3
<211> 170
<212> DNA
<213> Artificial sequence
<400> 3
ccccaatagt gcggtgatct aacacctgac atcgggccga aagaggaatt aagagccgac 60
cggctagact gcccatgtgc caaatcaggg gtcgaggagg ttgtgtggcg acatcctatt 120
ggttccacct ggcggaatcg ggcaaagcca ccatcactgg actgagaacc 170
<210> 4
<211> 170
<212> DNA
<213> Artificial sequence
<400> 4
agtccagtaa ttgcgaggaa ccacttactc ggtacaccgc tcctggctgg ggttggcaga 60
ccagtcatgt tgctgaggac cgacgacccc ggaccattta actctcagac gtaccgacag 120
caactttgcc gaattctctc cagcaatcga gagcgggaag gcataagtgc 170
<210> 5
<211> 170
<212> DNA
<213> Artificial sequence
<400> 5
agaaccatct ccggcgcaag tctacgcgag ttggccttag ctcataccta cggatgtgga 60
ggataagtcc ttagctcgta ccatcgtaac ctagtggcgt catgcgccta cgtgagaagg 120
attctttact gagcgcagag ttgtccgtct actgccacgg gccataacgc 170
<210> 6
<211> 170
<212> DNA
<213> Artificial sequence
<400> 6
cctaaggcct acttcaatat cgtgatgcac ccgaatgact aaaggggtat atggagtatg 60
tccatggcgt cattgagccc gcttaggatc tactgtaatc cgagggatac atgcctcacg 120
cgagtctttc ctaccgctac tagacattat ggtgcgcgcc ttgagtacgt 170
<210> 7
<211> 111
<212> DNA
<213> Homo sapiens
<400> 7
cacatgcaca gccagcaacc ctgtcagcag gagttcccac cagtttcttt ctgagaacat 60
ctgttcaggt ttctctccat ctctatttac tcaggtcaca ggaccttggg g 111
<210> 8
<211> 111
<212> DNA
<213> Homo sapiens
<400> 8
cacatgcaca gccagcaacc ctgtcagcag gagttcccac cagtttcttt ctgagaacat 60
ctgttcaggt ttctctccat ctctgtttac tcaggtcaca ggaccttggg g 111
<210> 9
<211> 126
<212> DNA
<213> Homo sapiens
<400> 9
tgaggaagtg aggctcagag ggtaagaaac tttgtcacag agctggtggt gagggtggag 60
attttacact ccctgcctcc cacaccagtt tctccagagt ggaaagactt tcatctcgca 120
ctggca 126
<210> 10
<211> 126
<212> DNA
<213> Homo sapiens
<400> 10
tgaggaagtg aggctcagag ggtaagaaac tttgtcacag agctggtggt gagggtggag 60
attttacact ccctgcctcc cacaccagtt tctccggagt ggaaagactt tcatctcgca 120
ctggca 126
<210> 11
<211> 121
<212> DNA
<213> Homo sapiens
<400> 11
gtgccttcag aacctttgag atctgattct atttttaaag cttcttagaa gagagattgc 60
aaagtgggtt gtttctctag ccagacaggg caggcaaata ggggtggctg gtgggatggg 120
a 121
<210> 12
<211> 121
<212> DNA
<213> Homo sapiens
<400> 12
gtgccttcag aacctttgag atctgattct atttttaaag cttcttagaa gagagattgc 60
aaagtgggtt gtttctctag ccagacaggg caggtaaata ggggtggctg gtgggatggg 120
a 121
<210> 13
<211> 111
<212> DNA
<213> Homo sapiens
<400> 13
aggtgtgtct ctcttttgtg aggggagggg tcccttctgg cctagtagag ggcctggcct 60
gcagtgagca ttcaaatcct caaggaacag ggtggggagg tgggacaaag g 111
<210> 14
<211> 111
<212> DNA
<213> Homo sapiens
<400> 14
aggtgtgtct ctcttttgtg aggggagggg tcccttctgg cctagtagag ggcctggcct 60
gcagtgagca ttcaaatcct cgaggaacag ggtggggagg tgggacaaag g 111
<210> 15
<211> 139
<212> DNA
<213> Homo sapiens
<400> 15
cctcgcctac tgtgctgttt ctaaccatca tgcttttccc tgaatctctt gagtcttttt 60
ctgctgtgga ctgaaacttg atcctgagat tcacctctag tccctctgag cagcctcctg 120
gaatactcag ctgggatgg 139
<210> 16
<211> 139
<212> DNA
<213> Homo sapiens
<400> 16
cctcgcctac tgtgctgttt ctaaccatca tgcttttccc tgaatctctt gagtcttttt 60
ctgctgtgga ctgaaacttg atcctgagat tcacctctag tccctctggg cagcctcctg 120
gaatactcag ctgggatgg 139
<210> 17
<211> 117
<212> DNA
<213> Homo sapiens
<400> 17
aattgcaatg gtgagaggtt gatggtaaaa tcaaacggaa cttgttattt tgtcattctg 60
atggactgga actgaggatt ttcaatttcc tctccaaccc aagacacttc tcactgg 117
<210> 18
<211> 117
<212> DNA
<213> Homo sapiens
<400> 18
aattgcaatg gtgagaggtt gatggtaaaa tcaaacggaa cttgttattt tgtcattctg 60
atggactgga actgaggatt ttcaatttcc tttccaaccc aagacacttc tcactgg 117
<210> 19
<211> 114
<212> DNA
<213> Homo sapiens
<400> 19
gaaatgcctt ctcaggtaat ggaaggttat ccaaatattt ttcgtaagta tttcaaatag 60
caatggctcg tctatggtta gtctcacagc cacattctca gaactgctca aacc 114
<210> 20
<211> 114
<212> DNA
<213> Homo sapiens
<400> 20
gaaatgcctt ctcaggtaat ggaaggttat ccaaatattt ttcgtaagta tttcaaatag 60
caatggctcg tctatggtta gtctcgcagc cacattctca gaactgctca aacc 114
<210> 21
<211> 128
<212> DNA
<213> Homo sapiens
<400> 21
acccaaaaca ctggaggggc ctcttctcat tttcggtaga ctgcaagtgt tagccgtcgg 60
gaccagcttc tgtctggaag ttcgtcaaat tgcagttaag tccaagtatg ccacatagca 120
gataaggg 128
<210> 22
<211> 128
<212> DNA
<213> Homo sapiens
<400> 22
acccaaaaca ctggaggggc ctcttctcat tttcggtaga ctgcaagtgt tagccgtcgg 60
gaccagcttc tgtctggaag ttcgtcaaat tgcagttagg tccaagtatg ccacatagca 120
gataaggg 128
<210> 23
<211> 110
<212> DNA
<213> Homo sapiens
<400> 23
gcaccagaat ttaaacaacg ctgacaataa atatgcagtc gatgatgact tcccagagct 60
ccagaagcaa ctccagcaca cagagaggcg ctgatgtgcc tgtcaggtgc 110
<210> 24
<211> 110
<212> DNA
<213> Homo sapiens
<400> 24
gcaccagaat ttaaacaacg ctgacaataa atatgcagtc gatgatgact tcccagagct 60
ccagaagcaa ctccagcaca cggagaggcg ctgatgtgcc tgtcaggtgc 110
<210> 25
<211> 116
<212> DNA
<213> Homo sapiens
<400> 25
tgactgtata ccccaggtgc acccttgggt catctctatc atagaactta tctcacagag 60
tataagagct gatttctgtg tctgcctctc acactagact tccacatcct tagtgc 116
<210> 26
<211> 116
<212> DNA
<213> Homo sapiens
<400> 26
tgactgtata ccccaggtgc acccttgggt catctctatc atagaactta tctcacagag 60
tataagagct gatttctgtg tctgcctgtc acactagact tccacatcct tagtgc 116
<210> 27
<211> 110
<212> DNA
<213> Homo sapiens
<400> 27
tgtacgtggt caccagggga cgcctggcgc tgcgagggag gccccgagcc tcgtgccccc 60
gtgaagcttc agctcccctc cccggctgtc cttgaggctc ttctcacact 110
<210> 28
<211> 110
<212> DNA
<213> Homo sapiens
<400> 28
tgtacgtggt caccagggga cgcctggcgc tgcgagggag gccccgagcc tcgtgccccc 60
gtgaagcttc agctcccctc cctggctgtc cttgaggctc ttctcacact 110
<210> 29
<211> 114
<212> DNA
<213> Homo sapiens
<400> 29
cagtggaccc tgctgcacct ttcctcccct cccatcaacc tcttttgtgc ctccccctcc 60
gtgtaccacc ttctctgtca ccaaccctgg cctcacaact ctctcctttg ccac 114
<210> 30
<211> 114
<212> DNA
<213> Homo sapiens
<400> 30
cagtggaccc tgctgcacct ttcctcccct cccatcaacc tcttttgtgc ctccccctcc 60
gtgtaccacc ttctctgtca ccacccctgg cctcacaact ctctcctttg ccac 114
<210> 31
<211> 110
<212> DNA
<213> Homo sapiens
<400> 31
cagtggcata gtagtccagg ggctcctcct cagcacctcc agcaccttcc aggaggcagc 60
agcgcaggca gagaacccgc tggaagaatc ggcggaagtt gtcggagagg 110
<210> 32
<211> 110
<212> DNA
<213> Homo sapiens
<400> 32
cagtggcata gtagtccagg ggctcctcct cagcacctcc agcaccttcc aggaggcagc 60
agcgcaggca gagaacccgc tggaaggatc ggcggaagtt gtcggagagg 110
<210> 33
<211> 129
<212> DNA
<213> Homo sapiens
<400> 33
aggtctgggg gccgctgaat gccaagctgg gaatcttaaa tgttaaggaa caaggtcata 60
caatgaatgg tgtgatgtaa aagcttggga ggtgatttct gagggtaggt gctgggttta 120
atgggagga 129
<210> 34
<211> 129
<212> DNA
<213> Homo sapiens
<400> 34
aggtctgggg gccgctgaat gccaagctgg gaatcttaaa tgttaaggaa caaggtcata 60
caatgaatgg tgtgatgtaa aagcttggga ggtgattttt gagggtaggt gctgggttta 120
atgggagga 129
<210> 35
<211> 107
<212> DNA
<213> Homo sapiens
<400> 35
acggttctgt cctgtagggg agaaaagtcc tcgttgttcc tctgggatgc aacatgagag 60
agcagcacac tgaggcttta tggattgccc tgccacaagt gaacagg 107
<210> 36
<211> 107
<212> DNA
<213> Homo sapiens
<400> 36
acggttctgt cctgtagggg agaaaagtcc tcgttgttcc tctgggatgc aacatgagag 60
agcagcacac tgaggcttta tgggttgccc tgccacaagt gaacagg 107
<210> 37
<211> 127
<212> DNA
<213> Homo sapiens
<400> 37
gcgcagtcag atgggcgtgc tggcgtctgt cttctctctc tcctgctctc tggcttcatt 60
tttctctcct tctgtctcac cttctttcgt gtgcctgtgc acacacacgt ttgggacaag 120
ggctgga 127
<210> 38
<211> 127
<212> DNA
<213> Homo sapiens
<400> 38
gcgcagtcag atgggcgtgc tggcgtctgt cttctctctc tcctgctctc tggcttcatt 60
tttctctcct tctgtctcac cttctttcgt gtgcctgtgc atacacacgt ttgggacaag 120
ggctgga 127
<210> 39
<211> 130
<212> DNA
<213> Homo sapiens
<400> 39
gccggacctg cgaaatccca aaatgccaaa cattcccgcc tcacatgatc ccagagagag 60
gggacccagt gttcccagct tgcagctgag gagcccgagg ttgccgtcag atcagagccc 120
cagttgcccg 130
<210> 40
<211> 130
<212> DNA
<213> Homo sapiens
<400> 40
gccggacctg cgaaatccca aaatgccaaa cattcccgcc tcacatgatc ccagagagag 60
gggacccagt gttcccagct tgcagctgag gagcccgagt ttgccgtcag atcagagccc 120
cagttgcccg 130
<210> 41
<211> 121
<212> DNA
<213> Homo sapiens
<400> 41
agcagcctcc ctcgactagc tcacactacg ataaggaaaa ttcatgagct ggtgtccaag 60
gagggctggg tgactcgtgg ctcagtcagc atcaagattc ctttcgtctt tcccctctgc 120
c 121
<210> 42
<211> 121
<212> DNA
<213> Homo sapiens
<400> 42
agcagcctcc ctcgactagc tcacactacg ataaggaaaa ttcatgagct ggtgtccaag 60
gagggctggg tgactcgtgg ctcagtcagc gtcaagattc ctttcgtctt tcccctctgc 120
c 121
<210> 43
<211> 138
<212> DNA
<213> Homo sapiens
<400> 43
tggcattgcc tgtaatatac atagccatgg ttttttatag gcaatttaag atgaatagct 60
tctaaactat agataagttt cattacccca ggaagctgaa ctatagctac tttacccaaa 120
atcattagaa tggtgctt 138
<210> 44
<211> 138
<212> DNA
<213> Homo sapiens
<400> 44
tggcattgcc tgtaatatac atagccatgg ttttttatag gcaatttaag atgaatagct 60
tctaaactat agataagttt cattacccca ggaagctgaa ctatagctac tttccccaaa 120
atcattagaa tggtgctt 138
<210> 45
<211> 136
<212> DNA
<213> Homo sapiens
<400> 45
atgaagcctt ccaccaactg cctgtatgac tcatctgggg acttctgctc tatactcaaa 60
gtggcttagt cactgccaat gtatttccat atgagggacg atgattacta aggaaatata 120
gaaacaacaa ctgatc 136
<210> 46
<211> 136
<212> DNA
<213> Homo sapiens
<400> 46
atgaagcctt ccaccaactg cctgtatgac tcatctgggg acttctgctc tatactcaaa 60
gtggcttagt cactgccaat gtatttccat atgagggacg gtgattacta aggaaatata 120
gaaacaacaa ctgatc 136
<210> 47
<211> 118
<212> DNA
<213> Homo sapiens
<400> 47
acaacagaat caggtgattg gagaaaagat cacaggccta ggcacccaag gcttgaagga 60
tgaaagaatg aaagatggac ggaacaaaat taggacctta attctttgtt cagttcag 118
<210> 48
<211> 118
<212> DNA
<213> Homo sapiens
<400> 48
acaacagaat caggtgattg gagaaaagat cacaggccta ggcacccaag gcttgaagga 60
tgaaagaatg aaagatggac ggaagaaaat taggacctta attctttgtt cagttcag 118
<210> 49
<211> 150
<212> DNA
<213> Homo sapiens
<400> 49
ttggggtaaa ttttcattgt catatgtgga atttaaatat accatcatct acaaagaatt 60
ccacagagtt aaatatctta agttaaacac ttaaaataag tgtttgcgtg atattttgat 120
gacagataaa cagagtctaa ttcccacccc 150
<210> 50
<211> 150
<212> DNA
<213> Homo sapiens
<400> 50
ttggggtaaa ttttcattgt catatgtgga atttaaatat accatcatct acaaagaatt 60
ccacagagtt aaatatctta agttaaacac ttaaaataag tgtttgcgtg atattttgat 120
gatagataaa cagagtctaa ttcccacccc 150
<210> 51
<211> 145
<212> DNA
<213> Homo sapiens
<400> 51
tgcaattcaa atcaggaagt atgaccaaaa gacagagatc ttttttggat gatccctagc 60
ctagcaatgc ctggcagcca tgcaggtgca atgtcaacct taaataatgt attgcaaact 120
cagagctgac aaacctcgat gttgc 145
<210> 52
<211> 145
<212> DNA
<213> Homo sapiens
<400> 52
tgcaattcaa atcaggaagt atgaccaaaa gacagagatc ttttttggat gatccctagc 60
ctagcaatgc ctggcagcca tgcaggtgca atgtcaacct taaataatgt attgcaaatt 120
cagagctgac aaacctcgat gttgc 145
<210> 53
<211> 124
<212> DNA
<213> Homo sapiens
<400> 53
ctgtgctctg cgaatagctg cagaagtaac ttggggaccc aaaataaagc agaatgctaa 60
tgtcaagtcc tgagaaccaa gccctgggac tctggtgcca tttcggattc tccatgagca 120
tggt 124
<210> 54
<211> 124
<212> DNA
<213> Homo sapiens
<400> 54
ctgtgctctg cgaatagctg cagaagtaac ttggggaccc aaaataaagc agaatgctaa 60
tgtcaagtcc tgagaaccaa gccctgggac tctggtgcca ttttggattc tccatgagca 120
tggt 124
<210> 55
<211> 118
<212> DNA
<213> Homo sapiens
<400> 55
tttttccagc caactcaagg ccaaaaaaaa tttcttaata tagttattat gcgaggggag 60
gggaagcaaa ggagcacagg tagtccacag aataagacac aagaaacctc aagctgtg 118
<210> 56
<211> 118
<212> DNA
<213> Homo sapiens
<400> 56
tttttccagc caactcaagg ccaaaaaaaa tttcttaata tagttattat gcgaggggag 60
gggaagcaaa ggagcacagg tagtccacag aataggacac aagaaacctc aagctgtg 118
<210> 57
<211> 110
<212> DNA
<213> Homo sapiens
<400> 57
tcttctcgtc ccctaagcaa acaacatccg cttgcttctg tctgtgtaac cacagtgaat 60
gggtgtgcac gcttgatggg cctctgagcc cctgttgcac aaaccagaaa 110
<210> 58
<211> 110
<212> DNA
<213> Homo sapiens
<400> 58
tcttctcgtc ccctaagcaa acaacatccg cttgcttctg tctgtgtaac cacagtgaat 60
gggtgtgcac gcttggtggg cctctgagcc cctgttgcac aaaccagaaa 110
<210> 59
<211> 110
<212> DNA
<213> Homo sapiens
<400> 59
cacatggggg cattaagaat cgcccaggga ggaggaggga gaacgcgtgc ttttcacatt 60
tgcatttgaa ttttcgagtt cccaggatgt gtttttgtgc tcatcgatgt 110
<210> 60
<211> 110
<212> DNA
<213> Homo sapiens
<400> 60
cacatggggg cattaagaat cgcccaggga ggaggaggga gaacgcgtgc ttttcacatt 60
tgcatttgaa tttttgagtt cccaggatgt gtttttgtgc tcatcgatgt 110
<210> 61
<211> 128
<212> DNA
<213> Homo sapiens
<400> 61
gggctctgag gtgtgtgaaa taaaaacaaa tgtccatgtc tgtcctttta tggcattttg 60
ggactttaca tttcaaacat ttcagacatg tatcacaaca cgaaggaata acagttccag 120
ggatatct 128
<210> 62
<211> 128
<212> DNA
<213> Homo sapiens
<400> 62
gggctctgag gtgtgtgaaa taaaaacaaa tgtccatgtc tgtcctttta tggcattttg 60
ggactttaca tttcaaacat ttcagacatg tatcacaaca cgagggaata acagttccag 120
ggatatct 128
<210> 63
<211> 21
<212> DNA
<213> Artificial sequence
<400> 63
cacatgcaca gccagcaacc c 21
<210> 64
<211> 23
<212> DNA
<213> Artificial sequence
<400> 64
ccccaaggtc ctgtgacctg agt 23
<210> 65
<211> 23
<212> DNA
<213> Artificial sequence
<400> 65
tgaggaagtg aggctcagag ggt 23
<210> 66
<211> 24
<212> DNA
<213> Artificial sequence
<400> 66
tgccagtgcg agatgaaagt cttt 24
<210> 67<210> 67
<211> 27<211> 27
<212> DNA<212> DNA
<213> 人工序列(artificial sequence)<213> Artificial sequence
<400> 67<400> 67
gtgccttcag aacctttgag atctgat 27gtgccttcag aacctttgag atctgat 27
<210> 68<210> 68
<211> 20<211> 20
<212> DNA<212> DNA
<213> 人工序列(artificial sequence)<213> Artificial sequence
<400> 68<400> 68
tcccatccca ccagccaccc 20tcccatccca ccagccaccc 20
<210> 69<210> 69
<211> 25<211> 25
<212> DNA<212> DNA
<213> 人工序列(artificial sequence)<213> Artificial sequence
<400> 69<400> 69
aggtgtgtct ctcttttgtg agggg 25
<210> 70
<211> 21
<212> DNA
<213> Artificial sequence
<400> 70
cctttgtccc acctccccac c 21
<210> 71
<211> 26
<212> DNA
<213> Artificial sequence
<400> 71
cctcgcctac tgtgctgttt ctaacc 26
<210> 72
<211> 25
<212> DNA
<213> Artificial sequence
<400> 72
ccatcccagc tgagtattcc aggag 25
<210> 73
<211> 26
<212> DNA
<213> Artificial sequence
<400> 73
aattgcaatg gtgagaggtt gatggt 26
<210> 74
<211> 24
<212> DNA
<213> Artificial sequence
<400> 74
ccagtgagaa gtgtcttggg ttgg 24
<210> 75
<211> 27
<212> DNA
<213> Artificial sequence
<400> 75
gaaatgcctt ctcaggtaat ggaaggt 27
<210> 76
<211> 27
<212> DNA
<213> Artificial sequence
<400> 76
ggtttgagca gttctgagaa tgtggct 27
<210> 77
<211> 22
<212> DNA
<213> Artificial sequence
<400> 77
acccaaaaca ctggaggggc ct 22
<210> 78
<211> 27
<212> DNA
<213> Artificial sequence
<400> 78
cccttatctg ctatgtggca tacttgg 27
<210> 79
<211> 27
<212> DNA
<213> Artificial sequence
<400> 79
gcaccagaat ttaaacaacg ctgacaa 27
<210> 80
<211> 22
<212> DNA
<213> Artificial sequence
<400> 80
gcacctgaca ggcacatcag cg 22
<210> 81
<211> 24
<212> DNA
<213> Artificial sequence
<400> 81
tgactgtata ccccaggtgc accc 24
<210> 82
<211> 27
<212> DNA
<213> Artificial sequence
<400> 82
gcactaagga tgtggaagtc tagtgtg 27
<210> 83
<211> 22
<212> DNA
<213> Artificial sequence
<400> 83
tgtacgtggt caccagggga cg 22
<210> 84
<211> 26
<212> DNA
<213> Artificial sequence
<400> 84
agtgtgagaa gagcctcaag gacagc 26
<210> 85
<211> 21
<212> DNA
<213> Artificial sequence
<400> 85
cagtggaccc tgctgcacct t 21
<210> 86
<211> 24
<212> DNA
<213> Artificial sequence
<400> 86
gtggcaaagg agagagttgt gagg 24
<210> 87
<211> 24
<212> DNA
<213> Artificial sequence
<400> 87
cagtggcata gtagtccagg ggct 24
<210> 88
<211> 21
<212> DNA
<213> Artificial sequence
<400> 88
cctctccgac aacttccgcc g 21
<210> 89
<211> 20
<212> DNA
<213> Artificial sequence
<400> 89
aggtctgggg gccgctgaat 20
<210> 90
<211> 23
<212> DNA
<213> Artificial sequence
<400> 90
tcctcccatt aaacccagca cct 23
<210> 91
<211> 23
<212> DNA
<213> Artificial sequence
<400> 91
acggttctgt cctgtagggg aga 23
<210> 92
<211> 22
<212> DNA
<213> Artificial sequence
<400> 92
cctgttcact tgtggcaggg ca 22
<210> 93
<211> 20
<212> DNA
<213> Artificial sequence
<400> 93
gcgcagtcag atgggcgtgc 20
<210> 94
<211> 23
<212> DNA
<213> Artificial sequence
<400> 94
tccagccctt gtcccaaacg tgt 23
<210> 95
<211> 21
<212> DNA
<213> Artificial sequence
<400> 95
gccggacctg cgaaatccca a 21
<210> 96
<211> 21
<212> DNA
<213> Artificial sequence
<400> 96
cgggcaactg gggctctgat c 21
<210> 97
<211> 21
<212> DNA
<213> Artificial sequence
<400> 97
agcagcctcc ctcgactagc t 21
<210> 98
<211> 23
<212> DNA
<213> Artificial sequence
<400> 98
ggcagagggg aaagacgaaa gga 23
<210> 99
<211> 24
<212> DNA
<213> Artificial sequence
<400> 99
tggcattgcc tgtaatatac atag 24
<210> 100
<211> 23
<212> DNA
<213> Artificial sequence
<400> 100
aagcaccatt ctaatgattt tgg 23
<210> 101
<211> 20
<212> DNA
<213> Artificial sequence
<400> 101
atgaagcctt ccaccaactg 20
<210> 102
<211> 27
<212> DNA
<213> Artificial sequence
<400> 102
gatcagttgt tgtttctata tttcctt 27
<210> 103
<211> 22
<212> DNA
<213> Artificial sequence
<400> 103
acaacagaat caggtgattg ga 22
<210> 104
<211> 25
<212> DNA
<213> Artificial sequence
<400> 104
ctgaactgaa caaagaatta aggtc 25
<210> 105
<211> 22
<212> DNA
<213> Artificial sequence
<400> 105
ttggggtaaa ttttcattgt ca 22
<210> 106
<211> 20
<212> DNA
<213> Artificial sequence
<400> 106
ggggtgggaa ttagactctg 20
<210> 107
<211> 23
<212> DNA
<213> Artificial sequence
<400> 107
tgcaattcaa atcaggaagt atg 23
<210> 108
<211> 20
<212> DNA
<213> Artificial sequence
<400> 108
gcaacatcga ggtttgtcag 20
<210> 109
<211> 20
<212> DNA
<213> Artificial sequence
<400> 109
ctgtgctctg cgaatagctg 20
<210> 110
<211> 20
<212> DNA
<213> Artificial sequence
<400> 110
accatgctca tggagaatcc 20
<210> 111
<211> 20
<212> DNA
<213> Artificial sequence
<400> 111
tttttccagc caactcaagg 20
<210> 112
<211> 21
<212> DNA
<213> Artificial sequence
<400> 112
cacagcttga ggtttcttgt g 21
<210> 113
<211> 20
<212> DNA
<213> Artificial sequence
<400> 113
tcttctcgtc ccctaagcaa 20
<210> 114
<211> 20
<212> DNA
<213> Artificial sequence
<400> 114
tttctggttt gtgcaacagg 20
<210> 115
<211> 20
<212> DNA
<213> Artificial sequence
<400> 115
cacatggggg cattaagaat 20
<210> 116
<211> 22
<212> DNA
<213> Artificial sequence
<400> 116
acatcgatga gcacaaaaac ac 22
<210> 117
<211> 20
<212> DNA
<213> Artificial sequence
<400> 117
gggctctgag gtgtgtgaaa 20
<210> 118
<211> 24
<212> DNA
<213> Artificial sequence
<400> 118
agatatccct ggaactgtta ttcc 24
Claims (31)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/445,778 | 2012-04-12 | | |
| US13/482,964 | 2012-05-29 | | |
| US13/555,037 | 2012-07-20 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1260281A1 (en) | 2019-12-13 |
| HK1260281B (en) | 2022-08-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11697846B2 (en) | | Detecting and classifying copy number variation |
| US11875899B2 (en) | | Analyzing copy number variation in the detection of cancer |
| CN103374518B (en) | | Detection and Classification of Copy Number Variations |
| US20200219588A1 (en) | | Detecting and classifying copy number variation |
| US9411937B2 (en) | | Detecting and classifying copy number variation |
| EP2877594B1 (en) | | Detecting and classifying copy number variation in a fetal genome |
| US9323888B2 (en) | | Detecting and classifying copy number variation |
| EP3230469B1 (en) | | Using cell-free dna fragment size to determine copy number variations |
| HK1244844A1 (en) | | Using cell-free dna fragment size to determine copy number variations |
| AU2019200162B2 (en) | | Detecting and classifying copy number variation |
| AU2019200163B2 (en) | | Detecting and classifying copy number variation |
| US20240203601A1 (en) | | Analyzing copy number variation in the detection of cancer |
| HK1260281B (en) | | Detecting and classifying copy number variation |
| HK1260281A1 (en) | | Detecting and classifying copy number variation |
| HK1187363A (en) | | Detecting and classifying copy number variation |