HK40104030A

HK40104030A - Methods and compositions for analyzing cellular components

Info

Publication number: HK40104030A
Application number: HK42024091793.0A
Authority: HK
Inventors: K. L. Gunderson; F. J. Steemers; J. S. Fisher; R. Rigatti
Original assignee: Illumina, Inc.
Priority date: 2015-02-10
Filing date: 2024-05-22
Publication date: 2024-07-19

Description

Methods and compositions for analyzing cellular components

This application is a divisional application of Chinese patent application No. 201680017121.4, filed on February 10, 2016, and entitled "Methods and compositions for analyzing cellular components".

Technical Field

Embodiments of the present application relate to methods and compositions for analyzing cellular components. In some embodiments, the application relates to methods and compositions for analyzing the components of a single cell. In some embodiments, the application relates to methods and compositions for identifying single-cell types. In some embodiments, the methods and compositions involve sequencing nucleic acids. Some embodiments of the provided methods and compositions are useful for deriving the composite state of such single cells.

Background

Detection of specific nucleic acid sequences present in a biological sample has been used, for example, as a method for identifying and classifying microorganisms, diagnosing infectious diseases, detecting and characterizing genetic abnormalities, identifying genetic changes associated with cancer, studying genetic susceptibility to disease, and measuring response to various types of treatment. A common technique for detecting specific nucleic acid sequences in a biological sample is nucleic acid sequencing.

Nucleic acid sequencing methodology has evolved significantly from the chemical degradation methods used by Maxam and Gilbert and the chain elongation methods used by Sanger. Today, several sequencing methodologies are in use that allow parallel processing of nucleic acids, all in a single sequencing run. As such, the information generated from a single sequencing run can be enormous.

Summary of the Invention

The invention includes the following:

1. A method of analyzing two or more analytes of a single cell, the method comprising:

(a) providing a plurality of contiguity preserving elements (CEs), wherein each CE comprises a single cell;

(b) lysing the single cell within the CE, wherein the analytes within the single cell are released within the CE;

(c) providing a first reporter moiety to a first analyte within the single cell of each CE;

(d) providing a second reporter moiety to a second analyte within the single cell of each CE;

(e) modifying the analytes such that at least some of the first and second analytes of the CE comprise the first and second reporter moieties, respectively;

(f) combining the CEs comprising the analytes that comprise the reporter moieties;

(g) compartmentalizing the CEs comprising the first and second analytes, which comprise the first and second reporter moieties respectively, into a plurality of compartments;

(h) providing a third reporter moiety to the first analyte comprising the first reporter moiety of each CE;

(i) providing a fourth reporter moiety to the second analyte comprising the second reporter moiety of each CE;

(j) further modifying the analytes such that at least some of the first analytes comprise the first and third reporter moieties and at least some of the second analytes comprise the second and fourth reporter moieties;

(k) analyzing the analytes comprising the reporter moieties of each compartment, wherein such analysis detects the analytes of the single cell.

2. The method of item 1, wherein the first and second reporter moieties identify the source of the analytes.

3. The method of item 1, wherein the combination of the reporter moieties identifies the source of the analytes.

4. The method of any one of items 1-3, wherein the analytes are detected simultaneously.

5. The method of any one of items 1-4, wherein the first analyte is genomic DNA and the second analyte is cDNA or RNA.

6. The method of item 5, wherein modifying at least some of the genomic DNA, cDNA, or RNA to comprise the first and second reporter moieties comprises contacting the genomic DNA, cDNA, or RNA with a plurality of transposomes under conditions such that at least some of the transposon sequences are inserted into the genomic DNA, cDNA, or RNA, each transposome comprising a transposase and a transposon sequence comprising the first reporter moiety or the second reporter moiety.

7. The method of item 6, wherein step (g) further comprises removing the transposase from the genomic DNA, cDNA, or RNA.

8. The method of item 7, wherein the transposase is removed after the analytes are modified in step (e).

9. The method of item 8, wherein removing the transposase comprises a method selected from the group consisting of: adding a detergent, changing the temperature, changing the pH, adding a protease, adding a chaperone, changing the salt concentration, and adding a strand-displacing polymerase.

10. The method of item 6, wherein step (e) comprises contacting a target nucleic acid with a plurality of transposomes, each transposome comprising a first transposon sequence comprising the first reporter moiety, a second transposon sequence noncontiguous with the first transposon sequence, and a transposase associated with the first transposon sequence and the second transposon sequence.

11. The method of any one of items 6-10, wherein the first transposon sequence comprises a first primer site and the second transposon sequence comprises a second primer site.

12. The method of item 11, wherein the first primer site further comprises a first barcode and the second primer site further comprises a second barcode.

13. The method of any one of items 1-12, wherein the first, second, third, or fourth reporter moiety comprises a barcode.

14. The method of item 1, wherein one of the analytes is a protein, optionally wherein the protein is labeled with a nucleic acid reporter moiety, and further optionally wherein the nucleic acid reporter moiety comprises a combinatorially derived set of barcodes.

15. The method of item 14, wherein the protein is from a single cell.

16. The method of any one of items 1-15, wherein steps (c)-(j) are repeated at least one more time.

17. The method of item 16, wherein the additional reporter moieties in each additional step differ from the first, second, third, and fourth reporter moieties.

18. A method of analyzing analytes, the method comprising:

(a) providing contiguity preserving elements (CEs), wherein each CE comprises at least one analyte;

(b) compartmentalizing the CEs comprising the at least one analyte into a plurality of first compartments;

(c) providing, in the first compartments, a first set of reporter moieties to the analytes of each CE, wherein the first set of reporter moieties identifies the CE;

(d) modifying the analytes such that at least some of the analytes of the CE comprise the first reporter moiety;

(e) combining the CEs comprising the analytes that comprise the first reporter moiety;

(f) compartmentalizing the CEs comprising the analytes that comprise the first reporter moiety into a plurality of second compartments;

(g) providing a second set of reporter moieties to the analytes comprising the first reporter moiety of each CE;

(h) further modifying the analytes such that at least some of the analytes comprise the second reporter moiety;

(i) analyzing the analytes comprising the first and second reporter moieties of each second compartment.

19. The method of item 18, wherein the analyte comprises a nucleic acid.

20. The method of item 19, wherein the nucleic acid comprises DNA.

21. The method of item 20, wherein the DNA comprises genomic DNA.

22. The method of item 19, wherein the nucleic acid comprises RNA or cDNA.

23. The method of item 18, wherein the analyte is selected from the group consisting of tissue sections, organelles, lipids, carbohydrates, and cellular metabolites.

24. The method of item 19, wherein step (d) comprises contacting the nucleic acid with a plurality of transposomes under conditions such that at least some of the transposon sequences are inserted into the target nucleic acid, each transposome comprising a transposase and a transposon sequence comprising the first reporter moiety.

25. The method of item 19, wherein step (d) comprises contacting a target nucleic acid with a plurality of transposomes, each transposome comprising a first transposon sequence comprising the first reporter moiety, a second transposon sequence noncontiguous with the first transposon sequence, and a transposase associated with the first and second transposon sequences.

26. The method of item 19, wherein step (f) comprises removing the transposase from the compartmentalized first indexed template nucleic acids.

27. The method of item 26, wherein the transposase is removed after step (d).

28. The method of item 26, wherein the transposase is removed before step (i).

29. The method of item 26, wherein removing the transposase comprises a method selected from the group consisting of: adding a detergent, changing the temperature, changing the pH, adding a protease, adding a chaperone, changing the salt concentration, and adding a strand-displacing polymerase.

30. The method of any one of items 24-29, wherein the first transposon sequence comprises a first primer site and the second transposon sequence comprises a second primer site.

31. The method of item 30, wherein the first primer site further comprises a first barcode and the second primer site further comprises a second barcode.

32. The method of item 18, wherein the first reporter moiety comprises a primer, and wherein step (d) comprises amplifying the nucleic acid with at least one primer.

33. The method of item 18, wherein the first reporter moiety comprises a primer, and wherein step (d) comprises ligating at least one primer to the nucleic acid.

34. The method of any one of items 18-33, wherein the first reporter moiety provided to the analytes of each first compartment is different.

35. The method of any one of items 18-34, wherein the analytes are components of a single cell, wherein each of the first and/or second reporter moieties is unique to that cell, and wherein the first and/or second reporter moieties identify the single cell.

36. The method of any one of items 18-35, further comprising providing a plasmid to the CE in step (c), wherein the plasmid is modified in steps (d)-(h) to comprise the first and/or second reporter moieties, and wherein the plasmid comprising the first and/or second reporter moieties identifies the CE.

37. The method of item 18, wherein the analyte is a protein.

38. The method of item 37, wherein the protein is from a single cell.

39. The method of any one of items 18-37, wherein steps (c)-(h) are repeated at least one more time.

40. The method of item 39, wherein the additional set of reporter moieties in each additional step differs from the first and second sets of reporter moieties.

41. The method of item 18, wherein the analyte is mRNA or cDNA.

42. The method of item 41, wherein the mRNA or cDNA is from a single cell.

43. The method of any one of items 41-42, wherein the CE comprises a solid support capable of immobilizing the mRNA or cDNA.

44. The method of item 43, wherein the solid support comprises oligo(dT) probes immobilized on the solid support.

45. The method of any one of items 41-44, wherein the first set of reporter moieties comprises a first barcode sequence.

46. The method of any one of items 41-45, wherein the second set of reporter moieties comprises a second barcode sequence.

47. The method of any one of items 41-44, wherein step (d) comprises contacting the nucleic acid with a plurality of transposomes under conditions such that at least some of the transposon sequences are inserted into the target nucleic acid, each transposome comprising a transposase and a transposon sequence comprising the first reporter moiety.

48. The method of any one of items 41-44, wherein step (d) comprises contacting a target nucleic acid with a plurality of transposomes, each transposome comprising a first transposon sequence comprising the first reporter moiety, a second transposon sequence noncontiguous with the first transposon sequence, and a transposase associated with the first and second transposon sequences.

49. The method of any one of items 47-48, wherein step (f) comprises removing the transposase from the compartmentalized first indexed template nucleic acids.

50. The method of item 49, wherein the transposase is removed after step (d).

51. The method of item 49, wherein the transposase is removed before step (i).

52. The method of item 49, wherein removing the transposase comprises a method selected from the group consisting of: adding a detergent, changing the temperature, changing the pH, adding a protease, adding a chaperone, changing the salt concentration, and adding a strand-displacing polymerase.

53. The method of any one of items 41-52, wherein the first and second reporter moieties identify the source of the analyte.

54. The method of any one of items 41-52, wherein the combination of the reporter moieties identifies the source of the analyte.

55. The method of any one of items 41-54, wherein steps (d)-(i) are repeated at least one more time.

56. The method of item 55, wherein the additional set of reporter moieties in each additional step differs from the first and second sets of reporter moieties.

57. The method of any one of items 1-56, wherein the single cell is associated with a disease.

58. The method of item 57, wherein the single cell is associated with cancer.

59. The method of item 57, wherein the single cell is associated with a genetic disease.

60. A method of obtaining sequence information from a target nucleic acid of a single cell, the method comprising:

a) introducing a cell suspension into a first droplet actuator;

b) using electrode-mediated droplet operations to dispense an array of droplets comprising the cell suspension, each droplet comprising a single cell;

c) introducing a cell lysis buffer into a second droplet actuator;

c) using electrode-mediated droplet operations to dispense an array of droplets comprising the cell lysis buffer;

d) using electrode-mediated droplet operations to combine the array of cell lysis buffer droplets with the array of cell suspension droplets to generate an array of cell lysate droplets;

e) introducing a reagent solution comprising a first reporter moiety and enzymes into a third droplet actuator, wherein at least one enzyme is capable of introducing the first reporter moiety into the target nucleic acid, and wherein the first reporter moiety identifies the single cell;

f) using electrode-mediated droplet operations to dispense an array of droplets comprising the reagent solution;

g) using electrode-mediated droplet operations to combine the array of cell lysate droplets with the reagent droplets to generate an array of first reporter moiety droplets, wherein the reporter moiety is introduced into the nucleic acids of the cell lysate droplets;

h) using electrode-mediated droplet operations to dispense an array of droplets comprising a wash solution;

i) using electrode-mediated droplet operations to combine the array of first reporter moiety droplets with the wash solution droplets, wherein the enzyme is removed from the nucleic acids after the reaction, to generate an array of droplets comprising the target nucleic acid, wherein the target nucleic acid comprises the first reporter moiety;

(j) repeating steps (e)-(i) at least once, wherein each subsequent reagent solution comprises a reporter moiety different from the first reporter moiety, to generate an array of droplets comprising the target nucleic acid, the target nucleic acid comprising a plurality of reporter moieties;

(k) obtaining sequence information from the target nucleic acid comprising one or more reporter moieties, and associating the sequence information with the single cell.

61. The method of item 60, further comprising:

(l) introducing into the droplet actuator a protein-labeling reagent solution comprising labels specific to proteins;

(m) using electrode-mediated droplet operations to dispense an array of droplets comprising the protein-labeling reagent solution;

(n) using electrode-mediated droplet operations to combine the array of protein-labeling reagent droplets with the array of droplets comprising the target nucleic acid, the target nucleic acid comprising a plurality of reporter moieties, wherein at least a portion of the proteins from the single cell comprise a label unique to the single cell; and identifying at least a portion of the proteins comprising the label.

62. The method of item 61, wherein obtaining sequence information from the target nucleic acid is performed simultaneously with identifying at least a portion of the proteins comprising the label.

63. The method of any one of items 60-62, wherein the target nucleic acid is RNA.

64. The method of any one of items 60-62, wherein the target nucleic acid is DNA.

65. The method of item 64, wherein the target nucleic acid is genomic DNA.

66. The method of any one of items 60-62, wherein the target nucleic acid is both RNA and DNA.

67. The method of any one of items 60-66, wherein the first reporter moiety is introduced into the target nucleic acid by ligation.

68. The method of any one of items 60-67, further comprising:

(i) introducing into the droplet actuator a reagent solution comprising reporter moieties specific to cellular metabolites;

(ii) using electrode-mediated droplet operations to dispense an array of reagent droplets comprising the reporter moieties specific to cellular metabolites;

(iii) using electrode-mediated droplet operations to combine the array of reagent droplets with the array of droplets comprising the target nucleic acid, the target nucleic acid comprising a plurality of reporter moieties, wherein at least a portion of the cellular metabolites from the single cell comprise a reporter moiety unique to the single cell; and identifying at least a portion of the cellular metabolites.

69. A method of obtaining sequence information from cellular nucleic acids, the method comprising:

(a) introducing one or more cells onto a solid support, wherein the solid support comprises a cell attachment moiety, and wherein, after such introduction, the one or more cells are immobilized on the solid support;

(b) lysing the cells immobilized on the solid support to generate a cell lysate;

(c) providing a reporter moiety to nucleic acids of the cell lysate and modifying the nucleic acids to generate nucleic acids comprising a first reporter moiety;

(d) obtaining sequence information from the nucleic acids comprising the first reporter moiety.

70. The method of item 69, wherein the nucleic acid is genomic DNA.

71. The method of item 69, wherein the nucleic acid is RNA.

72. The method of item 71, further comprising reverse transcribing the RNA into cDNA.

73. The method of any one of items 69-72, further comprising providing additional reporter moieties to the nucleic acids of step (c) comprising the first reporter moiety, and modifying the nucleic acids from step (c) to further comprise one or more additional reporter moieties.

74. The method of item 73, wherein the additional reporter moieties differ from the first reporter moiety.

75. The method of any one of items 69-74, wherein the first reporter moiety is unique to each cell, such that the cell of origin of the nucleic acids comprising the first index can be identified.

76. The method of any one of items 69-75, wherein the nucleic acids are from a single cell.

77. The method of any one of items 69-76, wherein the nucleic acids are both RNA and DNA.

78. The method of any one of items 69-77, wherein the solid support is selected from the group consisting of flow cells, beads, and microwells.

79. The method of any one of items 69-78, wherein the cell attachment moiety is an antibody, and wherein the antibody binds a cell surface protein.

80. The method of item 79, wherein the antibody is a monoclonal antibody.

81. The method of any one of items 79-80, wherein the antibody specifically binds a cell surface protein specific to cancer cells.

82. The method of any one of items 69-81, further comprising exposing proteins of the cell lysate to labels specific to proteins, wherein at least a portion of the proteins from the cells comprise a label unique to the cell; and identifying at least a portion of the proteins comprising the label.

83. The method of item 82, wherein identifying at least a portion of the proteins comprising the label and obtaining sequence information from the cellular nucleic acids are performed simultaneously.

84. The method of any one of items 69-83, wherein one or more of the reporter moieties comprise at least one barcode.

85. The method of any one of items 69-84, wherein one or more of the reporter moieties comprise a primer binding site.

86. The method of any one of items 69-85, wherein the nucleic acids comprising the first reporter moiety are modified by ligation.

87. The method of any one of items 69-84, wherein the nucleic acids comprising the one or more additional reporter moieties are modified by ligation.

88. The method of any one of items 69-87, wherein the modified nucleic acids comprising the first reporter moiety are amplified before sequence information is obtained.

89. The method of any one of items 69-88, wherein the modified nucleic acids comprising two or more reporter moieties are amplified before sequence information is obtained.

90. The method of any one of items 69-88, wherein at least a portion of the cells are associated with a disease.

91. The method of item 90, wherein at least a portion of the cells are associated with cancer.

92. A method of analyzing the composition of a cell, the method comprising:

(a) introducing one or more cells onto a solid support, wherein the solid support comprises a cell attachment moiety, and wherein, after such introduction, the one or more cells are immobilized on the solid support;

(c) providing one or more reporter moieties to the cell lysate such that one or more analytes within the cell lysate comprise the one or more reporter moieties, wherein the one or more reporter moieties identify the cell;

(d) analyzing the analytes comprising the one or more reporter moieties, wherein the analysis identifies the cell and detects the composition of the cell.

93. The method of item 92, wherein a single cell is immobilized on the solid support.

94. The method of any one of items 92-93, wherein the solid support is selected from the group consisting of flow cells, beads, and microwells.

95. The method of any one of items 92-94, wherein the cell attachment moiety is an antibody, and wherein the antibody binds a cell surface protein.

96. The method of item 95, wherein the antibody is a monoclonal antibody.

97. The method of any one of items 95-96, wherein the antibody specifically binds a cell surface protein specific to cancer cells.

98. The method of item 92, wherein the analyte is selected from the group consisting of proteins, nucleic acids, organelles, lipids, carbohydrates, and cellular metabolites.

99. The method of any one of items 1-13, wherein the first, second, third, or fourth reporter moiety comprises a primer binding site.

100. The method of any one of items 1-17, wherein the first analyte, the second analyte, or both are nucleic acids.

101. The method of item 100, wherein the nucleic acids comprising the reporter moieties are amplified before analysis.

102. The method of any one of items 100-101, wherein the nucleic acids are analyzed by sequencing.
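The split-and-pool logic of items 1 and 18 can be illustrated with a short simulation. This is a minimal sketch under simplifying assumptions that are not taken from the source: each CE lands in a compartment chosen uniformly at random in every round, a reporter moiety is modeled as the integer number of its compartment, and all analytes in a CE receive the same reporters (item 1's separate reporter sets for the first and second analytes are not modeled). The function names are hypothetical.

```python
import random
from collections import Counter


def split_pool_index(n_ce, compartments_per_round, seed=0):
    """Simulate rounds of split-and-pool indexing of contiguity preserving elements (CEs).

    In each round every CE is placed in one compartment chosen uniformly at random,
    and that compartment's reporter moiety (modeled as the compartment number) is
    appended to the CE's label. Returns one tuple of reporters per CE.
    """
    rng = random.Random(seed)
    labels = [() for _ in range(n_ce)]
    for n_compartments in compartments_per_round:
        # Split: redistribute all pooled CEs across this round's compartments
        # and add each compartment's reporter to the CEs it received.
        labels = [label + (rng.randrange(n_compartments),) for label in labels]
        # Pool: combining the CEs before the next round needs no explicit step
        # here, because the next round reassigns every CE independently.
    return labels


def uniquely_labeled(labels):
    """Count CEs whose full reporter combination is shared with no other CE."""
    counts = Counter(labels)
    return sum(1 for label in labels if counts[label] == 1)


labels = split_pool_index(n_ce=1000, compartments_per_round=[96, 96])
print(uniquely_labeled(labels), "of", len(labels), "CEs carry a unique reporter combination")
```

With two rounds of 96 compartments there are 96 × 96 = 9,216 possible combinations, so under the uniform-assignment assumption roughly 90% of 1,000 CEs are expected to be uniquely labeled; each additional round (as in items 16 and 39) multiplies the number of combinations.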

Brief Description of the Drawings

Figure 1: Four-tier combinatorial indexing. Schematic of four-tier combinatorial indexing of DNA contiguity preserving elements (CEs) created by embedding the contents of single cells in a polymer matrix or attaching them to beads. A compartment-specific index is attached at each combinatorial pooling and redistribution step (tier). In the example shown, four tiers yield four indices concatenated together (via repeated rounds of ligation, polymerase extension, tagmentation, etc.), enabling easy sequencing readout. Alternatively, DNA-containing contiguity preserving elements can be created from compartmentalized DNA partitions (i.e., DNA dilutions that subsample the original DNA sample) encapsulated in a matrix or immobilized on beads. This type of dilution is useful in phasing and assembly applications.
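The combinatorial capacity of a multi-tier scheme such as the one in Figure 1 is the product of the number of compartments in each tier. The sketch below assumes, hypothetically, that CEs are distributed uniformly at random in every tier and that the four tier indices appear as adjacent, fixed-length sequences at the start of a read with no spacers; real read layouts will differ, and the function names are illustrative only.

```python
from math import prod


def index_capacity(compartments_per_tier):
    """Number of distinct index combinations: the product of compartments across tiers."""
    return prod(compartments_per_tier)


def expected_unique_fraction(n_ce, compartments_per_tier):
    """Expected fraction of CEs whose index combination no other CE shares,
    assuming each CE enters each tier's compartments uniformly at random."""
    capacity = index_capacity(compartments_per_tier)
    return (1 - 1 / capacity) ** (n_ce - 1)


def split_indices(read, index_len, n_tiers):
    """Split the first n_tiers * index_len bases of a read into one index per tier.

    Hypothetical layout: the tier indices are adjacent, fixed-length, and unspaced.
    Returns (indices, remainder_of_read).
    """
    if len(read) < index_len * n_tiers:
        raise ValueError("read is shorter than the concatenated index region")
    indices = tuple(read[i * index_len:(i + 1) * index_len] for i in range(n_tiers))
    return indices, read[index_len * n_tiers:]


print(index_capacity([96] * 4))                                # 84934656
print(round(expected_unique_fraction(1_000_000, [96] * 4), 3))  # 0.988
print(split_indices("AAAACCCCGGGGTTTTACGT", index_len=4, n_tiers=4))
# (('AAAA', 'CCCC', 'GGGG', 'TTTT'), 'ACGT')
```

Four tiers of 96 compartments give about 85 million combinations, so even a million CEs would, under these assumptions, be almost all uniquely indexed.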

Figure 2: Two-tier combinatorial indexing for single-cell libraries. Depicts a method of preparing single-cell DNA or cDNA libraries using a two-tier combinatorial indexing scheme, in which the first-tier index is attached by tagmentation (compartment-specific indexes in the transposons) and the second-tier index is attached by PCR (compartment-specific indexes on the PCR primers). The contents of the single-cell vessel (i.e., genomic DNA or cDNA) can optionally be subjected to whole-genome amplification (WGA) or whole-transcriptome amplification.

Figure 3: Single-cell gene expression in a CE (e.g., a droplet). Depicts a method of preparing a cDNA library from the contents of a single cell in a CE such as a droplet. In the example shown, indexes are used to label different samples.

Figure 4: Single-cell contents in a vessel. Depicts representative contents of a single cell that can be analyzed with the proposed combinatorial indexing schemes.

Figure 5A: Formation of proximity retention elements (CEs); Figure 5B: Assays customized by creating proximity retention elements (CEs). These depict exemplary schematic embodiments for creating CEs by encapsulating a single cell, for example within a polymer bead, and lysing it so that its contents are captured within the CE. All components of the single cell remain in proximity to one another within the bead. One or more components can then be amplified, modified (e.g., by cDNA synthesis), and labeled with indexes or tags. Figure 5C: Formation of proximity retention elements (CEs). Depicts an exemplary schematic embodiment in which sample indexing is accomplished by spiking in encoding DNA sequences (e.g., plasmids) at the encapsulation, amplification/cDNA synthesis, or polymerization stage. Each sample is prepared with a different set of encoding plasmids or a different combination of encoding plasmids. Combinatorial indexing of the CEs then yields combinatorially indexed samples carrying the corresponding encoding library elements. In this way, each library element can be traced back to its CE of origin and its sample of origin.

Figure 6: Single-cell encapsulation and amplification. Schematic of encapsulating single-cell contents in a CE, such as a polymer matrix bead.

Figure 7: High-throughput analysis of cellular components by direct surface capture. Exemplary schematic of high-throughput analysis of cellular components by direct surface capture. "A" represents a collection of cells. "B" represents surface-bound transposomes. In "C", cells are flowed onto the surface. In "D", the cells are lysed and the cellular components are allowed to diffuse in a controlled manner around the site where each cell was captured. In "E", nucleic acids are captured by the transposomes (tagmented). Which cellular components are captured depends on whether the cell membrane or the nucleus is lysed. Various cellular components can be captured by using component-specific capture moieties (e.g., antibodies, receptors, ligands). The captured molecules can be analyzed directly on the capture surface, or harvested and analyzed on a different surface. In the latter case, the first surface consists of multiple regions (i.e., pads), each coated with oligomers sharing the same barcode, so that molecules captured on the same pad share the same identifying barcode.

Figure 8: Streamlined CPT-seq: on-bead workflow. Exemplary schematic of analyzing nucleic acids using proximity retention elements on beads.

Figures 9A-9D: Partitioning and mutagenesis approaches for assembling repetitive regions in genomic DNA, depicting exemplary modeling strategies.

Figure 10 illustrates a method of creating particles that are useful for creating proximity elements.

Detailed Description

Some aspects of the present invention relate to methods and compositions for evaluating the components of single cells that are retained, embedded, or contained within proximity retention elements (CEs).

In one aspect, disclosed herein are methods for analyzing multiple analyte types from a single cell. In some embodiments, a plurality of proximity retention elements (CEs) is provided, each CE containing a single cell. The cell is lysed within the CE so that the multiple analytes within the single cell are released within the CE. In some embodiments, multiple types of reporter moieties are provided, each type of reporter moiety being specific for one type of analyte. In some embodiments, the reporter moieties identify the single cell. The analytes are modified so that each type of analyte comprises a reporter moiety specific for that analyte type. In some embodiments, the CEs containing the reporter-bearing analytes are combined. In some embodiments, the combined CEs containing the reporter-bearing analytes are partitioned. In some embodiments, additional reporter moieties are provided and combined with the reporter-bearing analytes, so that each analyte comprises two or more distinct reporter moieties. The reporter-bearing analytes are then analyzed so that the identity of each analyte is detected and its reporter moieties identify the single cell from which it originated.
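The bookkeeping implied by analyte-specific and cell-identifying reporter moieties can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each read has already been parsed into a cell-identifying reporter (here a combination of indexes joined by "-"), an analyte-type reporter, and an insert sequence; the tag sequences in `ANALYTE_TAGS` and the function name are hypothetical and not part of the described method.

```python
from collections import defaultdict

# Hypothetical analyte-type reporter sequences (illustrative only).
ANALYTE_TAGS = {"ACGT": "genomic DNA", "TGCA": "mRNA", "GGCC": "protein"}


def tally_by_cell_and_analyte(reads, analyte_tags=ANALYTE_TAGS):
    """Count reads per single cell and per analyte type.

    reads: iterable of (cell_barcode, analyte_tag, insert) tuples, where
    cell_barcode is the cell-identifying reporter combination.
    Returns {cell_barcode: {analyte_type: count}}.
    """
    table = defaultdict(lambda: defaultdict(int))
    for cell, tag, _insert in reads:
        analyte = analyte_tags.get(tag)
        if analyte is None:
            continue  # reporter not recognized: analyte type cannot be assigned
        table[cell][analyte] += 1
    # Convert nested defaultdicts to plain dicts for easy inspection
    return {cell: dict(counts) for cell, counts in table.items()}
```

Reads whose analyte-type reporter is not recognized are simply skipped here; a real pipeline might instead tolerate mismatches when matching reporters.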

In some embodiments, exemplary analytes include, but are not limited to, DNA, RNA, cDNA, proteins, lipids, carbohydrates, cellular organelles (e.g., nucleus, Golgi apparatus, ribosomes, mitochondria, endoplasmic reticulum, chloroplasts, cell membrane, etc.), cellular metabolites, tissue sections, cells, single cells, contents of cells or single cells, nucleic acids isolated from cells or single cells, nucleic acids isolated from cells or single cells and further modified, or cell-free DNA (e.g., from placental fluid or plasma). In some embodiments, the analytes include genomic DNA and mRNA. In some embodiments, the mRNA has a poly(A) tail. In some embodiments, genomic DNA and mRNA are immobilized simultaneously on a solid support within the CE. In some embodiments, genomic DNA and mRNA are immobilized on the solid support sequentially. In some embodiments, genomic DNA is combined with transposome complexes whose transposon ends are immobilized on the solid support, and the mRNA is immobilized on the solid support by hybridization to oligo(dT) probes immobilized on the support. In some embodiments, genomic DNA is combined with transposome complexes, the transposon ends optionally hybridize to complementary sequences immobilized on the solid support, and the mRNA is immobilized by hybridization to oligo(dT) probes immobilized on the solid support. Other methods may also be used to immobilize mRNA. In some embodiments, the solid support is a bead. In some embodiments, the solid support is a flow cell surface. In some embodiments, the solid support is the wall of a reaction vessel.
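As a rough illustration of how the two capture chemistries described above could be told apart downstream, the following sketch labels a read as genomic DNA if it begins with a transposon end and as mRNA if it begins with a run of T's from an oligo(dT) capture probe. The default transposon-end sequence (the 19-bp Tn5 mosaic end), the assumption that reads start at the capture end, and the poly(T) length threshold are all assumptions for this example, not details from the source.

```python
# Assumed transposon-end sequence (Tn5 mosaic end); substitute the end actually used.
MOSAIC_END = "AGATGTGTATAAGAGACAG"


def classify_capture(read, transposon_end=MOSAIC_END, min_poly_t=10):
    """Label a read by the capture chemistry implied by its 5' end.

    Assumes the read starts at the capture end: transposome-captured genomic DNA
    begins with the transposon end, and oligo(dT)-captured mRNA begins with poly(T).
    """
    if read.startswith(transposon_end):
        return "genomic DNA"
    if read.startswith("T" * min_poly_t):
        return "mRNA"
    return "unassigned"
```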

In some embodiments, the method includes sequencing nucleic acids retained, embedded, or contained within a CE. In particular, embodiments of the methods and compositions provided herein relate to preparing nucleic acid templates and obtaining sequence data from them. The methods and compositions provided herein relate to those provided in U.S. Patent Application Publication No. 2002/0208705, U.S. Patent Application Publication No. 2012/0208724, and International Patent Application Publication No. WO 02/061832, each of which is incorporated herein by reference in its entirety. Some embodiments of the invention relate to preparing DNA within CEs to obtain phasing and sequence assembly information from target nucleic acids, and to obtaining such phasing and assembly information from these templates. Specific embodiments provided herein involve using an integrase, such as a transposase, to maintain the physical proximity of associated ends of fragmented nucleic acids, and using combinatorial indexing to create an individual library from each CE. Obtaining haplotype information from a CE includes distinguishing between different alleles (e.g., SNPs, genetic abnormalities, etc.) in the target nucleic acid. Such methods are useful for characterizing different alleles in a target nucleic acid and for reducing error rates in sequence information.

In one embodiment, template nucleic acids can be diluted into CEs, such as droplets. Optional whole-genome amplification can be used, and sequence information can be obtained from an amount of template nucleic acid approximately equivalent to a single haplotype of the target nucleic acid.
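Limiting dilution of this kind is commonly reasoned about with a Poisson model. The sketch below assumes that molecules covering a given locus are distributed independently among CEs, with a mean of `lam` copies per CE, and reports the fraction of CEs that are empty, hold exactly one copy, or hold more than one; the function name and the Poisson assumption are illustrative rather than taken from the source.

```python
import math


def poisson_loading(lam):
    """Fractions of compartments receiving 0, 1, or >1 copies of a locus, assuming
    copies are distributed independently (Poisson) with mean `lam` per compartment."""
    empty = math.exp(-lam)
    single = lam * math.exp(-lam)
    multiple = 1.0 - empty - single
    occupied = 1.0 - empty
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Among compartments that received the locus at all, how often it is not alone
        "multiple_given_occupied": multiple / occupied if occupied > 0 else 0.0,
    }
```

Lowering `lam` reduces the chance that two copies (possibly from different homologs) share a compartment, at the cost of leaving more compartments empty; this trade-off is what motivates working at higher nucleic acid concentrations with two or more rounds of indexing.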

In further embodiments, template nucleic acids can be partitioned such that multiple copies of a chromosome may be present in the same compartment, and the haplotype can still be determined as a result of the two or more rounds of indexing provided herein. In other words, template nucleic acids can be prepared using virtual compartments. In such embodiments, nucleic acids can be distributed among several first compartments, a first index can be provided to the nucleic acids in each compartment, the nucleic acids can be combined and redistributed among several second compartments, and a second index can be provided to the nucleic acids in each compartment. Advantageously, compared with diluting the nucleic acids in a single compartment to an amount equivalent to only one haplotype, such indexing allows haplotype information to be obtained at higher nucleic acid concentrations.
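The advantage of two or more rounds of indexing can be quantified under a simple, assumed model: if each molecule independently lands in one of `m` equally likely compartments in each round, two molecules end up with the same full index combination with probability 1/(m1 × m2 × ...). The sketch below (function name hypothetical) estimates how often a molecule shares its index combination with at least one of the other molecules covering the same locus.

```python
import math


def shared_label_probability(k_molecules, compartments_per_round):
    """Probability that a given molecule shares its full index combination with at
    least one of the other k_molecules - 1 molecules covering the same locus, assuming
    uniform, independent assignment to compartments in every indexing round."""
    combinations = math.prod(compartments_per_round)
    return 1.0 - (1.0 - 1.0 / combinations) ** (k_molecules - 1)
```

For example, with 10 molecules covering a locus and 96 compartments per round, one round gives roughly a 9% chance of a shared label, whereas two rounds (96 × 96 combinations) reduce it to roughly 0.1%.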

As used herein, the term “compartment” refers to an area or volume that separates or isolates something from other things. Exemplary compartments include, but are not limited to, vials, tubes, wells, droplets, boluses, beads, vessels, surface features, or areas or volumes separated by physical forces (e.g., fluid flow, magnetism, electric current, etc.).

An exemplary method of making compartments is shown in Figure 10. A silicon master plate with pillars can be used to imprint wells into a hydrogel sheet (the wells in the hydrogel are the negative image of the pillars). The resulting wells can be filled with a material that forms particles (e.g., a gel or polymer) together with the target analyte or other reagents. The hydrogel sheet can then be dissolved using a technique that does not dissolve the particles, and the particles can be collected and manipulated using the methods described herein.

In some embodiments provided herein, template libraries are prepared using transposomes. In some such libraries, the target nucleic acid can be fragmented. Accordingly, some embodiments provided herein relate to methods for retaining the sequence information carried by the physical proximity of adjacent fragments. Such methods include using an integrase to maintain the association of template nucleic acid fragments that were adjacent in the target nucleic acid. Advantageously, using an integrase in this way to maintain the physical proximity of fragmented nucleic acids increases the likelihood that fragments from the same original molecule, such as a chromosome, will be present in the same compartment.

Other embodiments provided herein relate to obtaining sequence information from each strand of a nucleic acid, which can be used to reduce error rates in the sequencing information. Libraries of template nucleic acids can be prepared for obtaining sequence information from each strand such that each strand, and the products of each strand, can be distinguished from one another.
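One common way strand-resolved reads are used to reduce errors is a duplex-style consensus: a base is accepted only where the reads from both strands agree. The sketch below illustrates that idea under assumed inputs (two equal-length reads covering the same interval, the second read taken from the complementary strand and written 5'→3'); it is not presented as the specific error-correction procedure of this application, and the function names are hypothetical.

```python
# Translation table for base complementation; N (unknown) stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(top_read, bottom_read):
    """Combine reads from the two strands of one molecule.

    bottom_read is the complementary-strand read as sequenced (5'->3'); it is
    reverse-complemented onto the top-strand coordinates, and positions where
    the two strands disagree are masked with 'N'.
    """
    bottom_as_top = reverse_complement(bottom_read)
    if len(bottom_as_top) != len(top_read):
        raise ValueError("reads must cover the same interval")
    return "".join(a if a == b else "N" for a, b in zip(top_read, bottom_as_top))
```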

Some methods provided herein include methods of analyzing nucleic acids. Such methods include preparing a library of template nucleic acids from a target nucleic acid, obtaining sequence data from the template library, and assembling a sequence representation of the target nucleic acid from the sequence data.

Generally, the methods and compositions provided herein relate to those provided in U.S. Patent Application Publication No. 2002/0208705, U.S. Patent Application Publication No. 2012/0208724, and International Patent Application Publication No. WO 02/061832, each of which is incorporated herein by reference in its entirety. The methods provided herein involve using transposomes to insert features into target nucleic acids. Such features include fragmentation sites, primer sites, barcodes, affinity tags, reporter moieties, etc.

In methods used in the embodiments provided herein, a library of template nucleic acids is prepared from a CE containing a target nucleic acid. The library is prepared by inserting or attaching multiple unique barcodes throughout the target nucleic acid. In some embodiments, each barcode includes a first barcode sequence and a second barcode sequence with a fragmentation site between them. The first and second barcode sequences can be identified or designated as paired with each other. The pairing can be informative, such that the first barcode is associated with the second barcode. Advantageously, the paired barcode sequences can be used to assemble sequencing data from the template library. For example, identifying a first template nucleic acid containing a first barcode sequence and a second template nucleic acid containing the second barcode sequence paired with it indicates that the two template nucleic acids represent sequences that are adjacent to each other in the sequence representation of the target nucleic acid. Such methods can be used to assemble a sequence representation of the target nucleic acid de novo, without a reference genome.
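The way paired barcodes link neighboring fragments can be sketched as a simple chaining procedure. Assumptions for this illustration: each template fragment has been reduced to a (left barcode, insert, right barcode) tuple, the pairing of first and second barcode sequences is known as a lookup table, and the fragments come from a single linear target with no gaps; the function name is hypothetical.

```python
def order_fragments(fragments, pairs):
    """Order fragment inserts along the target using paired barcodes.

    fragments: list of (left_barcode, insert, right_barcode) tuples.
    pairs: dict mapping a first barcode sequence to its paired second barcode
           sequence; a fragment ending in a first barcode is followed by the
           fragment that begins with the paired second barcode.
    """
    by_left = {frag[0]: frag for frag in fragments}
    # Left barcodes that some fragment points to, i.e. fragments with a predecessor
    has_predecessor = {pairs[right] for _, _, right in fragments
                       if right in pairs and pairs[right] in by_left}
    starts = [frag for frag in fragments if frag[0] not in has_predecessor]
    if len(starts) != 1:
        raise ValueError("expected exactly one terminal fragment")
    ordered, current = [], starts[0]
    while True:
        ordered.append(current[1])
        if len(ordered) > len(fragments):
            raise ValueError("barcode pairs form a cycle")
        next_left = pairs.get(current[2])
        if next_left not in by_left:
            return ordered  # no paired neighbor: reached the other end
        current = by_left[next_left]
```

Real data would also need to handle missing links, barcode sequencing errors, and multiple disconnected contigs, which this sketch simply reports as errors.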

In some embodiments, multiplexed combinatorial barcoding can be used so that the target nucleic acids from each single cell carry a unique barcode (e.g., a unique combination of barcodes) and can readily be distinguished from the target nucleic acids of other single cells. In some embodiments, a CE can contain target nucleic acids from a single cell. In some embodiments, the target nucleic acids within a CE carry an identifiable unique barcode that differs from that of the target nucleic acids within other CEs.
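How combinatorial indexing gives each CE a distinguishable barcode combination can be illustrated with a small simulation. The sketch assumes that, at every tier, each CE is redistributed uniformly at random into one of a fixed number of compartments, and it measures how many CEs end up sharing their full index combination with another CE; the function names, tier count, and compartment count are illustrative choices.

```python
import random
from collections import Counter


def split_pool_labels(n_elements, compartments_per_tier, n_tiers, seed=0):
    """Simulate random split-pool indexing: each element receives one compartment
    index per tier, and its label is the tuple of indexes across all tiers."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(compartments_per_tier) for _ in range(n_tiers))
            for _ in range(n_elements)]


def fraction_sharing_label(labels):
    """Fraction of elements whose label is shared with at least one other element."""
    counts = Counter(labels)
    return sum(c for c in counts.values() if c > 1) / len(labels)
```

With 96 compartments and four tiers there are 96^4 (about 85 million) possible combinations, so collisions among thousands of CEs are rare, whereas a single tier of 96 compartments cannot distinguish more than 96 CEs.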

In some embodiments, in addition to nucleic acids, multiplexed combinatorial labeling schemes can be applied to other components of a single cell, such as proteins, organelles, lipids, or cell membranes, so that the components of one single cell can be distinguished from those of other single cells. In some embodiments, a CE can contain the components of a single cell. In some embodiments, the components of the single cell within a CE carry identifiable unique labels that differ from those of the single-cell components within other CEs.

In some embodiments, a multiplexed combinatorial barcoding scheme for target nucleic acids from a single cell can be used together with a multiplexed combinatorial labeling scheme for other components of that cell. In some embodiments, such combinatorial barcoding and combinatorial labeling can be performed within a CE containing a single cell. In some embodiments, such combinatorial barcoding and combinatorial labeling can be performed in parallel on multiple CEs, each containing a single cell.

In some embodiments, proteins retained, embedded, immobilized, or contained within a CE can be sequenced. In some embodiments, such proteins are uniquely labeled. In some embodiments, proteins retained, embedded, immobilized, or contained within a CE can be identified by methods known in the art. In some embodiments, protein identification and/or sequencing can be performed together with collecting sequence information from nucleic acids.

As used herein, the terms “nucleic acid” and/or “oligonucleotide” and/or their grammatical equivalents can refer to at least two nucleotide monomers linked together. Nucleic acids generally contain phosphodiester bonds; however, in some embodiments, nucleic acid analogs can have other types of backbones, including, for example, phosphoramide (Beaucage et al., Tetrahedron, 49:1925 (1993); Letsinger, J. Org. Chem., 35:3800 (1970); Sprinzl et al., Eur. J. Biochem., 81:579 (1977); Letsinger et al., Nucl. Acids Res., 14:3487 (1986); Sawai et al., Chem. Lett., 805 (1984); Letsinger et al., J. Am. Chem. Soc., 110:4470 (1988); and Pauwels et al., Chemica Scripta, 26:141 (1986)), phosphorothioate (Mag et al., 19:1437 (1991); and U.S. Patent No. 5,644,048), phosphorodithioate (Briu et al., J. Am. Chem. Soc., 111:2321 (1989)), O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press), and peptide nucleic acid backbones and linkages (see Egholm, J. Am. Chem. Soc., 114:1895 (1992); Meier et al., Chem. Int. Ed. Engl., 31:1008 (1992); Nielsen, Nature, 365:566 (1993); Carlsson et al., Nature, 380:207 (1996)). The above references are incorporated herein by reference.

Other analog nucleic acids include those with positive backbones (Denpcy et al., Proc. Natl. Acad. Sci. USA, 92:6097 (1995)); non-ionic backbones (U.S. Patent Nos. 5,386,023; 5,637,684; 5,602,240; 5,216,141; and 4,469,863; Kiedrowshi et al., Angew. Chem. Intl. Ed. English, 30:423 (1991); Letsinger et al., J. Am. Chem. Soc., 110:4470 (1988); Letsinger et al., Nucleosides & Nucleotides, 13:1597 (1994); Chapters 2 and 3, ASC Symposium Series 580, “Carbohydrate Modifications in Antisense Research”, Ed. Y. S. Sanghui and P. Dan Cook; Mesmaeker et al., Bioorganic & Medicinal Chem. Lett., 4:395 (1994); Jeffs et al., J. Biomolecular NMR, 34:17 (1994); Tetrahedron Lett., 31:7433 (1996)); and non-ribose backbones (U.S. Patent Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ASC Symposium Series 580, “Carbohydrate Modifications in Antisense Research”, Ed. Y. S. Sanghui and P. Dan Cook). Nucleic acids can also contain one or more carbocyclic sugars (see Jenkins et al., Chem. Soc. Rev., (1995) pp. 169-176). The above references are incorporated herein by reference.

The ribose-phosphate backbone can be modified to facilitate the addition of additional moieties, such as labels, or to increase the stability of such molecules under certain conditions. In addition, mixtures of naturally occurring nucleic acids and analogs can be made. Alternatively, mixtures of different nucleic acid analogs, and mixtures of naturally occurring nucleic acids and analogs, can be made. Nucleic acids can be single-stranded or double-stranded, as specified, or can contain portions of both double-stranded and single-stranded sequence. The nucleic acid can be DNA (e.g., genomic DNA or cDNA), RNA, or a hybrid, from a single cell, multiple cells, or multiple species (as in metagenomic samples, such as environmental samples), and also from mixed samples, such as mixed tissue samples or mixed samples from different individuals of the same species, disease samples such as cancer-related nucleic acids, etc. Nucleic acids can contain any combination of deoxyribonucleotides and ribonucleotides, and any combination of bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine, isoguanine, and base analogs such as nitropyrrole (including 3-nitropyrrole) and nitroindole (including 5-nitroindole).

In some embodiments, a nucleic acid can include at least one promiscuous base. A promiscuous base can base-pair with more than one different type of base. In some embodiments, a promiscuous base can base-pair with at least two, and no more than three, different types of bases. An example of a promiscuous base is inosine, which can pair with adenine, thymine, or cytosine. Other examples include hypoxanthine, 5-nitroindole, acyclic 5-nitroindole, 4-nitropyrazole, 4-nitroimidazole, and 3-nitropyrrole (Loakes et al., Nucleic Acid Res., 22:4039 (1994); Van Aerschot et al., Nucleic Acid Res., 23:4363 (1995); Nichols et al., Nature, 369:492 (1994); Bergstrom et al., Nucleic Acid Res., 25:1935 (1997); Loakes et al., Nucleic Acid Res., 23:2361 (1995); Loakes et al., J. Mol. Biol., 270:426 (1997); and Fotin et al., Nucleic Acid Res., 26:1515 (1998)). Promiscuous bases that base-pair with at least three, four, or more types of bases can also be used. The above references are incorporated herein by reference.
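The effect of a promiscuous base on hybridization can be sketched with a pairing table. This minimal example encodes only the standard Watson-Crick pairs plus inosine ("I") pairing with A, T, or C as stated above; other promiscuous bases would need their own entries, and the function name is hypothetical.

```python
# Allowed partners for each probe base; "I" is inosine, per the pairing described above.
PAIRS = {
    "A": {"T"},
    "T": {"A"},
    "G": {"C"},
    "C": {"G"},
    "I": {"A", "T", "C"},
}


def can_hybridize(probe, target):
    """Return True if every base of `probe` can pair with the antiparallel base
    of `target`. Both sequences are written 5'->3' and must be equal length."""
    if len(probe) != len(target):
        return False
    # Reading the target backwards aligns it antiparallel to the probe
    return all(t in PAIRS[p] for p, t in zip(probe, reversed(target)))
```

A single inosine-containing probe such as "AIGT" can therefore hybridize to three target variants that differ at the inosine-facing position, but not to the variant carrying G there.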

As used herein, the term “nucleotide analog” and/or its grammatical equivalents can refer to synthetic analogs having a modified nucleotide base moiety, a modified pentose moiety, and/or a modified phosphate moiety, and, in the case of polynucleotides, modified internucleotide linkages, as generally described elsewhere (e.g., Scheit, Nucleotide Analogs, John Wiley, New York, 1980; Englisch, Angew. Chem. Int. Ed. Engl., 30:613-29, 1991; Agarwal, Protocols for Polynucleotides and Analogs, Humana Press, 1994; and S. Verma and F. Eckstein, Ann. Rev. Biochem., 67:99-134, 1998). Generally, modified phosphate moieties comprise analogs of phosphate in which the phosphorus atom is in the +5 oxidation state and one or more of the oxygen atoms is replaced with a non-oxygen moiety, such as sulfur. Exemplary phosphate analogs include, but are not limited to, phosphorothioate, phosphorodithioate, phosphoroselenoate, phosphorodiselenoate, phosphoroanilothioate, phosphoranilidate, phosphoramidate, and boronophosphates, including associated counterions, e.g., H⁺, NH₄⁺, and Na⁺, if such counterions are present. Exemplary modified nucleotide base moieties include, but are not limited to, 5-methylcytosine (5mC); C-5-propynyl analogs, including but not limited to C-5 propynyl-C and C-5 propynyl-U; 2,6-diaminopurine (also known as 2-aminoadenine or 2-amino-dA); hypoxanthine, pseudouridine, 2-thiopyrimidine, isocytosine (isoC), 5-methyl isoC, and isoguanine (isoG; see, e.g., U.S. Patent No. 5,432,272). Exemplary modified pentose moieties include, but are not limited to, locked nucleic acid (LNA) analogs, including but not limited to Bz-A-LNA, 5-Me-Bz-C-LNA, dmf-G-LNA, and T-LNA (see, e.g., The Glen Report, 16(2):5, 2003; Koshkin et al., Tetrahedron, 54:3607-30, 1998), and 2'- or 3'-modifications in which the 2'- or 3'-position is hydrogen, hydroxy, alkoxy (e.g., methoxy, ethoxy, allyloxy, isopropoxy, butoxy, isobutoxy, and phenoxy), azido, amino, alkylamino, fluoro, chloro, or bromo. Modified internucleotide linkages include phosphate analogs, analogs having achiral and uncharged intersubunit linkages (e.g., Sterchak, E. P. et al., Organic Chem., 52:4202, 1987), and uncharged morpholino-based polymers having achiral intersubunit linkages (see, e.g., U.S. Patent No. 5,034,506). Some internucleotide linkage analogs include morpholidate, acetal, and polyamide-linked heterocycles. In one class of nucleic acid analogs, known as peptide nucleic acids, including pseudocomplementary peptide nucleic acids (“PNA”), the conventional sugar and internucleotide linkages have been replaced with a 2-aminoethylglycine amide backbone polymer (see, e.g., Nielsen et al., Science, 254:1497-1500, 1991; Egholm et al., J. Am. Chem. Soc., 114:1895-1897, 1992; Demidov et al., Proc. Natl. Acad. Sci., 99:5953-58, 2002; Peptide Nucleic Acids: Protocols and Applications, Nielsen, ed., Horizon Bioscience, 2004). The above references are incorporated herein by reference.

As used herein, the term "sequencing read" and/or grammatical equivalents thereof can refer to a repetitive process of physical or chemical steps that is carried out to obtain signals indicative of the order of monomers in a polymer. The signals can indicate the order of monomers at single-monomer resolution or at lower resolution. In particular embodiments, the steps can be initiated on a nucleic acid target and carried out to obtain signals indicative of the order of bases in the nucleic acid target. The process can be carried out to its typical completion, which is usually defined by the point at which signals from the process can no longer distinguish the bases of the target with a reasonable level of certainty. If desired, completion can occur earlier, for example, once a desired amount of sequence information has been obtained. A sequencing read can be carried out on a single target nucleic acid molecule, simultaneously on a population of target nucleic acid molecules having the same sequence, or simultaneously on a population of target nucleic acids having different sequences. In some embodiments, a sequencing read is terminated when signals are no longer obtained from one or more target nucleic acid molecules from which signal acquisition was initiated. For example, a sequencing read can be initiated for one or more target nucleic acid molecules present on a solid phase substrate and terminated upon removal of the one or more target nucleic acid molecules from the substrate. In other cases, sequencing can be terminated by otherwise ceasing detection of the target nucleic acids that were present on the substrate when the sequencing run was initiated.
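The definition above ties the typical completion of a read to the point where signals stop distinguishing bases, with optional earlier completion. A minimal sketch of that stopping rule follows, assuming four-channel intensities per cycle and using a "chastity"-style ratio (brightest divided by brightest plus second brightest) as the measure of distinguishability. The channel order, the 0.6 cut-off, and the function name `call_read` are illustrative assumptions, not part of the described method.

```python
from typing import Optional, Sequence

BASES = "ACGT"  # assumed channel order for this sketch


def call_read(cycles: Sequence[Sequence[float]],
              min_chastity: float = 0.6,
              max_length: Optional[int] = None) -> str:
    """Call one base per cycle and stop once bases can no longer be distinguished.

    cycles: one 4-element intensity vector (A, C, G, T) per sequencing cycle.
    min_chastity: hypothetical cut-off on brightest / (brightest + second brightest).
    max_length: optional early completion once enough sequence has been read.
    """
    calls = []
    for intensities in cycles:
        if max_length is not None and len(calls) >= max_length:
            break  # early completion: desired amount of sequence obtained
        ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
        top = intensities[ranked[0]]
        second = intensities[ranked[1]]
        total = top + second
        if total <= 0 or top / total < min_chastity:
            break  # typical completion: signal no longer resolves a single base
        calls.append(BASES[ranked[0]])
    return "".join(calls)


cycles = [
    (900, 50, 40, 30),   # clearly A
    (20, 800, 60, 10),   # clearly C
    (30, 40, 700, 20),   # clearly G
    (300, 280, 50, 40),  # A and C nearly equal: ambiguous, read ends here
    (10, 10, 10, 900),   # never reached
]
print(call_read(cycles))                # ACG
print(call_read(cycles, max_length=2))  # AC
```

Real instruments use their own quality models; the point here is only that "completion" is a rule applied cycle by cycle to the signal.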

As used herein, the term "sequencing representation" and/or grammatical equivalents thereof can refer to information that signifies the order and type of monomeric units in a polymer. For example, the information can indicate the order and type of nucleotides in a nucleic acid. The information can be in any of a variety of formats including, for example, a depiction, an image, an electronic medium, a series of symbols, a series of numbers, a series of letters, a series of colors, etc. The information can be at single-monomer resolution or at lower resolution. An exemplary polymer is a nucleic acid, such as DNA or RNA, having nucleotide units. A series of "A," "T," "G," and "C" letters is a well-known sequence representation for DNA that can be correlated, at single-nucleotide resolution, with the actual sequence of a DNA molecule. Other exemplary polymers are proteins having amino acid units and polysaccharides having saccharide units.
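Because the same sequence information can be held as letters or as numbers, a short sketch of converting between the two may help. It assumes an arbitrary A=0, C=1, G=2, T=3 mapping and a 2-bit-per-base packing; both are illustrative choices, not a format specified in this document.

```python
BASE_TO_NUM = {"A": 0, "C": 1, "G": 2, "T": 3}  # arbitrary illustrative mapping
NUM_TO_BASE = "ACGT"


def to_numbers(seq: str) -> list[int]:
    """Letter representation -> series-of-numbers representation."""
    return [BASE_TO_NUM[b] for b in seq.upper()]


def pack_2bit(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base (first base most significant)."""
    value = 0
    for b in seq.upper():
        value = (value << 2) | BASE_TO_NUM[b]
    return value


def unpack_2bit(value: int, length: int) -> str:
    """Inverse of pack_2bit; the length is needed because leading 'A's encode as 0."""
    bases = []
    for _ in range(length):
        bases.append(NUM_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


seq = "GATTACA"
print(to_numbers(seq))                        # [2, 0, 3, 3, 0, 1, 0]
print(unpack_2bit(pack_2bit(seq), len(seq)))  # GATTACA
```

Both forms carry the same single-nucleotide-resolution information; only the encoding differs.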

As used herein, the term "at least a portion" and/or grammatical equivalents thereof can refer to any fraction of a whole amount. For example, "at least a portion" can refer to at least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, 99.9%, or 100% of a whole amount.

As used herein, the term "detection" and/or grammatical equivalents thereof can refer to identifying the presence or existence of an analyte, identifying individual components of an analyte, for example, sequence information, and/or quantifying the amount of such an analyte.

Fragmentation site

In some embodiments comprising a circular transposome, the linker can comprise a fragmentation site. A fragmentation site can be used to cleave the physical, but not the informational, association between a first barcode sequence and a second barcode sequence. Cleavage may be by biochemical, chemical, or other means. In some embodiments, a fragmentation site can include a nucleotide or nucleotide sequence that may be fragmented by various means. For example, a fragmentation site may comprise a restriction endonuclease site; at least one ribonucleotide cleavable with an RNase; a nucleotide analog cleavable in the presence of certain chemical agents; a diol linkage cleavable by treatment with periodate; a disulfide group cleavable with a chemical reducing agent; a cleavable moiety that may be subject to photochemical cleavage; and a peptide cleavable by a peptidase or other suitable means. See, for example, U.S. Patent Application Publication No. 2002/0208705, U.S. Patent Application Publication No. 2012/0208724, and International Patent Application Publication No. WO 02/061832, each of which is incorporated by reference in its entirety.
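The distinction between the physical and the informational association can be illustrated with strings. In this sketch the fragmentation site is assumed to be an EcoRI site (G^AATTC) and the two barcodes are hypothetical; only the top strand is modeled, so sticky-end overhangs and the complementary strand are ignored. The pairing is recorded before cleavage, so the two barcodes can still be matched after they end up on separate fragments.

```python
def cleave(construct: str, site: str, cut_offset: int) -> list[str]:
    """Cut at every occurrence of `site`, `cut_offset` bases into the site (top strand only)."""
    fragments = []
    start = 0
    pos = construct.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(construct[start:cut])
        start = cut
        pos = construct.find(site, pos + 1)
    fragments.append(construct[start:])
    return fragments


BC1, BC2 = "ACGTACGT", "TTGGCCAA"    # hypothetical first and second barcode sequences
ECORI_SITE, ECORI_CUT = "GAATTC", 1  # assumed fragmentation site: EcoRI cuts G^AATTC
linker_construct = BC1 + ECORI_SITE + BC2

pairing = {BC1: BC2}  # informational link, recorded before cleavage
left, right = cleave(linker_construct, ECORI_SITE, ECORI_CUT)
print(left, right)            # ACGTACGTG AATTCTTGGCCAA
print(pairing[BC1] in right)  # True: the pair can still be matched after cleavage
```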

Primer site

In some embodiments, a reporter moiety can comprise a primer site that can hybridize to a primer. In some embodiments, a reporter moiety can include at least a first primer site useful for amplification, sequencing, and the like.

In some embodiments, a transposon sequence can include a "sequencing adaptor" or "sequencing adaptor site", that is, a region comprising one or more sites that can hybridize to a primer. In some embodiments, a transposon sequence can include at least a first primer site useful for amplification, sequencing, and the like. In some embodiments comprising a circular transposome, the linker can include a sequencing adaptor. In more embodiments comprising a circular transposome, the linker comprises at least a first primer site and a second primer site. In such embodiments, the primer sites can be oriented such that a primer hybridized to the first primer site and a primer hybridized to the second primer site have the same orientation or different orientations.

In some embodiments, the linker can include a first primer site and a second primer site with a non-amplifiable site disposed therebetween. The non-amplifiable site is useful for blocking extension of a polynucleotide strand between the first and second primer sites, where the polynucleotide strand hybridizes to one of the primer sites. The non-amplifiable site can also be useful for preventing concatemers. Examples of non-amplifiable sites include nucleotide analogs, non-nucleotide chemical moieties, amino acids, peptides, and polypeptides. In some embodiments, a non-amplifiable site comprises a nucleotide analog that does not significantly base-pair with A, C, G, or T. Some embodiments include a linker comprising a first primer site and a second primer site with a fragmentation site disposed therebetween. Other embodiments can use a forked or Y-shaped adaptor design useful for directional sequencing, as described in U.S. Patent No. 7,741,463, the disclosure of which is incorporated herein by reference in its entirety.

Exemplary primer binding site sequences include, but are not limited to, AATGATACGGCGACCACCGAGATCTACAC (P5 sequence) and CAAGCAGAAGACGGCATACGAGAT (P7 sequence).
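A primer site may appear in a read either as written or as its reverse complement, depending on strand. The sketch below computes reverse complements of the P5 and P7 sequences given above and locates an exact match on either strand; the function names and the test read are hypothetical, and mismatch-tolerant matching is deliberately left out.

```python
from typing import Optional

P5 = "AATGATACGGCGACCACCGAGATCTACAC"
P7 = "CAAGCAGAAGACGGCATACGAGAT"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (N is kept as N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def locate_site(read: str, site: str) -> Optional[tuple[str, int]]:
    """First exact match of a primer site in a read, on either strand.

    Returns ("+", index) if the site itself is found, ("-", index) if only its
    reverse complement is found, or None if neither occurs.
    """
    idx = read.find(site)
    if idx != -1:
        return ("+", idx)
    idx = read.find(reverse_complement(site))
    if idx != -1:
        return ("-", idx)
    return None


print(reverse_complement(P7))  # ATCTCGTATGCCGTCTTCTGCTTG
read = "GGTT" + reverse_complement(P7) + "ACGT"  # hypothetical read
print(locate_site(read, P7))   # ('-', 4)
```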

Reporter moiety

As used herein, the term "reporter moiety" and grammatical equivalents can refer to any identifiable tag, label, index, barcode, or group that makes it possible to determine the composition, identity, and/or source of an analyte under investigation.

Those skilled in the art will appreciate that many different species of reporter moieties can be used with the methods and compositions described herein, either individually or in combination with one or more different reporter moieties. In some embodiments, more than one different reporter moiety can be used to analyze more than one analyte simultaneously. In some embodiments, a plurality of different reporter moieties can be used simultaneously to uniquely identify a single cell or components of a single cell.

In certain embodiments, a reporter moiety can emit a signal. Examples of signals include, but are not limited to, fluorescent, chemiluminescent, bioluminescent, phosphorescent, radioactive, calorimetric, ion-activity, electronic, or electrochemiluminescent signals. Example reporter moieties are listed, for example, in U.S. Patent Application Publication No. 2002/0208705, U.S. Patent Application Publication No. 2012/0208724, and International Patent Application Publication No. WO 02/061832, each of which is incorporated by reference in its entirety.

In some embodiments, a reporter moiety can be an adaptor. In some embodiments of the compositions and methods described herein, a transposon sequence can include a reporter moiety. In some embodiments comprising a circular transposome, the linker or an adaptor can comprise a reporter moiety.

In some embodiments, a reporter moiety may not emit a signal. In some embodiments, a reporter moiety can be a nucleic acid fragment, for example, a barcode, a unique molecular index, or a plasmid. In some embodiments, a reporter moiety can comprise an antibody that specifically binds to a protein. In some embodiments, the antibody can comprise a detectable label. In some embodiments, a reporter can include an antibody or affinity reagent labeled with a nucleic acid tag. The nucleic acid tag can be detected, for example, via a proximity ligation assay (PLA) or a proximity extension assay (PEA).

In some embodiments, a set of reporter moieties can be used. In some embodiments, the set can comprise a mixture of subsets of reporter moieties, in which each subset is specific for a different type of analyte, for example, proteins, nucleic acids, lipids, or carbohydrates. In some embodiments, the set can comprise a mixture of subsets of reporter moieties, in which the subsets differ from one another but are specific for the same type of analyte.

Barcode

Generally, a barcode can include one or more nucleotide sequences that can be used to identify one or more particular analytes (e.g., nucleic acids, proteins, metabolites, or other analytes) described herein or known in the art. The barcode can be an artificial sequence, or can be a naturally occurring sequence generated during transposition, such as identical flanking genomic DNA sequences (g-codes) at the ends of formerly juxtaposed DNA fragments. A barcode can comprise at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more consecutive nucleotides. In some embodiments, a barcode comprises at least about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, or more consecutive nucleotides. In some embodiments, at least a portion of the barcodes in a population of nucleic acids comprising barcodes is different. In some embodiments, at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the barcodes are different. In more such embodiments, all of the barcodes are different. The diversity of different barcodes in a population of nucleic acids comprising barcodes can be randomly generated or non-randomly generated.
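To make "at least about X% of the barcodes are different" concrete, the sketch below generates a random barcode pool and measures its diversity. It assumes diversity is measured as the number of distinct sequences divided by the total number of barcodes; that metric, the pool size, the barcode length, and the function names are illustrative assumptions.

```python
import random
from typing import Optional


def random_barcodes(n: int, length: int, seed: Optional[int] = None) -> list[str]:
    """Randomly generate n barcodes of the given length (the 'random' diversity case)."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]


def fraction_distinct(barcodes: list[str]) -> float:
    """Distinct sequences divided by total barcodes (1.0 means all barcodes differ)."""
    if not barcodes:
        raise ValueError("empty barcode population")
    return len(set(barcodes)) / len(barcodes)


print(fraction_distinct(["AAAA", "AAAA", "CCCC", "GGGG"]))  # 0.75

pool = random_barcodes(1000, 12, seed=1)
print(fraction_distinct(pool))  # close to 1.0, since 4**12 is far larger than 1000
```

Longer barcodes make random collisions rarer; a non-random design would instead pick sequences from a curated list.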

In some embodiments, a transposon sequence comprises at least one barcode. In some embodiments, such as transposomes comprising two non-contiguous transposon sequences, the first transposon sequence comprises a first barcode, and the second transposon sequence comprises a second barcode. In some embodiments, such as a circular transposome, a transposon sequence comprises a barcode comprising a first barcode sequence and a second barcode sequence. In some of the foregoing embodiments, the first barcode sequence can be identified or designated to be paired with the second barcode sequence. For example, a known first barcode sequence can be known to be paired with a known second barcode sequence by using a reference table comprising a plurality of first and second barcode sequences known to be paired with one another.

In another example, the first barcode sequence can comprise the same sequence as the second barcode sequence. In another example, the first barcode sequence can comprise the reverse complement of the second barcode sequence. In some embodiments, the first barcode sequence and the second barcode sequence are different. The first and second barcode sequences can comprise a dual code.
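The pairing relationships described in the two preceding paragraphs (listed in a reference table, identical, or reverse complements) can be checked mechanically. This sketch is a hypothetical helper; the reference-table entry below pairs two barcodes purely for illustration and is not a pairing taken from the document.

```python
from typing import Mapping, Optional

_COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def pairing_relation(first: str, second: str,
                     reference: Optional[Mapping[str, str]] = None) -> str:
    """Classify how a first and a second barcode sequence relate."""
    if reference is not None and reference.get(first) == second:
        return "reference_pair"      # listed together in the lookup table
    if first == second:
        return "identical"
    if first == reverse_complement(second):
        return "reverse_complement"
    return "unpaired"


table = {"TATAGCCT": "GTACTGAC"}  # hypothetical reference table
print(pairing_relation("TATAGCCT", "GTACTGAC", table))  # reference_pair
print(pairing_relation("ACGGT", "ACGGT"))               # identical
print(pairing_relation("AACG", "CGTT"))                 # reverse_complement
print(pairing_relation("AACG", "GGGG"))                 # unpaired
```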

In some embodiments of the compositions and methods described herein, barcodes are used in the preparation of template nucleic acids. As will be understood, the large number of available barcodes permits each template nucleic acid molecule to comprise a unique identification. Unique identification of each molecule in a mixture of template nucleic acids can be used in several applications. For example, uniquely identified molecules can be applied to identify individual nucleic acid molecules in samples having multiple chromosomes, in genomes, in cells, in cell types, in cell disease states, and in species, for example, in haplotype sequencing, in parental allele discrimination, in metagenomic sequencing, and in sample sequencing of a genome. Exemplary barcode sequences include, but are not limited to, TATAGCCT, ATAGAGGC, CCTATCCT, GGCTCTGA, AGGCGAAG, TAATCTTA, CAGGACGT, and GTACTGAC.
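Using a barcode to identify a molecule in practice means assigning each observed barcode read to one member of the known set. A minimal sketch with the exemplary barcodes listed above, assuming up to one mismatch is tolerated and that ambiguous or unmatched reads are left unassigned; the tolerance and the function names are assumptions. Tolerating k mismatches is only guaranteed unambiguous when every pair of barcodes differs at 2k + 1 or more positions, which `min_pairwise_distance` lets you check.

```python
from typing import Optional, Sequence

EXAMPLE_BARCODES = ["TATAGCCT", "ATAGAGGC", "CCTATCCT", "GGCTCTGA",
                    "AGGCGAAG", "TAATCTTA", "CAGGACGT", "GTACTGAC"]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed: str,
                   barcodes: Sequence[str] = EXAMPLE_BARCODES,
                   max_mismatches: int = 1) -> Optional[str]:
    """Return the single barcode within max_mismatches, or None if none or several match."""
    hits = [bc for bc in barcodes if hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def min_pairwise_distance(barcodes: Sequence[str]) -> int:
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for i, a in enumerate(barcodes) for b in barcodes[i + 1:])


print(assign_barcode("TATAGCCT"))  # TATAGCCT (exact match)
print(assign_barcode("TATAGCCA"))  # TATAGCCT (one mismatch)
print(assign_barcode("NNNNNNNN"))  # None (no barcode within one mismatch)
```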

Linker

Some embodiments comprising a circular transposome include a transposon sequence comprising a first barcode sequence and a second barcode sequence with a linker disposed therebetween. In other embodiments, the linker can be absent, or can be the sugar-phosphate backbone that connects one nucleotide to another. The linker can comprise, for example, one or more of a nucleotide, a nucleic acid, a non-nucleotide chemical moiety, a nucleotide analog, an amino acid, a peptide, a polypeptide, or a protein. In preferred embodiments, the linker comprises a nucleic acid. The linker can comprise at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more nucleotides. In some embodiments, the linker can comprise at least about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, or more nucleotides.

In some embodiments, a linker can be amplifiable, for example, by PCR, rolling circle amplification, strand displacement amplification, and the like. In other embodiments, a linker can comprise non-amplifiable moieties. Examples of non-amplifiable linkers include organic chemical linkers such as alkyl, propyl, or PEG; non-natural bases such as isoC and isoG; or any group that does not amplify in DNA-based amplification schemes. For example, transposons containing isoC/isoG pairs can be amplified with a dNTP mixture lacking the complementary isoG and isoC, ensuring that no amplification occurs across the inserted transposon.

In some embodiments, the linker comprises a single-stranded nucleic acid. In some embodiments, the linker couples the transposon sequences in a 5'-3', 5'-5', or 3'-3' orientation.

Affinity tag

In some embodiments, a transposon sequence can include an affinity tag. In some embodiments comprising a circular transposome, the linker can comprise an affinity tag. Affinity tags can be useful for a variety of applications, for example, the bulk separation of target nucleic acids hybridized to hybridization tags. Additional applications include, but are not limited to, using affinity tags to purify transposase/transposon complexes and transposon-inserted target DNA, target RNA, or target protein. As used herein, the term "affinity tag" and grammatical equivalents can refer to a component of a multi-component complex, wherein the components of the multi-component complex specifically interact with or bind to each other. For example, an affinity tag can include biotin or poly-His, which bind streptavidin or nickel, respectively. Other examples of multiple-component affinity tag complexes are listed, for example, in U.S. Patent Application Publication No. 2002/0208705, U.S. Patent Application Publication No. 2012/0208724, and International Patent Application Publication No. WO 02/061832, each of which is incorporated by reference in its entirety.

Solid support

Solid supports can be two- or three-dimensional and can comprise a planar surface (e.g., a glass slide) or can be shaped. A solid support can include glass (e.g., controlled pore glass (CPG)), quartz, plastic (such as polystyrene (low cross-linked and high cross-linked polystyrene), polycarbonate, polypropylene, and poly(methyl methacrylate)), acrylic copolymer, polyamide, silicon, metal (e.g., alkanethiolate-derivatized gold), cellulose, nylon, latex, dextran, gel matrix (e.g., silica gel), polyacrolein, or composites.

Suitable three-dimensional solid supports include, for example, spheres, microparticles, beads, nanoparticles, polymer matrices such as agarose, polyacrylamide, and alginate, membranes, slides, plates, micromachined chips, tubes (e.g., capillary tubes), microwells, microfluidic devices, channels, filters, flow cells, and structures suitable for immobilizing nucleic acids, proteins, or cells. Solid supports can include planar arrays or matrices capable of having regions that include populations of template nucleic acids or primers. Examples include nucleoside-derivatized CPG and polystyrene slides; derivatized magnetic slides; polystyrene grafted with polyethylene glycol; and the like.

In some embodiments, the solid support comprises microspheres or beads. By "microspheres," "beads," "particles," or grammatical equivalents herein is meant small discrete particles. Suitable bead compositions include, but are not limited to, plastics, ceramics, glass, polystyrene, methylstyrene, acrylic polymers, paramagnetic materials, thoria sol, carbon graphite, titanium dioxide, latex, or cross-linked dextrans such as Sepharose, cellulose, nylon, cross-linked micelles, and Teflon (polytetrafluoroethylene); any other material outlined herein for solid supports can also be used. "Microsphere Detection Guide" from Bangs Laboratories, Fishers, Ind. is a helpful guide. In certain embodiments, the microspheres are magnetic microspheres or beads. In some embodiments, the beads can be color coded. For example, MicroPlex® microspheres from Luminex, Austin, TX may be used.

The beads need not be spherical; irregular particles may be used. Alternatively or additionally, the beads may be porous. Bead sizes range from nanometers, e.g., 100 nm, to millimeters, e.g., 1 mm, with beads from about 0.2 micron to about 200 microns being preferred, and from about 0.5 to about 5 microns being particularly preferred, although in some embodiments smaller or larger beads may be used. In some embodiments, beads can be about 1, 1.5, 2, 2.5, 2.8, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 15, or 20 μm in diameter.

In some embodiments, beads can comprise antibodies or other affinity probes (see Immobilized Biomolecules in Analysis. A Practical Approach. Cass T, Ligler F S, eds. Oxford University Press, New York, 1998. pp 1-14, incorporated herein by reference, for typical attachment protocols). In some embodiments, the antibodies can be monoclonal; in other embodiments, the antibodies can be polyclonal. In some embodiments, the antibodies can be specific for a cell surface epitope. In some embodiments, the antibodies can be specific for an intracellular protein.

In some embodiments, the nucleic acid templates provided herein can be attached to a solid support. Various methods well known in the art can be used to attach, anchor, or immobilize nucleic acids to the surface of the solid support.

Analytes

An analyte is a biomolecule whose function, composition, identity, and/or source is being investigated. Exemplary analytes include, but are not limited to, DNA, RNA, cDNA, proteins, lipids, carbohydrates, cellular organelles (e.g., nuclei, Golgi apparatus, ribosomes, mitochondria, endoplasmic reticulum, chloroplasts, cell membranes, etc.), cellular metabolites, tissue sections, cells, single cells, contents from cells or from single cells, nucleic acids isolated from cells or single cells, nucleic acids isolated from cells or single cells and further modified, or cell-free DNA (e.g., from placental fluid or plasma).

Target nucleic acid

A target nucleic acid can include any nucleic acid of interest. In one embodiment, a target nucleic acid can include any nucleic acid of interest that is contained in, captured in, embedded in, or immobilized in a CE, for example, a matrix, droplet, emulsion, solid support, or compartment that keeps the nucleic acid in proximity internally while allowing access by liquid and enzymatic reagents. Target nucleic acids can include DNA, cDNA, products of WGA, RNA, peptide nucleic acid, morpholino nucleic acid, locked nucleic acid, glycol nucleic acid, threose nucleic acid, mixed samples of nucleic acids, polyploid DNA (i.e., plant DNA), mixtures thereof, and hybrids thereof. In a preferred embodiment, genomic DNA fragments or amplified copies thereof are used as the target nucleic acid. In another preferred embodiment, cDNA, mitochondrial DNA, or chloroplast DNA is used.

A target nucleic acid can comprise any nucleotide sequence. In some embodiments, the target nucleic acid comprises homopolymer sequences. A target nucleic acid can also include repeat sequences. Repeat sequences can be any of a variety of lengths including, for example, 2, 5, 10, 20, 30, 40, 50, 100, 250, 500, or 1000 nucleotides or more. Repeat sequences can be repeated, either contiguously or non-contiguously, any of a variety of times including, for example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, or 20 times or more.
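Homopolymer and repeat sequences of the kind just described can be located with regular-expression backreferences. This sketch only finds contiguous (tandem) repeats; non-contiguous repeats would need a different approach. The minimum run length, unit length, and copy number are parameters chosen for illustration, and note that a homopolymer also matches as a repeat of any unit length.

```python
import re


def homopolymer_runs(seq: str, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (start, run) for every maximal single-base run of at least min_len bases."""
    return [(m.start(), m.group())
            for m in re.finditer(r"([ACGT])\1*", seq)
            if len(m.group()) >= min_len]


def tandem_repeats(seq: str, unit_len: int, min_copies: int = 2) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for contiguous repeats of a unit_len-base unit."""
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    return [(m.start(), m.group(1), len(m.group()) // unit_len)
            for m in pattern.finditer(seq)]


print(homopolymer_runs("GGAAAAATC"))        # [(2, 'AAAAA')]
print(tandem_repeats("TTCACACACAGG", 2, 3))  # [(2, 'CA', 4)]
```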

Some embodiments described herein can utilize a single target nucleic acid. Other embodiments can utilize a plurality of target nucleic acids. In such embodiments, a plurality of target nucleic acids can include a plurality of the same target nucleic acids, a plurality of different target nucleic acids in which some target nucleic acids are the same, or a plurality of target nucleic acids in which all target nucleic acids are different. Embodiments that utilize a plurality of target nucleic acids can be carried out in multiplex formats so that reagents are delivered simultaneously to the target nucleic acids, for example, in one or more chambers or on an array surface. In some embodiments, the plurality of target nucleic acids can include substantially all of a particular organism's genome. The plurality of target nucleic acids can include at least a portion of a particular organism's genome including, for example, at least about 1%, 5%, 10%, 25%, 50%, 75%, 80%, 85%, 90%, 95%, or 99% of the genome. In particular embodiments, the portion can have an upper limit of at most about 1%, 5%, 10%, 25%, 50%, 75%, 80%, 85%, 90%, 95%, or 99% of the genome.
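The "at least about X% of the genome" ranges above can be evaluated for a given set of target nucleic acids if each target is mapped to genome coordinates. A minimal sketch, assuming half-open [start, end) intervals on a single linear coordinate system and a known genome size; the merge-and-sum approach keeps overlapping targets from being counted twice.

```python
def covered_fraction(intervals: list[tuple[int, int]], genome_size: int) -> float:
    """Fraction of a genome covered by the union of half-open [start, end) intervals."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        start, end = max(0, start), min(genome_size, end)  # clip to the genome
        if end <= start:
            continue
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or touching: extend the block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_size


targets = [(0, 100), (50, 150), (300, 400)]  # hypothetical target coordinates
print(covered_fraction(targets, 1000))        # 0.25
```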

Target nucleic acids can be obtained from any source. For example, target nucleic acids can be prepared from nucleic acid molecules obtained from a single organism or from populations of nucleic acid molecules obtained from natural sources that include one or more organisms. Sources of nucleic acid molecules include, but are not limited to, organelles, cells, tissues, organs, or organisms. Cells that can be used as sources of target nucleic acid molecules can be prokaryotic (bacterial cells, for example, Escherichia, Bacillus, Serratia, Salmonella, Staphylococcus, Streptococcus, Clostridium, Chlamydia, Neisseria, Treponema, Mycoplasma, Borrelia, Legionella, Pseudomonas, Mycobacterium, Helicobacter, Erwinia, Agrobacterium, Rhizobium, and Streptomyces genera); archaea, such as Crenarchaeota, Nanoarchaeota, or Euryarchaeota; or eukaryotes such as fungi (for example, yeasts), plants, protozoans and other parasites, and animals (including insects (for example, Drosophila spp.), nematodes (e.g., Caenorhabditis elegans), and mammals (for example, rat, mouse, monkey, non-human primate, and human)).

Target nucleic acids and template nucleic acids can be enriched for certain sequences of interest using various methods well known in the art. Examples of such methods are provided in International Publication No. WO/2012/108864, which is incorporated herein by reference in its entirety. In some embodiments, nucleic acids can be further enriched during methods of preparing template libraries. For example, nucleic acids can be enriched for certain sequences before insertion of transposomes, after insertion of transposomes, and/or after amplification of nucleic acids.

In addition, in some embodiments, target nucleic acids and/or template nucleic acids can be highly purified; for example, nucleic acids can be at least about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% free of contaminants before use with the methods provided herein. In some embodiments, it is beneficial to use methods known in the art that maintain the quality and size of the target nucleic acid; for example, isolation and/or direct transposition of target DNA can be performed using agarose plugs.

In some embodiments, target nucleic acids can be obtained from a biological sample or a patient sample. The term "biological sample" or "patient sample" as used herein includes, for example, samples of one or more cells, tissues, or bodily fluids. "Bodily fluids" can include, but are not limited to, blood, serum, plasma, saliva, cerebrospinal fluid, pleural fluid, tears, lactal duct fluid, lymph, sputum, urine, amniotic fluid, or semen. A sample can include a bodily fluid that is "acellular." An "acellular bodily fluid" includes less than about 1% (w/w) whole-cellular material. Plasma or serum are examples of acellular bodily fluids. A sample can include a specimen of natural or synthetic origin (i.e., a cellular sample made to be acellular).

As used herein, the term "plasma" refers to the acellular fluid found in blood. "Plasma" can be obtained from blood by removing whole-cellular material from the blood by methods known in the art (e.g., centrifugation, filtration, and the like).

Certain methods of preparing template nucleic acids

Some embodiments include methods of preparing template nucleic acids. As used herein, "template nucleic acid" can refer to a substrate for obtaining sequence information. In some embodiments, a template nucleic acid can include a target nucleic acid, a fragment thereof, or any copy thereof comprising at least one transposon sequence, a fragment thereof, or any copy thereof. In some embodiments, a template nucleic acid can include a target nucleic acid comprising a sequencing adaptor, such as a sequencing primer site. In some embodiments, a CE can comprise the target nucleic acid.

Some methods of preparing template nucleic acids include inserting a transposon sequence into a target nucleic acid, thereby preparing a template nucleic acid. Some methods of insertion include contacting a transposon sequence provided herein with a target nucleic acid in the presence of an enzyme, such as a transposase or integrase, under conditions sufficient for the integration of one or more transposon sequences into the target nucleic acid. In some embodiments, the CE may comprise such a target nucleic acid.

In some embodiments, insertion of transposon sequences into a target nucleic acid can be non-random. In some embodiments, transposon sequences can be contacted with a target nucleic acid comprising proteins that inhibit integration at certain sites. For example, integration of transposon sequences can be inhibited in genomic DNA comprising proteins, genomic DNA comprising chromatin, genomic DNA comprising nucleosomes, or genomic DNA comprising histones. In some embodiments, transposon sequences can be associated with affinity tags in order to integrate the transposon sequence at a particular sequence in a target nucleic acid. For example, a transposon sequence may be associated with a protein that targets specific nucleic acid sequences, such as histones, chromatin-binding proteins, transcription factors, initiation factors, and the like, or with antibodies or antibody fragments that bind to particular sequence-specific nucleic acid-binding proteins. In an exemplary embodiment, a transposon sequence is associated with an affinity tag, such as biotin; the affinity tag can be associated with a nucleic acid-binding protein. In some embodiments, the CE may comprise such a target nucleic acid.

It will be understood that during integration of some transposon sequences into a target nucleic acid, several consecutive nucleotides of the target nucleic acid at the integration site are duplicated in the integrated product. Thus, the integrated product can include a duplicated sequence at each end of the integrated sequence in the target nucleic acid. As used herein, the term "host tag" or "g-tag" can refer to a target nucleic acid sequence that is duplicated at each end of an integrated transposon sequence. Single-stranded portions of nucleic acids that may be generated by the insertion of transposon sequences can be repaired by a variety of methods well known in the art, for example, by using ligases, oligonucleotides, and/or polymerases.
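The duplication that produces the g-tag can be illustrated with strings. This is a minimal sketch, not the source's method: `insert_transposon` is a hypothetical name, the duplication length is an assumed parameter (9 bp is the length commonly reported for Tn5), and the repair of single-stranded gaps described above is collapsed into a single step.

```python
def insert_transposon(target: str, pos: int, transposon: str, dup_len: int = 9) -> str:
    """Return the product of inserting `transposon` at `pos` in `target`.

    The dup_len bases starting at `pos` (the g-tag) end up on both sides of
    the inserted sequence, mimicking the target-site duplication that appears
    once the single-stranded gaps left by staggered insertion are filled in.
    """
    if dup_len < 0 or not 0 <= pos <= len(target) - dup_len:
        raise ValueError("duplication window must lie inside the target")
    g_tag = target[pos:pos + dup_len]
    return target[:pos + dup_len] + transposon + g_tag + target[pos + dup_len:]


product = insert_transposon("AAATTTGGGCCC", pos=3, transposon="xx", dup_len=3)
print(product)  # AAATTTxxTTTGGGCCC: the g-tag "TTT" flanks the insert on both sides
```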

In some embodiments, a plurality of the transposon sequences provided herein is inserted into a target nucleic acid. Some embodiments include selecting conditions sufficient to achieve integration of a plurality of transposon sequences into a target nucleic acid such that the average distance between integrated transposon sequences comprises a certain number of consecutive nucleotides in the target nucleic acid.

Some embodiments include selecting conditions sufficient to achieve insertion of one or more transposon sequences into a target nucleic acid, but not into another transposon sequence. A variety of methods can be used to reduce the likelihood that a transposon sequence inserts into another transposon sequence. Examples of such methods useful with the embodiments provided herein can be found in, for example, U.S. Patent Application Publication No. 2002/0208705, U.S. Patent Application Publication No. 2012/0208724, and International Patent Application Publication No. WO 02/061832, each of which is incorporated by reference in its entirety.

In some embodiments, conditions can be selected so that the average distance in a target nucleic acid between integrated transposon sequences is at least about 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, or more consecutive nucleotides. In some embodiments, the average distance in a target nucleic acid between integrated transposon sequences is at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, or more consecutive nucleotides. In some embodiments, the average distance in a target nucleic acid between integrated transposon sequences is at least about 1 kb, 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 90 kb, 100 kb, or more consecutive nucleotides. In some embodiments, the average distance in a target nucleic acid between integrated transposon sequences is at least about 100 kb, 200 kb, 300 kb, 400 kb, 500 kb, 600 kb, 700 kb, 800 kb, 900 kb, 1000 kb, or more consecutive nucleotides. As will be understood, some conditions that can be selected include contacting the target nucleic acid with a certain number of transposon sequences.
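Because the average spacing is set largely by how many insertions occur, a rough planning estimate follows if insertion sites are assumed to be uniformly random along the target. That is an idealization; real transposases show sequence and chromatin biases, as noted above. With n insertions in a target of length L, the expected segment length is about L/(n + 1). The helper names below are hypothetical.

```python
import math
import random


def insertions_for_spacing(target_len: int, mean_spacing: int) -> int:
    """Insertions needed so that target_len splits into segments of about mean_spacing."""
    if target_len <= 0 or mean_spacing <= 0:
        raise ValueError("lengths must be positive")
    return max(math.ceil(target_len / mean_spacing) - 1, 0)


def simulated_mean_spacing(target_len: int, n_insertions: int, seed: int = 0) -> float:
    """Mean distance between consecutive insertions placed uniformly at random."""
    if n_insertions < 2:
        raise ValueError("need at least two insertions to measure spacing")
    rng = random.Random(seed)
    sites = sorted(rng.randrange(target_len) for _ in range(n_insertions))
    # Only interior gaps (between insertions) are averaged; end segments are excluded.
    return (sites[-1] - sites[0]) / (n_insertions - 1)


n = insertions_for_spacing(1_000_000, 1_000)  # insertions for ~1 kb spacing on 1 Mb
print(n, round(simulated_mean_spacing(1_000_000, n)))
```

In this model, doubling the number of transposon sequences that integrate roughly halves the average spacing, which is the lever the text refers to.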

Some embodiments of the methods described herein include selecting conditions sufficient to achieve integration of at least a portion of the transposon sequences into different target nucleic acids. In preferred embodiments of the methods and compositions described herein, each transposon sequence integrated into a target nucleic acid is different. Some conditions that can be selected to achieve integration of a certain portion of transposon sequences into different target sequences include selecting the degree of diversity of the population of transposon sequences. As will be understood, the diversity of transposon sequences arises in part from the diversity of the barcodes of such transposon sequences. Accordingly, some embodiments include providing a population of transposon sequences in which at least a portion of the barcodes are different. In some embodiments, at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 100% of the barcodes in a population of transposon sequences are different. In some embodiments, at least a portion of the transposon sequences integrated into a target nucleic acid are the same.
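The percentages above can be read as the share of distinct barcodes in a population. A minimal sketch, assuming random barcodes of a fixed length drawn uniformly from A/C/G/T (designed barcode sets are usually more constrained), shows how barcode length governs that share. Function names are hypothetical.

```python
import random
from typing import List


def random_barcodes(n: int, length: int, seed: int = 0) -> List[str]:
    """Draw n barcodes uniformly at random from the 4**length possible sequences."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]


def percent_distinct(barcodes: List[str]) -> float:
    """Percentage of barcodes in the population that are distinct sequences."""
    return 100.0 * len(set(barcodes)) / len(barcodes)


def expected_percent_distinct(n: int, length: int) -> float:
    """Expected distinct share for n uniform random barcodes (occupancy formula)."""
    m = 4 ** length  # size of the barcode space
    return 100.0 * m * (1 - (1 - 1 / m) ** n) / n


print(percent_distinct(["AAA", "AAA", "CCC", "GGG"]))  # 75.0
print(expected_percent_distinct(10_000, 12))  # close to 100 for 12-mers
```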

Some embodiments of preparing template nucleic acids can include copying sequences comprising the target nucleic acid. For example, some embodiments include hybridizing a primer to a primer site of a transposon sequence integrated into the target nucleic acid. In some such embodiments, the primer can be hybridized to the primer site and extended. The copied sequences can include at least one barcode sequence and at least a portion of the target nucleic acid. In some embodiments, the copied sequences can include a first barcode sequence, a second barcode sequence, and at least a portion of a target nucleic acid disposed therebetween. In some embodiments, at least one copied nucleic acid can include at least a first barcode sequence of a first copied nucleic acid that can be identified or designated as paired with a second barcode sequence of a second copied nucleic acid. In some embodiments, the primer can include a sequencing primer. In some embodiments, sequencing data is obtained using the sequencing primer. In further embodiments, adaptors comprising primer sites can be ligated to each end of a nucleic acid, and the nucleic acid amplified from such primer sites.

Some embodiments of preparing template nucleic acids can include amplifying sequences comprising at least a portion of one or more transposon sequences and at least a portion of a target nucleic acid. In some embodiments, at least a portion of a target nucleic acid can be amplified using primers that hybridize to primer sites of transposon sequences integrated into the target nucleic acid. In some such embodiments, an amplified nucleic acid can include a first barcode sequence and a second barcode sequence with at least a portion of the target nucleic acid disposed therebetween. In some embodiments, at least one amplified nucleic acid can include at least a first barcode sequence of a first amplified nucleic acid that can be identified as paired with a second barcode sequence of a second amplified sequence.
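Reading out such products amounts to splitting each sequence into its two barcodes and the intervening target, then checking barcode pairings. A sketch under stated assumptions: barcodes have a fixed length and sit at the very ends of the read (primer and mosaic-end sequences already trimmed), and the pairing relationship is supplied as a lookup table. `pair_table` and both helper names are hypothetical.

```python
from typing import Dict, Tuple


def split_amplicon(read: str, bc_len: int) -> Tuple[str, str, str]:
    """Split a trimmed read into (first barcode, target insert, second barcode)."""
    if bc_len <= 0 or len(read) < 2 * bc_len:
        raise ValueError("read is too short to hold two barcodes")
    return read[:bc_len], read[bc_len:len(read) - bc_len], read[len(read) - bc_len:]


def barcodes_pair(first_bc: str, second_bc: str, pair_table: Dict[str, str]) -> bool:
    """True if first_bc is designated as the partner of second_bc."""
    return pair_table.get(first_bc) == second_bc


pair_table = {"AAAA": "CCCC"}  # hypothetical designed pairing
read1 = split_amplicon("AAAAGGGTTTGGGG", 4)  # ('AAAA', 'GGGTTT', 'GGGG')
read2 = split_amplicon("TTTTACGTACCCCC", 4)  # ('TTTT', 'ACGTAC', 'CCCC')
# First barcode of the first product vs. second barcode of the second product.
print(barcodes_pair(read1[0], read2[2], pair_table))  # True
```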

Some methods of preparing template nucleic acids include inserting a transposon sequence comprising a single-stranded linker. In one example, a transposon sequence (ME-P1-linker-P2-ME; mosaic end-primer site 1-linker-primer site 2-mosaic end) is inserted into a target nucleic acid. The target nucleic acid with the inserted transposon/linker sequence can be extended and amplified.
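The ME-P1-linker-P2-ME layout can be represented as an ordered list of named segments, which makes it easy to locate each element in a read. A minimal sketch: the 19-bp Tn5 mosaic end is a published sequence, but the P1, linker, and P2 sequences are placeholders, `assemble` is a hypothetical helper, and the strand orientation of the second mosaic end is ignored for simplicity.

```python
from typing import Dict, List, Tuple

TN5_ME = "AGATGTGTATAAGAGACAG"  # 19-bp Tn5 mosaic end


def assemble(segments: List[Tuple[str, str]]) -> Tuple[str, Dict[str, Tuple[int, int]]]:
    """Concatenate named segments; return the sequence and each segment's [start, end) span."""
    spans: Dict[str, Tuple[int, int]] = {}
    parts: List[str] = []
    pos = 0
    for name, seq in segments:
        if name in spans:
            raise ValueError(f"duplicate segment name: {name}")
        spans[name] = (pos, pos + len(seq))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), spans


transposon, spans = assemble([
    ("ME_left", TN5_ME),
    ("P1", "ACGTACGTAC"),      # placeholder primer site 1
    ("linker", "TTTTTT"),      # placeholder linker
    ("P2", "TGCATGCATG"),      # placeholder primer site 2
    ("ME_right", TN5_ME),      # orientation ignored in this sketch
])
print(spans["linker"])  # (29, 35)
```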

In one embodiment of the compositions and methods described herein, transposomes with symmetrical transposable end sequences are used to generate end-tagged target nucleic acid fragments (tagmented fragments, or tagmentation products). Thus, each tagmented fragment contains identical ends and lacks directionality. Single-primer PCR using the transposon end sequences can then be applied to amplify the template copy number from 2n to 2n × 2^x, where x corresponds to the number of PCR cycles. In a subsequent step, PCR with additional primers can add further sequences, such as sequencing adaptor sequences.
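The copy-number expression 2n × 2^x assumes perfect doubling in every cycle. The hypothetical helper below adds an assumed per-cycle efficiency parameter (not in the source) so that efficiency = 1.0 reproduces the expression in the text. Here n is read as the number of double-stranded tagmented fragments, so 2n counts their strands; that reading is our interpretation.

```python
def template_copies(n: int, cycles: int, efficiency: float = 1.0) -> float:
    """Template copies after single-primer PCR, starting from 2n copies.

    efficiency is the fraction of templates copied per cycle (1.0 = perfect
    doubling), giving 2n * (1 + efficiency) ** cycles; with efficiency = 1.0
    this is the 2n * 2**x of the text.
    """
    if n < 0 or cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("invalid input")
    return 2 * n * (1 + efficiency) ** cycles


print(template_copies(1_000, 10))       # 2048000.0
print(template_copies(1_000, 10, 0.9))  # fewer copies at 90% per-cycle efficiency
```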

In some embodiments, it can be advantageous for each template nucleic acid to incorporate at least one universal primer site. For example, a template nucleic acid can include a first end sequence comprising a first universal primer site and a second end sequence comprising a second universal primer site. Universal primer sites can have various applications, such as use in amplifying, sequencing, and/or identifying one or more template nucleic acids. The first and second universal primer sites can be the same, substantially similar, similar, or different. Universal primer sites can be introduced into nucleic acids by various methods well known in the art, for example, ligation of primer sites to nucleic acids, amplification of nucleic acids using tailed primers, and insertion of a transposon sequence comprising a universal primer site.

Transposomes

A "transposome" comprises an integration enzyme, such as an integrase or transposase, and a nucleic acid comprising an integration recognition site, such as a transposase recognition site. In embodiments provided herein, the transposase can form a functional complex with a transposase recognition site that is capable of catalyzing a transposition reaction. The transposase can bind to the transposase recognition site and insert the transposase recognition site into a target nucleic acid within the CE in a process sometimes termed "tagmentation." In some such insertion events, one strand of the transposase recognition site can be transferred into the target nucleic acid. In one example, a transposome comprises a dimeric transposase comprising two subunits, and two non-contiguous transposon sequences. In another example, a transposome comprises a dimeric transposase comprising two subunits, and a contiguous transposon sequence.

Some embodiments can include the use of a hyperactive Tn5 transposase and a Tn5-type transposase recognition site (Goryshin and Reznikoff, J. Biol. Chem., 273: 7367 (1998)), or MuA transposase and a Mu transposase recognition site comprising R1 and R2 end sequences (Mizuuchi, K., Cell, 35: 785, 1983; Savilahti, H, et al., EMBO J., 14: 4893, 1995). ME sequences can also be optimized by those skilled in the art. The above references are incorporated herein by reference.

Further examples of transposition systems that can be used with certain embodiments of the compositions and methods provided herein include Staphylococcus aureus Tn552 (Colegio et al., J. Bacteriol., 183: 2384-8, 2001; Kirby C et al., Mol. Microbiol., 43: 173-86, 2002), Ty1 (Devine & Boeke, Nucleic Acids Res., 22: 3765-72, 1994, and International Publication WO 95/23875), transposon Tn7 (Craig, N L, Science, 271: 1512, 1996; Craig, N L, Review in: Curr. Top. Microbiol. Immunol., 204: 27-48, 1996), Tn10 and IS10 (Kleckner N et al., Curr. Top. Microbiol. Immunol., 204: 49-82, 1996), Mariner transposase (Lampe D J, et al., EMBO J., 15: 5470-9, 1996), Tc1 (Plasterk R H, Curr. Top. Microbiol. Immunol., 204: 125-43, 1996), P element (Gloor, G B, Methods Mol. Biol., 260: 97-114, 2004), Tn3 (Ichikawa & Ohtsubo, J. Biol. Chem., 265: 18829-32, 1990), bacterial insertion sequences (Ohtsubo & Sekine, Curr. Top. Microbiol. Immunol., 204: 1-26, 1996), retroviruses (Brown et al., Proc. Natl. Acad. Sci. USA, 86: 2525-9, 1989), and retrotransposons of yeast (Boeke & Corces, Annu. Rev. Microbiol., 43: 403-34, 1989). More examples include IS5, Tn10, Tn903, IS911, and engineered versions of transposase family enzymes (Zhang et al., (2009) PLoS Genet. 5: e1000689. Epub 2009 Oct 16; Wilson C. et al. (2007) J. Microbiol. Methods 71: 332-5). The above references are incorporated herein by reference.

More examples of integrases that can be used with the methods and compositions provided herein include retroviral integrases and integrase recognition sequences for such retroviral integrases, such as integrases from HIV-1, HIV-2, SIV, PFV-1, and RSV.

Transposon sequences

Some embodiments of the compositions and methods provided herein include transposon sequences. In some embodiments, a transposon sequence includes at least one transposase recognition site. In some embodiments, a transposon sequence includes at least one transposase recognition site and at least one barcode. Transposon sequences useful with the methods and compositions provided herein are provided in U.S. Patent Application Publication No. 2002/0208705, U.S. Patent Application Publication No. 2012/0208724, and International Patent Application Publication No. WO 02/061832, each of which is incorporated herein by reference in its entirety. In some embodiments, a transposon sequence includes a first transposase recognition site, a second transposase recognition site, and a barcode disposed therebetween.

Transposomes with non-contiguous transposon sequences

Some transposomes provided herein include a transposase comprising two transposon sequences. In some such embodiments, the two transposon sequences are not linked to one another; in other words, the transposon sequences are non-contiguous with one another. Examples of such transposomes are known in the art; see, for example, U.S. Patent Application Publication No. 2010/0120098, the disclosure of which is incorporated herein by reference in its entirety.

Looped structures

In some embodiments, a transposome comprises a transposon sequence nucleic acid that binds two transposase subunits to form a "looped complex" or a "looped transposome." In one example, a transposome comprises a dimeric transposase and a transposon sequence. Looped complexes can ensure that transposons are inserted into target DNA while maintaining the ordering information of the original target DNA and without fragmenting the target DNA. As will be understood, looped structures can insert primers, barcodes, indexes, and the like into a target nucleic acid while maintaining the physical connectivity of the target nucleic acid. In some embodiments, the CE may comprise the target nucleic acid. In some embodiments, the transposon sequence of a looped transposome can include a fragmentation site such that the transposon sequence can be fragmented to create a transposome comprising two transposon sequences. Such transposomes are useful for ensuring that neighboring target DNA fragments into which the transposon is inserted receive code combinations that can be unambiguously assembled at a later stage of the assay.
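The later reassembly step can be sketched as chaining fragments through their codes. Assumed encoding, for illustration only: when a looped (circular) transposome is cut at its fragmentation site, the two resulting ends carry the same code, so the right-end code of one fragment equals the left-end code of the next, and the outermost ends carry codes that appear only once. `order_fragments` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def order_fragments(fragments: Dict[str, Tuple[str, str]]) -> List[str]:
    """Recover fragment order from (left_code, right_code) pairs.

    Adjacent fragments are assumed to share a code: the right code of one
    fragment equals the left code of the next.
    """
    by_left = {left: name for name, (left, _) in fragments.items()}
    right_codes = {right for _, right in fragments.values()}
    starts = [name for name, (left, _) in fragments.items() if left not in right_codes]
    if len(starts) != 1:
        raise ValueError("expected exactly one chain start")
    order = [starts[0]]
    while True:
        _, right = fragments[order[-1]]
        nxt = by_left.get(right)
        if nxt is None:
            return order
        if nxt in order:
            raise ValueError("codes form a cycle")
        order.append(nxt)


# Fragments arrive in arbitrary order; shared codes c1 and c2 link them.
reads = {"frag_B": ("c1", "c2"), "frag_C": ("c2", "end"), "frag_A": ("start", "c1")}
print(order_fragments(reads))  # ['frag_A', 'frag_B', 'frag_C']
```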

Certain methods for preparing transposon sequences

Transposon sequences provided herein can be prepared by a variety of methods. Exemplary methods include direct synthesis and hairpin extension methods. In some embodiments, transposon sequences can be prepared by direct synthesis. For example, a transposon sequence comprising a nucleic acid can be prepared by methods comprising chemical synthesis. Such methods are well known in the art, for example, solid-phase synthesis using phosphoramidite precursors, such as those derived from protected 2'-deoxynucleosides, ribonucleosides, or nucleoside analogs. Exemplary methods of preparing transposon sequences can be found in, for example, U.S. Patent Application Publication No. 2002/0208705, U.S. Patent Application Publication No. 2012/0208724, and International Patent Application Publication No. WO 02/061832, each of which is incorporated by reference in its entirety.

In some embodiments comprising looped transposomes, transposon sequences comprising a single-stranded linker can be prepared. In some embodiments, the linker couples the transposon sequences of a transposome such that a transposon sequence comprising a first transposase recognition sequence is coupled to a second transposon sequence comprising a second transposase recognition sequence in a 5' to 3' orientation. In some embodiments, the linker couples a transposon sequence comprising a first transposase recognition sequence to a second transposon sequence comprising a second transposase recognition sequence in a 5' to 5' orientation or in a 3' to 3' orientation. Coupling the transposon sequences of a transposome in a 5' to 5' or 3' to 3' orientation can be advantageous to prevent transposase recognition elements, in particular mosaic elements (ME or M), from interacting with one another. Coupled transposon sequences can be prepared by preparing transposon sequences comprising an aldehyde group or an oxyamine group. The aldehyde and oxyamine groups can react to form a covalent bond, thereby coupling the transposon sequences.

In some embodiments, transposomes comprising complementary sequences can be prepared. In one embodiment, transposases are loaded with transposon sequences comprising complementary tails. The tails hybridize to form linked transposon sequences. Hybridization can be performed under dilute conditions to reduce the likelihood of hybridization between transposomes.

Targeted insertion

In some embodiments of the methods and compositions provided herein, transposon sequences can be inserted at particular targeted sequences of a target nucleic acid. Transposition into dsDNA can be more efficient than transposition into ssDNA targets. In some embodiments, dsDNA is denatured into ssDNA and annealed with oligonucleotide probes (20-200 bases). These probes create dsDNA sites that can be used efficiently as integration sites with the transposomes provided herein. In some embodiments, dsDNA can be targeted using D-loop formation with recA-coated oligomer probes and subsequent triplex formation. In some such embodiments, the D-loop is a preferred substrate for transposomes comprising Tn4430 transposase. In further embodiments, regions of interest in dsDNA can be targeted using sequence-specific DNA-binding proteins, such as zinc finger complexes, and other affinity ligands for specific DNA regions.

In some embodiments, transposomes comprising a transposase that prefers mismatched positions in a target nucleic acid as its substrate can be used for targeted insertion into the target nucleic acid. For example, some MuA transposases, such as HYPERMU (Epicentre), have a preference for mismatched targets. In some such embodiments, oligonucleotide probes comprising mismatches are annealed to a single-stranded target nucleic acid. Transposomes comprising a MuA transposase such as HYPERMU can then be used to target the mismatched sequences of the target nucleic acid.

Contiguity-preserving elements (CE)

A contiguity-preserving element (CE) is a physical entity that keeps at least two, more, or all analytes in close proximity (or contiguity) through one or more assay steps, provides access to assay reagents, and can be pooled and split multiple times without losing the proximity of the analytes.

In some embodiments, the CE can be a solid support. In one embodiment, the CE can be an emulsion or a droplet. In some embodiments, the CE is a gel, a hydrogel, or a gel bead. In some embodiments, the CE can comprise a solid support, such as a bead. In some embodiments, the bead can further comprise antibodies, oligonucleotides, and/or barcodes. In another embodiment, the CE can be a DNA nanoball generated by WGA, RCA, or condensation of any nucleic acid reagent.

In some embodiments, CEs can be prepared by embedding nucleic acids from a cell or from a single cell, or their amplification products (from WGA, etc.), in a polymer matrix such as agarose, polyacrylamide, alginate, and the like. In some embodiments, the contiguity of the contents of a cell or single cell within a CE is preserved by encapsulation (e.g., in a polymer matrix), immobilization on beads, or entrapment that keeps the components in physical proximity to one another, so that proximity information is effectively retained within the CE through repeated rounds of pooling and redistribution. A collection of CEs can be independently pooled and split, reacted with assay reagents, pooled and split again, and so on, while the contiguity of the analytes that make up each individual CE is preserved; this feature enables combinatorial indexing through successive split and pool steps.

In some embodiments, the analytes in a contiguity-preserving element are accessible to assay reagents, including aqueous solutions, enzymes (e.g., fragmentases, polymerases, ligases, transposases, kinases, restriction endonucleases, proteases, phosphatases, lipases), nucleic acid adaptors, nucleic acid barcodes, and labels.

In some embodiments, the CE comprises a cell or a single cell. In some embodiments, the CE comprises nucleic acids, such as DNA, mRNA, or cDNA, from a cell or a single cell; macromolecules of a cell or single cell, including proteins, polysaccharides, lipids, and nucleic acids; and small molecules, such as primary metabolites, secondary metabolites, and natural products, from a cell or a single cell. In some embodiments, the nucleic acids undergo amplification, such as PCR or whole-genome amplification, before the CE comprising the nucleic acids is formed. In some embodiments, DNA and mRNA can be analyzed in parallel.

In some embodiments, one or more analytes of a CE are labeled with one or more labels. Exemplary labels include, but are not limited to, DNA barcodes or indexes, fluorescent labels, chemiluminescent labels, RNA barcodes or indexes, radioactive labels, antibodies comprising a label, and beads comprising a label.

In some embodiments, a method can include the following steps: (a) partitioning CEs comprising target nucleic acids into a plurality of first containers; (b) providing a first index to the target nucleic acids in each first container, thereby obtaining first-indexed nucleic acids; (c) combining the first-indexed nucleic acids; (d) partitioning the first-indexed template nucleic acids into a plurality of second containers; and (e) providing a second index to the first-indexed template nucleic acids in each second container, thereby obtaining second-indexed nucleic acids. Steps (a)-(e) can be followed by additional cycles of one or more of steps (a)-(e) to derive additional virtual compartments. This method of combinatorial indexing can be used to efficiently create a large number of virtual compartments from a limited number of physical compartments.
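The split-and-pool scheme above can be simulated to show why a few rounds go a long way. This is a sketch under simplifying assumptions: each CE lands in a uniformly random container in each round, every container adds a distinct index, and no CE is lost. With k1 containers in the first round and k2 in the second there are M = k1 × k2 virtual compartments, and for N CEs the expected fraction whose index combination is unique is (1 - 1/M)^(N - 1). Function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Tuple


def split_pool(n_ces: int, containers_per_round: List[int], seed: int = 0) -> List[Tuple[int, ...]]:
    """Return the index combination that each CE accumulates over the rounds."""
    rng = random.Random(seed)
    combos: List[Tuple[int, ...]] = [() for _ in range(n_ces)]
    for k in containers_per_round:
        # Split: each CE goes to one of k containers and receives that container's index.
        # Pool: all CEs are recombined before the next round (implicit here).
        combos = [combo + (rng.randrange(k),) for combo in combos]
    return combos


def virtual_compartments(containers_per_round: List[int]) -> int:
    return math.prod(containers_per_round)


def fraction_unique(combos: List[Tuple[int, ...]]) -> float:
    """Fraction of CEs whose index combination is shared with no other CE."""
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1) / len(combos)


def expected_fraction_unique(n_ces: int, m: int) -> float:
    return (1 - 1 / m) ** (n_ces - 1)


rounds = [96, 96]  # e.g. two rounds of 96 containers
combos = split_pool(1_000, rounds)
print(virtual_compartments(rounds))                      # 9216
print(round(expected_fraction_unique(1_000, 9216), 3))  # 0.897
print(round(fraction_unique(combos), 3))
```

Adding a third round of 96 containers raises M to 884,736 while each round still uses only 96 physical containers, which is the efficiency the text describes.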

In some embodiments, a method can include the following steps: (a) providing CEs comprising a non-nucleic acid analyte (e.g., a protein) with an attached nucleic acid reporter; (b) partitioning the CEs into a plurality of first containers; (c) providing a first index to the target nucleic acid reporters in each first container, thereby obtaining first-indexed target nucleic acid reporters; (d) combining the first-indexed nucleic acid reporters; (e) partitioning the first-indexed CEs into a plurality of second containers; and (f) providing a second index to the first-indexed nucleic acid reporters in each second container, thereby obtaining second-indexed nucleic acid reporters. Steps (a)-(f) can be followed by additional cycles of one or more of steps (a)-(f) to derive additional virtual compartments. The partitioning steps can further include a nucleic acid amplification or capture step, such as a proximity ligation assay (PLA), a proximity extension assay (PEA), or other techniques for capturing or amplifying nucleic acids.

In some embodiments, formalin-fixed, paraffin-embedded tissue can be cut into sections, and each section added to a CE. The contents or sequences of each CE can subsequently be analyzed, and a 2D or 3D map of the contents of each section can be obtained at a later stage.

In some embodiments, one or more nucleic acids can be embedded in a matrix that confines the nucleic acids to a defined space but allows reagents to enter for steps including, but not limited to, amplification (PCR, whole-genome amplification, random primer extension, etc.), ligation, transposition, hybridization, restriction digestion, and DNA mutagenesis. Examples of mutagenesis include, but are not limited to, error-prone extension, alkylation, bisulfite conversion, and activation-induced (cytidine) deaminase.
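Of the mutagenesis routes listed, bisulfite conversion has a simple sequence-level effect: unmethylated cytosines read as thymine after conversion and amplification, while methylated cytosines are protected. A minimal sketch of the top-strand outcome, assuming complete conversion (real conversion is often incomplete); `bisulfite_readout` and its `methylated` set of 0-based 5-methylcytosine positions are hypothetical.

```python
from typing import AbstractSet


def bisulfite_readout(seq: str, methylated: AbstractSet[int] = frozenset()) -> str:
    """Top-strand sequence as read after complete bisulfite conversion and PCR.

    Unmethylated C reads as T; C at positions listed in `methylated` stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


print(bisulfite_readout("ACGCCG", methylated={1}))  # ACGTTG
```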

In some embodiments, the methods and compositions using CEs can be combined with mutagenesis-based assembly methods to greatly improve the assembly of DNA sequence information. Genomic DNA can be fragmented and divided into multiple CEs, each containing a fraction of the genome. Different fractions of the genome receive different barcodes, allowing each fraction to be assembled independently. One of the greater challenges is the assembly of repeats. One method for assembling repeats is outlined in Levy, D. and Wigler, M. (2014) Facilitated sequence counting and assembly by template mutagenesis. Proc. Natl. Acad. Sci. USA, 111(43), E4632-E4637. ISSN 0027-8424. Assembly methods are also discussed in US20140024537, entitled "Methods and Systems for Determining Haplotypes and Phasing of Haplotypes," which is incorporated herein by reference in its entirety. The above references are incorporated herein by reference.

For methods that combine the partitioning of DNA fragments with mutagenesis or related methods, partitioning can be performed using CEs, wells, indexes, virtual indexes, physical compartments, droplets, and the like. Mutagenesis can be performed by several methods, including, but not limited to, error-prone extension, alkylation, bisulfite conversion, and treatment with activation-induced (cytidine) deaminase. Partitioning nucleic acids into CEs combined with mutagenesis can be useful where conventional methods make the assembly of repetitive or dissimilar regions challenging.
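Why mutagenesis helps with repeats can be shown with a toy model, roughly the idea behind the template-mutagenesis approach cited above: two copies of a repeat that are identical in the genome acquire independent random mutations, so reads from each copy carry distinguishing marks. This sketch assumes a C-to-T-only transition model at a uniform per-base rate (approximating a deaminase or bisulfite treatment on one strand); the rate, the repeat sequence, and the function names are illustrative assumptions.

```python
import random


def mutagenize(seq, rate, rng):
    """Independently convert each C to T with probability `rate`."""
    return "".join(
        "T" if base == "C" and rng.random() < rate else base for base in seq
    )


def distinguishing_sites(a, b):
    """0-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


rng = random.Random(1)
repeat = "ACGTCAGCTTCAGGCTACCTG" * 10  # sequence shared by two genomic copies
copy1 = mutagenize(repeat, 0.1, rng)   # template from copy 1 after mutagenesis
copy2 = mutagenize(repeat, 0.1, rng)   # template from copy 2 after mutagenesis
sites = distinguishing_sites(copy1, copy2)
print(f"{len(sites)} positions now distinguish the two copies")
```

Before mutagenesis the two copies cannot be told apart; afterwards, reads overlapping any distinguishing site can be assigned to one copy or the other.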

In some embodiments, the methods described herein can be used for variant phasing, de novo genome assembly, screening cell populations to determine heterogeneity across a population, and determining cell-to-cell differences.

In some embodiments, cDNA from cells or from single cells is isolated in containers and converted into CEs that are indexed by the virtual compartmentalization method described above. This enables gene expression and transcript profiling from 1,000, 10,000, 100,000, or even more differently indexed single-cell libraries.

In some embodiments, because of Poisson sampling, the number of single cells that can be analyzed is approximately 10% of the total number of virtual compartments. For a four-layer indexing scheme with 96-well compartments at each step, a total of 4 × 96 = 384 physical compartments can be used in a single experiment to analyze 10% × 96 × 96 × 96 × 96, i.e., more than 8 million, single cells. In the example of Figure 3, four combinatorial dilution-and-pooling steps are used to create a large number of virtual compartments (sets of molecules or DNA library elements sharing a unique combination of indexes). In this example, a contiguous DNA container is created by encapsulating the contents of a single cell in a polymer matrix (e.g., PAM, polyacrylamide). In a preferred embodiment for genomic analysis, the genomic DNA content of each single cell is amplified by multiple displacement amplification (MDA), a whole-genome amplification (WGA) reaction. This single-cell MDA product constitutes the DNA container that is carried through the combinatorial indexing scheme. For gene expression, single-cell cDNA can be prepared from the single-cell container as described by Picelli (Picelli, 2014). In a preferred embodiment, the initial index is attached to the genomic DNA or cDNA by standard library preparation techniques, either by (enzymatic) fragmentation and adaptor ligation, or by tagmentation using a transposase complex. In a preferred embodiment, subsequent indexes are attached to the library by ligation or PCR. Ligation is preferred because indexed adaptors are easy to add sequentially. The final step may involve indexing PCR alone, or ligation followed by PCR.
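The capacity arithmetic in this paragraph can be checked directly. The sketch below assumes cells are distributed independently and uniformly over the virtual compartments, so the number of other cells sharing a given cell's index combination is approximately Poisson with mean equal to cells per virtual compartment; under that assumption, a 10% load means roughly 9.5% of cells share an index combination with another cell. The helper name is hypothetical.

```python
import math


def combinatorial_capacity(wells_per_round, rounds, load_fraction=0.10):
    """Virtual compartments, physical wells, cells loaded, and collision rate."""
    virtual = wells_per_round ** rounds   # unique index combinations
    physical = wells_per_round * rounds   # wells actually handled
    cells = int(load_fraction * virtual)  # cells analyzed at this load
    lam = cells / virtual                 # mean cells per index combination
    # Probability that a given cell shares its combination with >= 1 other cell
    collision = 1 - math.exp(-lam)
    return virtual, physical, cells, collision


virtual, physical, cells, collision = combinatorial_capacity(96, 4)
print(virtual, physical, cells, round(collision, 4))
# 84934656 384 8493465 0.0952
```

Lowering the load fraction reduces collisions roughly in proportion, at the cost of analyzing fewer cells per experiment.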

In some embodiments, the target nucleic acid is histone/protein-protected (see Buenrostro et al., Nature Methods 10, 1213-1218 (2013), doi:10.1038/nmeth.2688, incorporated herein by reference). Applications include epigenetic profiling and the analysis of open chromatin, DNA-binding proteins, and nucleosome positions.

In some embodiments, a proximity-preserving element can contain a single cell, and nucleic acids from the cell can be amplified. Each proximity-preserving element can then be uniquely indexed using a combinatorial indexing scheme. Short sequencing reads can be grouped based on the unique indexes, and long synthetic reads can each be assembled de novo based on the unique indexes (McCoy et al., PLoS ONE 2014, DOI: 10.1371/journal.pone.0106689, "Illumina TruSeq Synthetic Long-Reads Empower De Novo Assembly and Resolve Complex, Highly-Repetitive Transposable Elements," incorporated herein by reference).
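Grouping short reads by their unique index amounts to binning on the index sequence. This sketch assumes, for simplicity, that the index occupies a fixed-length prefix of each read; in practice indexes may be sequenced in separate index reads or sit elsewhere in the read, and the helper name is hypothetical. Each resulting bin would then be passed to a de novo assembler.

```python
from collections import defaultdict


def group_reads_by_index(reads, index_len):
    """Bin reads by a fixed-length leading index; the index is trimmed off."""
    groups = defaultdict(list)
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        groups[index].append(insert)
    return dict(groups)


reads = ["AAAACGTACGT", "AAAATTTTGGG", "CCCCGGGAAAT"]
print(group_reads_by_index(reads, index_len=4))
# {'AAAA': ['CGTACGT', 'TTTTGGG'], 'CCCC': ['GGGAAAT']}
```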

In some embodiments, a CE may contain cellular contents such as proteins, organelles, RNA, DNA, ribosomes, antibodies, steroids, specialized structures, glycans, lipids, small molecules, molecules that may affect biological pathways, mono- and polysaccharides, alkaloids, and primary and secondary metabolites.

In some embodiments, organelles within a CE can be differentially stained. Examples of organelle staining reagents include organelle-targeted fluorescent proteins (Cellular Lights™), classic organelle stains, and dye conjugates that can label organelles or cellular structures selectively or non-selectively.

In some embodiments, the analyte of interest in a CE is a protein. Proteins can be labeled with barcodes or surrogate markers, which can be read out using conventional array-based or sequencing-based methods. Proximity ligation methods and antibody index sequences can be used together with detection of the barcode sequences to detect proteins (Fredriksson et al., Nature Biotechnology 20, 473-477 (2002), incorporated herein by reference) and thereby establish the identity and abundance of proteins in each single cell. Proteins can be labeled by those skilled in the art using a variety of known methods (www.piercenet.com/cat/protein-antibody-labeling), including in vivo and in vitro site-specific chemical labeling strategies.
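Once each read has been resolved into a cell (CE) index and an antibody index sequence, establishing protein identity and abundance per cell reduces to a lookup and a count. The barcode-to-protein table and the CD3/CD19 targets below are hypothetical placeholders, and PCR-duplicate removal is omitted from this minimal sketch.

```python
from collections import Counter

# Hypothetical antibody index sequences and the proteins they report on
ANTIBODY_BARCODES = {"ACGT": "CD3", "TGCA": "CD19"}


def protein_counts(read_pairs, antibody_barcodes=ANTIBODY_BARCODES):
    """Count reads per (cell index, protein); unknown antibody barcodes are skipped."""
    counts = Counter()
    for cell_index, ab_barcode in read_pairs:
        protein = antibody_barcodes.get(ab_barcode)
        if protein is not None:
            counts[(cell_index, protein)] += 1
    return counts


pairs = [("cellA", "ACGT"), ("cellA", "ACGT"), ("cellA", "TGCA"),
         ("cellB", "TGCA"), ("cellB", "GGGG")]
print(protein_counts(pairs))
```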

Proximity ligation (Duo-link PLA, www.sigmaaldrich.com/life-science/molecular-biology/molecular-biology-products.html?TablePage=l12232138; multiplex proximity ligation assay, EP 2714925 A1) is an example of a method for detecting proteins, protein-protein interactions, and post-translational modifications that may be adapted for use in proximity-preserving elements. The method can be used to detect and quantify specific proteins or protein complexes in proximity-preserving elements. An example workflow is as follows: (1) prepare one or more proximity-preserving elements; (2) wash and add one or more pairs of primary antibodies specific for the protein of interest; (3) wash and stain with barcode-labeled antibodies. Each population of proximity-preserving elements in a container receives a different barcode-labeled antibody. Through proximity ligation, an amplifiable product containing a unique barcode for a specific protein can be formed from one or more pairs of primary antibodies. One barcode can be specific for the protein of interest, while other barcodes are used to assign the protein to a specific proximity-preserving element and/or cell. Through one or more split-and-pool steps, aliquots can be differentially labeled. In this way, the contents of individual proximity-preserving elements can be analyzed without processing each element separately in many parallel steps. The ability to process 10, 100, 1,000, 10,000, 100,000, 1,000,000, 10,000,000, 100,000,000, or more proximity-preserving elements in this manner is a particularly significant advantage. Steroids and small molecules can be detected in a manner similar to that described for proteins. Barcode-labeled antibodies can be developed for steroids (Bosze P et al., Antibodies against steroids, Hum Reprod. 1988 Jan;3(1):63-8). Alternatively, fluorescent dye and radioactive conjugates have been described (www.jenabioscience.com/cms/en/l/catalog/2305_fluorescent_hormones.html). These antibody conjugates for steroids can be processed as described above. Various methods can be used to detect one or more components of a proximity-preserving element. One or more components of a proximity-preserving element can be labeled with chemiluminescent, fluorescent, or radioactive probes, DNA tags, barcodes, and indexes. Amplification strategies can be used to enhance the signal; for example, rolling circle amplification (RCA) can be used to detect an analyte, and the RCA product can then be detected by sequencing or with fluorescent decoders (probes). In addition, microarrays, protein arrays, sequencing, nanopore sequencing, next-generation sequencing, capillary electrophoresis, and bead arrays can be used for readout.

Establishing proximity of cell contents

In some embodiments, the proximity of the contents of cells or of single cells (e.g., but not limited to, DNA, RNA, proteins, organelles, metabolites, and small molecules) can be preserved in proximity-preserving elements (CEs). CEs can be created by several methods, including, but not limited to, encapsulating the contents within droplets, embedding the contents in a polymer matrix (after encapsulation), and attaching the contents to beads. In a preferred embodiment, the CE is permeable to reagents such as aqueous buffers, enzymes (polymerases, ligases, transposases, etc.), nucleotides, oligonucleotide adaptors, transposons, and primers. As described above, an indexed library is created from the CE. Repeated rounds of dilution into physical compartments, attachment of compartment-specific indexes, pooling, and re-dilution into additional compartments result in the exponential creation of virtual compartments. If designed properly, the contents of each CE will ultimately be virtually indexed with a unique barcode. As an example, in Figure 1 a four-layer indexing scheme yields a large number of virtual compartments and indexes (>84 million) from only 4 × 96 = 384 physical compartments in total. In a preferred embodiment, a compartment-specific index is added at each compartmentalization layer via tagmentation, ligation, or PCR. In a preferred embodiment, each physical compartment has a unique index at each step. Subsequent compartmentalization steps can use the same or different indexes. If the same indexes are used from one compartmentalization layer to the next, the position of an index within the final sequence string identifies both the compartment and the compartmentalization layer.
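The point that a reused index set stays unambiguous, because each layer's index occupies its own position in the final sequence, can be sketched as follows. The read layout here (fixed-length indexes concatenated in layer order and directly followed by the insert, with no linkers) and the four-member index set are simplifying assumptions for illustration; all names are hypothetical.

```python
# Hypothetical 3-nt index set reused at every compartmentalization layer
INDEX_TO_WELL = {"AAC": 0, "CGT": 1, "GTA": 2, "TCG": 3}
INDEX_LEN = 3


def build_read(layer_indexes, insert):
    """Prepend one index per layer, in layer order, to the insert."""
    return "".join(layer_indexes) + insert


def decode_read(read, n_layers, index_len=INDEX_LEN, index_to_well=INDEX_TO_WELL):
    """Recover (layer, well) pairs from index positions, plus the insert."""
    wells = []
    for layer in range(n_layers):
        start = layer * index_len
        wells.append((layer, index_to_well[read[start:start + index_len]]))
    return wells, read[n_layers * index_len:]


read = build_read(["CGT", "CGT", "TCG", "AAC"], "GATTACA")
print(decode_read(read, n_layers=4))
# ([(0, 1), (1, 1), (2, 3), (3, 0)], 'GATTACA')
```

The index "CGT" appears twice, yet its two occurrences decode to different layers purely by position.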

Analyzing cellular components using droplets

In one embodiment, the CE can be an emulsion or a droplet. In one embodiment, the CE is a droplet in contact with oil. In one example, forming a CE containing nucleic acids includes diluting and partitioning a nucleic acid sample into droplets, compartments, or beads. In one embodiment, the droplets contain cells or single cells. In one embodiment, forming a CE containing single cells includes diluting and partitioning the single cells into droplets, compartments, or beads.
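Diluting cells into droplets is commonly described with Poisson statistics: at a mean of λ cells per droplet, a fraction e^(-λ) of droplets are empty and a fraction λe^(-λ) hold exactly one cell. The sketch below computes these fractions; the λ = 0.1 value is an illustrative assumption, not one specified here, and the helper name is hypothetical.

```python
import math


def poisson_loading(lam):
    """Fractions of droplets with 0, 1, or >1 cells at mean occupancy lam > 0."""
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    p_multi = 1.0 - p0 - p1
    singlet_purity = p1 / (1.0 - p0)  # share of occupied droplets with one cell
    return p0, p1, p_multi, singlet_purity


p0, p1, p_multi, purity = poisson_loading(0.1)
print(f"empty={p0:.3f} single={p1:.3f} multiple={p_multi:.4f} purity={purity:.3f}")
# empty=0.905 single=0.090 multiple=0.0047 purity=0.951
```

Low loading keeps multi-cell droplets rare at the cost of leaving most droplets empty.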

In some embodiments, a "droplet" can be a volume of liquid on a droplet actuator that is at least partially bounded by a filler fluid. For example, a droplet can be completely surrounded by filler fluid, or it can be bounded by filler fluid and one or more surfaces of the droplet actuator. Droplets can take a wide variety of shapes; non-limiting examples include generally disc-shaped, slug-shaped, truncated sphere, ellipsoid, spherical, partially compressed sphere, hemispherical, ovoid, cylindrical, and various other shapes formed during droplet operations, such as merging or splitting, or formed as a result of contact of such shapes with one or more surfaces of the droplet actuator.

Droplet actuators are used to conduct a wide variety of droplet operations. A droplet actuator typically includes two substrates separated by a gap. The substrates include electrodes for conducting droplet operations. The gap is typically filled with a filler fluid that is immiscible with the fluid to be manipulated on the droplet actuator. Surfaces exposed to the gap are typically hydrophobic. Analysis of genetic material (genomics) and its expression (functional genomics), proteomics, combinatorial library analysis, and other multiplexed bioanalytical applications can be performed in droplets, and these operations can be performed on a droplet actuator. Methods for manipulating droplets using droplet actuators are disclosed in U.S. Patent Application Publications 20100130369 and 20130203606, each of which is incorporated herein by reference.
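For electrowetting-based droplet actuators, a common first-order description of how an applied electrode voltage changes a droplet's contact angle on a hydrophobic, dielectric-coated surface is the Young-Lippmann equation, cos θ(V) = cos θ0 + ε0·εr·V² / (2·γ·d). The sketch below evaluates that equation; the starting angle (115°), dielectric constant (3.1), layer thickness (1 μm), and interfacial tension (0.072 N/m, water-air) are illustrative assumptions rather than values from this disclosure, and real devices show contact-angle saturation that the equation does not capture.

```python
import math

EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m


def electrowetting_angle_deg(theta0_deg, voltage, eps_r, thickness_m, gamma):
    """Young-Lippmann contact angle (degrees) at the given applied voltage.

    cos(theta) = cos(theta0) + eps0 * eps_r * V**2 / (2 * gamma * d)
    Contact-angle saturation is not modeled; cos(theta) is simply capped at 1.
    """
    cos_theta = math.cos(math.radians(theta0_deg)) + (
        EPSILON_0 * eps_r * voltage ** 2 / (2 * gamma * thickness_m)
    )
    return math.degrees(math.acos(min(1.0, cos_theta)))


for v in (0, 25, 50):
    print(v, round(electrowetting_angle_deg(115, v, 3.1, 1e-6, 0.072), 1))
```

With these assumed parameters the angle drops from 115° at 0 V to roughly 87° at 50 V, i.e., the surface over an energized electrode becomes more wettable, which is what draws the droplet onto it.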

A "droplet actuator" refers to a device for manipulating droplets. For examples of droplet actuators, see Pamula et al., U.S. Patent No. 6,911,132, entitled "Apparatus for Manipulating Droplets by Electrowetting-Based Techniques," issued June 28, 2005; Pamula et al., U.S. Patent Publication No. 20060194331, entitled "Apparatus and Methods for Manipulating Droplets on a Printed Circuit Board," published August 31, 2006; Pollack et al., International Publication No. WO/2007/120241, entitled "Droplet-Based Biochemistry," published October 25, 2007; Shenderov, U.S. Patent No. 6,773,566, entitled "Electrostatic Actuators for Microfluidics and Methods for Using Same," issued August 10, 2004; Shenderov, U.S. Patent No. 6,565,727, entitled "Actuators for Microfluidics Without Moving Parts," issued May 20, 2003; Kim et al., U.S. Patent Publication No. 20030205632, entitled "Electrowetting-driven Micropumping," published November 6, 2003; Kim et al., U.S. Patent Publication No. 20060164490, entitled "Method and Apparatus for Promoting the Complete Transfer of Liquid Drops from a Nozzle," published July 27, 2006; Kim et al., U.S. Patent Publication No. 20070023292, entitled "Small Object Moving on Printed Circuit Board," published February 1, 2007; Shah et al., U.S. Patent Publication No. 20090283407, entitled "Method for Using Magnetic Particles in Droplet Microfluidics," published November 19, 2009; Kim et al., U.S. Patent Publication No. 20100096266, entitled "Method and Apparatus for Real-time Feedback Control of Electrical Manipulation of Droplets on Chip," published April 22, 2010; Velev, U.S. Patent No. 7,547,380, entitled "Droplet Transportation Devices and Methods Having a Fluid Surface," issued June 16, 2009; Sterling et al., U.S. Patent No. 7,163,612, entitled "Method, Apparatus and Article for Microfluidic Control via Electrowetting, for Chemical, Biochemical and Biological Assays and the Like," issued January 16, 2007; Becker et al., U.S. Patent No. 7,641,779, entitled "Method and Apparatus for Programmable Fluidic Processing," issued January 5, 2010; Becker et al., U.S. Patent No. 6,977,033, entitled "Method and Apparatus for Programmable Fluidic Processing," issued December 20, 2005; Decre et al., U.S. Patent No. 7,328,979, entitled "System for Manipulation of a Body of Fluid," issued February 12, 2008; Yamakawa et al., U.S. Patent Publication No. 20060039823, entitled "Chemical Analysis Apparatus," published February 23, 2006; Wu, U.S. Patent Publication No. 20110048951, entitled "Digital Microfluidics Based Apparatus for Heat-exchanging Chemical Processes," published March 3, 2011; Fouillet et al., U.S. Patent Publication No. 20090192044, entitled "Electrode Addressing Method," published July 30, 2009; Fouillet et al., U.S. Patent No. 7,052,244, entitled "Device for Displacement of Small Liquid Volumes Along a Micro-catenary Line by Electrostatic Forces," issued May 30, 2006; Marchand et al., U.S. Patent Publication No. 20080124252, entitled "Droplet Microreactor," published May 29, 2008; Adachi et al., U.S. Patent Publication No. 20090321262, entitled "Liquid Transfer Device," published December 31, 2009; Roux et al., U.S. Patent Publication No. 20050179746, entitled "Device for Controlling the Displacement of a Drop Between Two or Several Solid Substrates," published August 18, 2005; and Dhindsa et al., "Virtual Electrowetting Channels: Electronic Liquid Transport with Continuous Channel Functionality," Lab Chip, 10:832-836 (2010), the entire disclosures of which are incorporated herein by reference.

Certain droplet actuators will include one or more substrates arranged with a droplet operations gap therebetween, and electrodes associated with (e.g., layered on, attached to, and/or embedded in) the one or more substrates and arranged to conduct one or more droplet operations. For example, certain droplet actuators will include a base (or bottom) substrate, droplet operations electrodes associated with the substrate, one or more dielectric layers atop the substrate and/or electrodes, and optionally one or more hydrophobic layers atop the substrate, dielectric layers, and/or electrodes forming a droplet operations surface. A top substrate may also be provided, which is separated from the droplet operations surface by a gap, commonly referred to as a droplet operations gap.
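Because a droplet in the droplet operations gap is confined between the two substrates, its volume is often estimated as the footprint of one electrode multiplied by the gap height. This sketch uses that flat-slab approximation and ignores meniscus curvature; the 1 mm electrode pitch and 300 μm gap height are illustrative values, and the helper name is hypothetical.

```python
def unit_droplet_volume_nl(electrode_pitch_um, gap_height_um):
    """Approximate volume (nL) of a droplet covering one square electrode.

    Treats the droplet as a slab: pitch x pitch x gap height.
    1 nL = 1e6 cubic micrometres.
    """
    return electrode_pitch_um ** 2 * gap_height_um / 1e6


print(unit_droplet_volume_nl(1000, 300))  # 300.0
```

Under this approximation, halving the gap height halves the unit droplet volume for the same electrode size.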
Various electrode arrangements on the top and/or bottom substrates are discussed in the above-referenced patents and applications, and certain novel electrode arrangements are discussed in the description of the present disclosure. During droplet operations, it is preferred that droplets remain in continuous or frequent contact with a ground or reference electrode. A ground or reference electrode may be associated with the top substrate facing the gap, the bottom substrate facing the gap, and/or the gap itself. Where electrodes are provided on both substrates, electrical contacts for coupling the electrodes to a droplet actuator device for controlling or monitoring the electrodes may be associated with one or both substrates. In some cases, electrodes on one substrate are electrically coupled to the other substrate so that only one substrate is in contact with the droplet actuator. In one embodiment, a conductive material (e.g., an epoxy, such as MASTER BOND™ Polymer System EP79, available from Master Bond, Inc., Hackensack, NJ) provides the electrical connection between electrodes on one substrate and electrical paths on the other substrate; for example, a ground electrode on the top substrate may be coupled to an electrical path on the bottom substrate by such a conductive material. Where multiple substrates are used, a spacer may be provided between the substrates to determine the height of the gap therebetween and to define on-actuator dispensing reservoirs. The spacer height may, for example, be at least about 5 μm, 100 μm, 200 μm, 250 μm, 275 μm, or more. Alternatively or additionally, the spacer height may be at most about 600 μm, 400 μm, 350 μm, 300 μm, or less. The spacer may, for example, be formed from a layer of projections from the top or bottom substrate and/or from a material inserted between the top and bottom substrates. One or more openings may be provided in the one or more substrates to form a fluid path through which liquid may be delivered into the droplet operations gap.
In some cases, one or more openings may be aligned for interaction with one or more electrodes, e.g., aligned such that liquid flowed through an opening will come into sufficient proximity with one or more droplet operations electrodes to permit a droplet operation to be effected by those electrodes using the liquid. In some cases, the base (or bottom) and top substrates may be formed as one integral component. One or more reference electrodes may be provided on the base (or bottom) and/or top substrates and/or in the gap. Examples of reference electrode arrangements are provided in the above-referenced patents and patent applications. In various embodiments, the manipulation of droplets by a droplet actuator may be electrode-mediated, e.g., electrowetting-mediated, dielectrophoresis-mediated, or Coulombic-force-mediated. Examples of other techniques for controlling droplet operations that may be used in the droplet actuators of the present disclosure include devices that induce hydrodynamic fluidic pressure, such as those operating on mechanical principles (e.g., external syringe pumps, pneumatic membrane pumps, vibrating membrane pumps, vacuum devices, centrifugal forces, piezoelectric/ultrasonic pumps, and acoustic forces); electrical or magnetic principles (e.g., electroosmotic flow, electrokinetic pumps, ferrofluidic plugs, electrohydrodynamic pumps, attraction or repulsion using magnetic forces, and magnetohydrodynamic pumps); thermodynamic principles (e.g., gas bubble generation/phase-change-induced volume expansion); other kinds of surface-wetting principles (e.g., electrowetting and optoelectrowetting, as well as chemically, thermally, structurally, and radioactively induced surface-tension gradients); gravity; surface tension (e.g., capillary action); electrostatic forces (e.g., electroosmotic flow); centrifugal flow (a substrate disposed on a compact disc and rotated); magnetic forces (e.g., oscillating ions causing flow); magnetohydrodynamic forces; and vacuum or pressure differential.
In some embodiments, a combination of two or more of the foregoing techniques can be used to perform droplet operations in the droplet actuators of this disclosure. Similarly, one or more of the foregoing techniques can be used to deliver liquid into the droplet operations gap, for example from a reservoir in another device or from a reservoir external to the droplet actuator (e.g., a reservoir associated with a droplet actuator substrate and a flow path from the reservoir into the droplet operations gap). The droplet operations surfaces of some droplet actuators of this disclosure can be made of hydrophobic materials or can be coated or treated to make them hydrophobic. For example, in some cases, some or all of the droplet operations surfaces can be derivatized with low-surface-energy materials or chemistries, for example by deposition or in situ synthesis using poly- or perfluorinated compounds in solution or polymerizable monomers. Examples include (available from DuPont, Wilmington, DE), members of the cytop family of materials, coatings in the family of hydrophobic and superhydrophobic coatings (available from Cytonix Corporation, Beltsville, MD), silane coatings, fluorosilane coatings, hydrophobic phosphonate derivatives (e.g., those sold by Aculon, Inc.), NOVEC™ electronic coatings (available from 3M Company, St. Paul, MN), other fluorinated monomers for plasma-enhanced chemical vapor deposition (PECVD), and organosiloxanes for PECVD (e.g., SiOC). In some cases, the droplet operations surface can include a hydrophobic coating with a thickness ranging from about 10 nm to about 1,000 nm. Furthermore, in some embodiments, the top substrate of the droplet actuator includes an electrically conductive organic polymer, which is then coated with a hydrophobic coating or otherwise treated to make the droplet operations surface hydrophobic.
For example, the electrically conductive organic polymer deposited on a plastic substrate can be poly(3,4-ethylenedioxythiophene):poly(styrene sulfonate) (PEDOT:PSS). Other examples of electrically conductive organic polymers and alternative conductive layers are described in Pollack et al., International Patent Publication No. WO/2011/002957, entitled "Droplet Actuator Devices and Methods," published January 6, 2011, the entire disclosure of which is incorporated herein by reference. One or both substrates can be fabricated using a printed circuit board (PCB), glass, indium tin oxide (ITO)-coated glass, and/or semiconductor materials as the substrate. When the substrate is ITO-coated glass, the ITO coating is preferably at least about 20 nm, 50 nm, 75 nm, 100 nm, or more thick. Alternatively or additionally, the thickness can be at most about 200 nm, 150 nm, 125 nm, or less. In some cases, the top and/or bottom substrate includes a PCB substrate coated with a dielectric, such as a polyimide dielectric, which in some cases can also be coated or otherwise treated to make the droplet operations surface hydrophobic. When the substrate includes a PCB, the following are examples of suitable materials: MITSUI™ BN-300 (available from MITSUI Chemicals America, Inc., San Jose, CA); ARLON™ 11N (available from Arlon, Inc., Santa Ana, CA); N4000-6 and N5000-30/32 (available from Park Electrochemical Corp., Melville, NY); ISOLA™ FR406 (available from Isola Group, Chandler, AZ), especially IS620; the fluoropolymer family (suitable for fluorescence detection because of its low background fluorescence); the polyimide family; polyester; polyethylene naphthalate; polycarbonate; polyetheretherketone; liquid crystal polymer; cyclic olefin copolymer (COC); cyclic olefin polymer (COP); aramid; nonwoven aramid reinforcement (available from DuPont, Wilmington, DE); branded fiber (available from DuPont, Wilmington, DE); and paper.
Various materials are also suitable for use as the dielectric component of the substrate. Examples include: vapor-deposited dielectrics, such as PARYLENE™ C (especially on glass), PARYLENE™ N, and PARYLENE™ HT (for high temperatures, about 300°C) (available from Parylene Coating Services, Inc., Katy, TX); AF coatings; CYTOP; solder masks, such as liquid photoimageable solder masks (e.g., on PCBs), including the TAIYO™ PSR4000 series, TAIYO™ PSR, and AUS series (available from Taiyo America, Inc., Carson City, NV) (good thermal characteristics for applications involving thermal control), and PROBIMER™ 8165 (good thermal characteristics for applications involving thermal control) (available from Huntsman Advanced Materials Americas Inc., Los Angeles, CA); dry film solder masks, such as those in the dry film solder mask line (available from DuPont, Wilmington, DE); film dielectrics, such as polyimide films (e.g., polyimide films available from DuPont, Wilmington, DE), polyethylene, and fluoropolymers (e.g., FEP), polytetrafluoroethylene; polyester; polyethylene naphthalate; cyclic olefin copolymer (COC); cyclic olefin polymer (COP); any other PCB substrate material listed above; black matrix resin; polypropylene; and black flexible circuit materials, such as DuPont™ Pyralux HXC and DuPont™ Kapton MBC (available from DuPont, Wilmington, DE). Droplet transport voltage and frequency can be selected for performance with the reagents used in a specific assay protocol. Design parameters can also be varied; for example, the number and placement of on-actuator reservoirs, the number of independent electrode connections, the size (volume) of the different reservoirs, the placement of magnets and bead washing zones, electrode size, inter-electrode pitch, and gap height (between the top and bottom substrates) can be varied for use with specific reagents, protocols, droplet volumes, and the like.
In some cases, the substrates of this disclosure can be derivatized with low-surface-energy materials or chemistries, for example by deposition or in situ synthesis using poly- or perfluorinated compounds in solution or polymerizable monomers. Examples include coatings for dip or spray coating, other fluorinated monomers for plasma-enhanced chemical vapor deposition (PECVD), and organosiloxanes for PECVD (e.g., SiOC). Furthermore, in some cases, some or all of the droplet operations surface can be coated with a substance that reduces background noise, such as background fluorescence from a PCB substrate. For example, the noise-reducing coating can include a black matrix resin, such as the black matrix resins available from Toray Industries, Inc., Japan. The electrodes of a droplet actuator are typically controlled by a controller or processor, which is itself provided as part of a system and can include processing functions as well as data and software storage and input and output capabilities. Reagents can be provided on the droplet actuator in the droplet operations gap or in a reservoir fluidly coupled to the droplet operations gap. The reagents can be in liquid form, e.g., droplets, or they can be provided in a reconstitutable form in the droplet operations gap or in a reservoir fluidly coupled to the droplet operations gap. Reconstitutable reagents can typically be combined with a liquid for reconstitution. Examples of reconstitutable reagents suitable for use with the methods and apparatus described herein include those described in Meathrel et al., U.S. Patent No. 7,727,466, entitled "Disintegratable Films for Diagnostic Devices," issued June 1, 2010, the entire disclosure of which is incorporated herein by reference.

"Droplet operation" refers to any manipulation of a droplet on a droplet actuator.
Droplet operations can include, for example: loading a droplet into the droplet actuator; dispensing one or more droplets from a source droplet; splitting, separating, or dividing a droplet into two or more droplets; transporting a droplet from one location to another in any direction; merging or combining two or more droplets into a single droplet; diluting a droplet; mixing a droplet; agitating a droplet; deforming a droplet; retaining a droplet in position; incubating a droplet; heating a droplet; evaporating a droplet; cooling a droplet; disposing of a droplet; transporting a droplet out of a droplet actuator; other droplet operations described herein; and/or any combination of the foregoing. The terms "merge," "merging," "combine," "combining," and the like are used to describe the creation of one droplet from two or more droplets. It should be understood that when such a term is used in reference to two or more droplets, any combination of droplet operations sufficient to result in the combination of the two or more droplets into one droplet can be used. For example, "merging droplet A with droplet B" can be achieved by transporting droplet A into contact with a stationary droplet B, transporting droplet B into contact with a stationary droplet A, or transporting droplets A and B into contact with each other. The terms "splitting," "separating," and "dividing" are not intended to imply any particular outcome with respect to the volumes of the resulting droplets (i.e., the volumes of the resulting droplets can be the same or different) or the number of resulting droplets (which can be 2, 3, 4, 5, or more). The term "mixing" refers to a droplet operation that results in a more homogeneous distribution of one or more components within a droplet. Examples of "loading" droplet operations include microdialysis loading, pressure-assisted loading, robotic loading, passive loading, and pipette loading. Droplet operations can be electrode-mediated.
In some cases, droplet operations are further facilitated by the use of hydrophilic and/or hydrophobic regions on surfaces and/or by physical obstacles. For examples of droplet operations, see the patents and patent applications cited above under the definition of "droplet actuator." Impedance or capacitance sensing or imaging techniques can sometimes be used to determine or confirm the outcome of a droplet operation. Examples of such techniques are described in Sturmer et al., U.S. Patent Publication No. 20100194408, entitled "Capacitance Detection in a Droplet Actuator," published August 5, 2010, the entire disclosure of which is incorporated herein by reference. Generally speaking, sensing or imaging techniques can be used to confirm the presence or absence of a droplet at a specific electrode. For example, the presence of a dispensed droplet at the destination electrode following a droplet dispensing operation confirms that the dispensing operation was effective. Similarly, the presence of a droplet at a detection spot at an appropriate step in an assay protocol can confirm that a previous set of droplet operations has successfully produced a droplet for detection. Droplet transport can be quite fast. For example, in various embodiments, transporting a droplet from one electrode to the next can take about 1 second or less, about 0.1 second or less, about 0.01 second or less, or about 0.001 second or less. In one embodiment, the electrodes are operated in AC mode but are switched to DC mode for imaging. It is helpful for droplet operations for the footprint area of a droplet to be similar to the electrowetting area; in other words, 1x, 2x, and 3x droplets are usefully controlled using 1, 2, and 3 electrodes, respectively.
If the droplet footprint is greater than the number of electrodes available for conducting a droplet operation at a given time, the difference between the droplet size and the number of electrodes should typically not be greater than 1; in other words, a 2x droplet is usefully controlled using 1 electrode, and a 3x droplet is usefully controlled using 2 electrodes. When a droplet includes beads, it is useful for the droplet size to equal the number of electrodes controlling the droplet, e.g., during droplet transport.
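The electrode-count rule of thumb above can be expressed as a small check. This is a minimal Python sketch and is not part of any droplet actuator control software. The function name `electrodes_suitable` and the integer "Nx" footprint encoding are hypothetical. The sketch also treats a droplet smaller than its electrode group as unsuitable, which is an assumption because the text does not address that case.

```python
def electrodes_suitable(droplet_size: int, n_electrodes: int, has_beads: bool = False) -> bool:
    """Check the electrode-count rule of thumb for an Nx droplet (hypothetical helper).

    droplet_size: droplet footprint in unit-electrode areas (1 for 1x, 2 for 2x, ...).
    n_electrodes: electrodes available for the droplet operation at a given time.
    has_beads: True if the droplet contains beads.
    """
    if droplet_size < 1 or n_electrodes < 1:
        raise ValueError("droplet_size and n_electrodes must be positive integers")
    if has_beads:
        # Bead-containing droplets: footprint should equal the electrode count.
        return droplet_size == n_electrodes
    # Otherwise the footprint may equal the electrode count or exceed it by at most 1.
    # A footprint smaller than the electrode group is treated as unsuitable (assumption).
    return 0 <= droplet_size - n_electrodes <= 1


# 1x, 2x, 3x droplets on 1, 2, 3 electrodes; 2x on 1 and 3x on 2 are also acceptable.
for size, electrodes in [(1, 1), (2, 2), (3, 3), (2, 1), (3, 2), (3, 1)]:
    print(f"{size}x droplet on {electrodes} electrode(s):", electrodes_suitable(size, electrodes))
```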

In some aspects, nucleic acid libraries can be prepared from cells or single cells using CE, e.g., droplets. In some embodiments, the cells can be suspended in a buffer. In some embodiments, the cell suspension can be introduced into a droplet actuator. Electrode-mediated droplet operations can be used to dispense an array of droplets containing the cell suspension such that each droplet contains a single cell. Electrode-mediated droplet operations can also be used to dispense an array of reagent droplets containing a cell lysis buffer (lysis buffer droplets). Electrode-mediated operations can then combine the array of lysis buffer droplets with the array of single-cell suspension droplets to form cell lysate droplets, such that each cell lysate droplet contains the components of a single cell. Reaction reagents containing unique nucleic acid barcodes, transposons, and suitable enzymes (e.g., fragmentases, polymerases, ligases, transposases, reverse transcriptases, etc.) can be introduced into the droplet actuator. In some embodiments, the transposons and/or barcodes can contain primer binding sites.
Electrode-mediated droplet operations can dispense an array of reagent droplets containing the reaction reagents, such that each reagent droplet contains a unique nucleic acid barcode and a suitable enzyme. Electrode-mediated operations can combine the cell lysate droplets with the reagent droplets to form an array of first barcoded droplets, in which the enzyme from the reagent droplet acts on the nucleic acids from the single cell so that the nucleic acids come to contain the barcode. In some embodiments, mRNA in the cell lysate droplets can be reverse transcribed when the cell lysate droplets and reagent droplets are combined, and the resulting cDNA can contain the barcode. In some embodiments, the barcode can contain a primer binding site and a unique molecular index. Using electrode-mediated droplet operations, the first barcoded droplets can be combined with reagent droplets additional times to produce an array of second barcoded droplets, an array of third barcoded droplets, and so on. In some embodiments, the barcode is different for each round of combination. Multiple rounds of combining barcoded droplets with reagent droplets therefore produce combinatorial barcodes. Finally, the nucleic acids from the different droplets can be pooled and sequenced. The sequencing data can reveal the sequences of the nucleic acids from the cells and, optionally, can also identify the source of the nucleic acids (e.g., the cell or single cell). Such information is valuable if the nucleic acids contain mutations associated with diseases, such as hereditary genetic disorders or cancer.
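The combinatorial barcoding described above can be illustrated with a short example. This minimal Python sketch rests on two hypothetical assumptions: each merging round contributes one barcode from a known set, and the concatenated barcode sits at the 5' end of each sequencing read in round order. The function names are illustrative only. The number of distinct combined barcodes equals the product of the per-round set sizes. For example, three rounds drawn from 96 barcodes each give 96³ = 884,736 combinations.

```python
from itertools import product


def combinatorial_barcodes(rounds):
    """List every combined barcode produced by sequential merging rounds.

    rounds: one list of barcode sequences per round; a droplet receives one
    barcode per round, so its combined barcode is their concatenation.
    """
    return ["".join(parts) for parts in product(*rounds)]


def group_reads_by_barcode(reads, barcode_length):
    """Group reads by the combined barcode assumed to occupy the first
    `barcode_length` bases of each read (hypothetical read layout)."""
    groups = {}
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        groups.setdefault(barcode, []).append(insert)
    return groups


rounds = [["AC", "GT"], ["TT", "CA", "GG"]]  # 2 x 3 = 6 combinations
print(combinatorial_barcodes(rounds))
print(group_reads_by_barcode(["ACTTAAAA", "ACTTCCCC", "GTCAGGGG"], barcode_length=4))
```

Reads that share a combined barcode are attributed to the same droplet, and therefore to the same single cell.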

In some aspects, the methods of this application can be applied to proteomics. An array of bead-containing droplets can be prepared by introducing a bead suspension into a droplet actuator and dispensing an array of droplets from the bead suspension such that each droplet in the array contains a single bead (see U.S. Application Publication 20100130369, incorporated herein by reference). The beads can carry antibodies or other affinity probes (for typical attachment schemes, see Immobilized Biomolecules in Analysis: A Practical Approach, Cass T, Ligler FS, eds., Oxford University Press, New York, 1998, pp. 1-14, incorporated herein by reference). In some embodiments, the antibodies can be specific for cell surface epitopes. In some embodiments, the antibodies can be monoclonal, and in other embodiments, the antibodies can be polyclonal. Using electrode-mediated droplet operations, the array of bead suspension droplets can be combined with an array of droplets each containing a single cell to produce an array of cell-on-bead droplets in which the antibodies on the beads bind to cell surface proteins. In some embodiments, the antibodies can be specific for intracellular proteins.
Using electrode-mediated droplet operations, the array of bead suspension droplets can be combined with an array of droplets containing single-cell lysates, such that the antibodies on the beads bind to intracellular proteins, to produce an array of protein-on-bead droplets. Optionally, using electrode-mediated droplet operations, the array of protein-on-bead droplets can be combined with an array of reagent droplets containing protein labeling reagents so that the proteins can be uniquely labeled. The bound proteins can be detected via the associated label or by other means (e.g., SDS-polyacrylamide gel electrophoresis, ELISA, etc.). The identity of the proteins and their source can be determined. In some embodiments, the proteomics data can be correlated with the sequencing data.

In some embodiments, the antibodies can be specific for biomolecules other than proteins. Such biomolecules can include, but are not limited to, polysaccharides and lipids. In some embodiments, the identity and source of such biomolecules can be correlated with the sequencing data generated as described above.

In situ cell analysis

In some embodiments, cells and their components can be analyzed in situ. In some embodiments, the cells can be passed through a flow cell.

As used herein, the term "flow cell" is intended to mean a chamber having a surface across which one or more fluid reagents can be flowed. Generally, a flow cell will have an inlet and an outlet to facilitate the flow of fluid. Examples of flow cells and related fluidic systems and detection platforms that can be readily used in the methods of this disclosure are described, for example, in Bentley et al., Nature 456:53-59 (2008); WO 04/018497; US 7,057,026; WO 91/06678; WO 07/123744; US 7,329,492; US 7,211,414; US 7,315,019; US 7,405,281; and US 2008/0108082, each of which is incorporated herein by reference.

In some embodiments, the flow cell can contain an array. Arrays used for nucleic acid sequencing typically have a random spatial pattern of nucleic acid features. For example, the HiSeq™ and MiSeq™ sequencing platforms available from Illumina Inc. (San Diego, CA) use flow cells on which nucleic acid arrays are formed by random seeding followed by bridge amplification. However, patterned arrays can also be used for nucleic acid sequencing or other analytical applications. Exemplary patterned arrays, methods for their manufacture, and methods for their use are set forth in U.S. Ser. No. 13/787,396; U.S. Ser. No. 13/783,043; U.S. Ser. No. 13/784,368; U.S. Patent Application Publication No. 2013/0116153 A1; and U.S. Patent Application Publication No. 2012/0316086 A1, each of which is incorporated herein by reference. The features of such patterned arrays can be used to capture a single nucleic acid template molecule that seeds the subsequent formation of a homogeneous colony, for example, via bridge amplification. Such patterned arrays are particularly useful for nucleic acid sequencing applications.

In some embodiments, the flow cell surface can include a capture moiety, such as an antibody, that immobilizes cells passing through the flow cell on the flow cell surface. In some embodiments, the antibodies on the flow cell surface can specifically bind to cell surface proteins. In some embodiments, the antibodies can specifically bind to cell surface proteins of cancer cells, thereby enriching cancer cells on the flow cell surface.

In some embodiments, cells can be sorted into various types by cell sorting techniques known in the art before being delivered to the flow cell. Exemplary cell sorting techniques include, but are not limited to, fluorescence-activated cell sorting (FACS) using flow cytometry, magnetic-activated cell sorting (MACS) (Miltenyi Biotec Inc., San Diego, CA), and column-free cell separation techniques in which a tube of labeled cells is placed within a magnetic field. In such column-free techniques, positively selected cells are retained in the tube, while negatively selected cells remain in the liquid suspension (STEMCELL Technologies Inc., Vancouver, BC, Canada).

In some embodiments, cells passing through the flow cell can be lysed within the flow cell, thereby releasing the nucleic acids (DNA and RNA) of the cells into the flow cell. In some embodiments, the cells are immobilized on the flow cell before lysis. Methods of cell lysis are known in the art and include, but are not limited to, sonication, protease treatment, osmotic shock, and high-salt treatment. In some embodiments, total RNA can be reverse transcribed. In some embodiments, unique barcodes can be introduced into nucleic acids from the cells, such as DNA, RNA, or cDNA. Methods of introducing barcodes into nucleic acids have been discussed above and include, but are not limited to, tagmentation (e.g., using Nextera™ technology), ligases, and polymerases. In some embodiments, the barcodes can be used to identify the cell of origin. In some embodiments, the barcodes can have primer binding sites. In some embodiments, multiple barcodes can be introduced into the nucleic acids. In some embodiments, the multiple barcodes differ from one another. In some embodiments, the barcoded nucleic acids can be dispersed and pooled again, and additional barcodes can optionally be introduced. In some embodiments, the nucleic acids can be fragmented during or after the introduction of the barcodes. In some embodiments, the fragmented nucleic acids can be amplified before they diffuse into the flow cell.
In some embodiments, the barcoded fragmented nucleic acids can diffuse to different portions of the flow cell that contain capture probes and become immobilized on the flow cell. In some embodiments, the immobilized fragmented nucleic acids can undergo bridge amplification.

In some embodiments of the above aspects, the cells passing through the flow cell are single cells. In some embodiments, the entire transcriptome can be assessed. In some embodiments, DNA and RNA from cells or single cells can be assessed simultaneously for sequence information. In some embodiments, the identity or sequence information of proteins from cells or single cells can be assessed. In some embodiments, other analytes from cells or single cells, such as lipids, carbohydrates, and cellular organelles, can be assessed.

Fragmenting template nucleic acids

Some embodiments of preparing template nucleic acids can include fragmenting a target nucleic acid. In some embodiments, barcoded or indexed adaptors are attached to the fragmented target nucleic acids. The adaptors can be attached by any method well known in the art, such as ligation (enzymatic or chemical), tagmentation, polymerase extension, and the like. In some embodiments, insertion of transposomes comprising non-contiguous transposon sequences can result in fragmentation of the target nucleic acid. In some embodiments comprising looped transposomes, a target nucleic acid comprising transposon sequences can be fragmented at the fragmentation sites of the transposon sequences. Other examples of methods for fragmenting target nucleic acids that are useful with the embodiments provided herein can be found, for example, in U.S. Patent Application Publication No. 2002/0208705, U.S. Patent Application Publication No. 2012/0208724, and International Patent Application Publication WO 02/061832, each of which is incorporated herein by reference in its entirety.
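The combined effect of fragmentation and adaptor attachment can be illustrated with a toy model. The Python sketch below is a simplification for illustration only and does not reproduce transposome chemistry. It treats each transposition event as a clean cut at a random position and appends hypothetical adaptor strings to both ends of each fragment. It ignores target-site duplication and gap filling.

```python
import random


def tagment(target: str, n_insertions: int, adaptor_5: str, adaptor_3: str, seed: int = 0):
    """Toy model of transposition-based fragmentation with adaptor attachment.

    Each insertion is modeled as a clean cut at a distinct random interior position;
    every resulting fragment is then flanked by the (hypothetical) adaptor sequences.
    """
    if not 0 <= n_insertions < len(target):
        raise ValueError("n_insertions must be between 0 and len(target) - 1")
    rng = random.Random(seed)
    # Distinct cut positions between bases, sorted left to right.
    sites = sorted(rng.sample(range(1, len(target)), n_insertions))
    bounds = [0] + sites + [len(target)]
    fragments = [target[start:end] for start, end in zip(bounds, bounds[1:])]
    return [adaptor_5 + fragment + adaptor_3 for fragment in fragments]


library = tagment("ACGTTGCAAGGCTTAC", n_insertions=3, adaptor_5="AATG", adaptor_3="CCGA")
for molecule in library:
    print(molecule)
```

Removing the adaptors and concatenating the fragments in order recovers the original target, which is a convenient sanity check on the model.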

Tagging single molecules

This invention provides methods for tagging molecules so that individual molecules can be tracked and identified. Bulk data can then be deconvolved and converted back to individual molecules. The ability to distinguish individual molecules and to relate information back to the original molecules is particularly important when the process that leads from the original molecules to the final products alters the (stoichiometric) representation of the original population. For example, amplification produces duplicates that can skew the original representation (e.g., PCR duplicates or biased amplification). This can alter methylation-state calls, copy numbers, and allele ratios because of nonuniform amplification and/or amplification bias. By identifying individual molecules, code tagging distinguishes otherwise identical molecules after processing. In this way, duplicates and amplification bias can be filtered out, allowing accurate determination of the original representation of a molecule or population of molecules.
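The effect of removing duplicates on an allele ratio can be shown with a minimal Python sketch. The sketch makes two hypothetical assumptions: each read has already been reduced to a (tag, allele) pair, and all reads that share a tag derive from one original molecule. Sequencing errors within tags are ignored.

```python
def allele_fractions(tagged_reads):
    """Compare read-level and molecule-level allele fractions.

    tagged_reads: iterable of (tag, allele) pairs; reads sharing a tag are
    assumed to be copies of one original molecule.
    Returns (read_fractions, molecule_fractions) as dicts keyed by allele.
    """
    read_counts = {}
    molecules = {}
    for tag, allele in tagged_reads:
        read_counts[allele] = read_counts.get(allele, 0) + 1
        molecules.setdefault(allele, set()).add(tag)
    total_reads = sum(read_counts.values())
    total_molecules = sum(len(tags) for tags in molecules.values())
    read_fractions = {a: n / total_reads for a, n in read_counts.items()}
    molecule_fractions = {a: len(t) / total_molecules for a, t in molecules.items()}
    return read_fractions, molecule_fractions


# Molecule t1 (allele A) was amplified into three reads; t2 (allele G) into one.
reads = [("t1", "A"), ("t1", "A"), ("t1", "A"), ("t2", "G")]
print(allele_fractions(reads))  # → ({'A': 0.75, 'G': 0.25}, {'A': 0.5, 'G': 0.5})
```

Counting reads overstates allele A because of the PCR duplicates. Counting distinct tags restores the original 1:1 ratio.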

An advantage of uniquely tagging single molecules is that otherwise identical molecules in the original pool become uniquely identifiable by virtue of their tags. These uniquely tagged molecules can then be distinguished in further downstream analyses. This technique can be exploited in assay protocols that use amplification. For example, amplification is known to distort the original representation of a mixed population of molecules. Without unique tagging, any estimate of the original representation (such as copy number or allele ratio) would need to account for the bias (known or unknown) of each molecule in the representation. With unique tagging, the representation can be determined accurately by removing duplicates and counting the original molecules, each of which carries a unique tag. Thus, cDNA can be amplified and sequenced without concern about bias, because the data can be filtered so that only authentic sequences, or sequences of interest, are selected for further analysis. Accurate reads can be constructed by building a consensus across the many reads that share the same barcode.
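Building a consensus read from reads that share a tag can be sketched as a per-position majority vote. This Python example is an illustrative assumption and is not a specific pipeline's algorithm. It truncates each group to its shortest read and breaks ties in favor of the base seen first, which is the ordering behavior of `collections.Counter.most_common`. Practical tools typically also weigh base qualities.

```python
from collections import Counter


def consensus_by_tag(tagged_reads):
    """Build one consensus sequence per tag by per-position majority vote.

    tagged_reads: iterable of (tag, sequence) pairs. Each group is truncated to
    its shortest read; ties go to the base seen first (Counter.most_common order).
    """
    groups = {}
    for tag, sequence in tagged_reads:
        groups.setdefault(tag, []).append(sequence)
    consensus = {}
    for tag, sequences in groups.items():
        length = min(len(s) for s in sequences)
        consensus[tag] = "".join(
            Counter(s[i] for s in sequences).most_common(1)[0][0] for i in range(length)
        )
    return consensus


reads = [("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"), ("CCG", "GGCA")]
print(consensus_by_tag(reads))  # → {'AAT': 'ACGT', 'CCG': 'GGCA'}
```

The single error in the third read of tag "AAT" (T at position 3) is outvoted by the two concordant copies.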

In some embodiments of the compositions and methods described herein, the original population is preferably tagged early in the assay, although tagging can be performed at a later stage if the earlier steps do not introduce bias or the bias is not significant. In any of these applications, the complexity of the barcode sequences should be greater than the number of individual molecules to be tagged. This helps ensure that different target molecules receive different, unique tags. A random oligonucleotide pool of a certain length (e.g., 5, 10, 20, 30, 40, 50, 100, or 200 nucleotides in length) is therefore desirable. A random pool of tags represents a large tag complexity, with a code space of 4ⁿ, where n is the number of nucleotides. Additional codes (whether designed or random) can be incorporated at different stages to serve as further checks, such as parity checks for error correction.
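The relationship between tag length and tag complexity can be checked numerically. The Python sketch below assumes that tags are drawn uniformly at random from all 4ⁿ sequences of length n, and its function names are hypothetical. It also shows why a code space only slightly larger than the number of molecules is not sufficient on its own. Under uniform random draws, the probability that a given molecule's tag is shared by no other molecule is (1 - 1/4ⁿ)^(N-1) for N molecules.

```python
def min_tag_length(n_molecules: int) -> int:
    """Smallest tag length n whose code space 4**n exceeds n_molecules."""
    n = 1
    while 4 ** n <= n_molecules:
        n += 1
    return n


def expected_unique_fraction(n_molecules: int, tag_length: int) -> float:
    """Probability that a given molecule's tag is shared by no other molecule,
    assuming tags are drawn uniformly at random from all 4**tag_length sequences."""
    if n_molecules < 1:
        raise ValueError("n_molecules must be at least 1")
    code_space = 4 ** tag_length
    return (1 - 1 / code_space) ** (n_molecules - 1)


n = min_tag_length(1000)
print(n, 4 ** n)                                     # → 5 1024
print(round(expected_unique_fraction(1000, 5), 3))   # about 0.377
print(round(expected_unique_fraction(1000, 10), 3))  # about 0.999
```

For 1,000 molecules, a 5-nucleotide tag (1,024 codes) just exceeds the molecule count, yet only about 38% of molecules would carry a tag no other molecule shares. A 10-nucleotide tag raises that fraction to roughly 99.9%, which illustrates why longer random tags are useful.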

In one embodiment of the compositions and methods described herein, a single molecule (e.g., a target DNA) is attached to a unique label, such as a unique oligomer sequence and/or barcode. Attachment of the label can be achieved by ligation, coupling chemistry, adsorption, insertion of transposon sequences, and the like. Other methods include amplification (e.g., by PCR, RCA, or LCR), copying (e.g., by polymerase addition), and non-covalent interactions.

Specific methods include incorporating barcodes (e.g., designed or random sequences) into PCR primers, such that each template receives an individual code within the code space, thereby producing unique amplicons that can be distinguished from other amplicons. This concept can be applied to any method that uses polymerase amplification, such as the GoldenGate™ assay and the assays disclosed in U.S. Patent Nos. 7,582,420, 7,955,794, and 8,003,354, each of which is incorporated herein by reference in its entirety. Code-tagged target sequences can be circularized and amplified, for example by rolling circle amplification, to produce code-tagged amplicons. Similarly, codes can also be added to RNA.

Methods for analyzing template nucleic acids

Some embodiments of the technology described herein include methods for analyzing template nucleic acids. In such embodiments, sequencing information can be obtained from the template nucleic acids and used to generate a sequence representation of one or more target nucleic acids.

In some embodiments of the sequencing methods described herein, a linked-read strategy can be used. A linked-read strategy can include identifying sequencing data that link at least two sequencing reads. For example, a first sequencing read can include a first marker, and a second sequencing read can include a second marker. The first and second markers can identify the sequencing data from each sequencing read as adjacent in the sequence representation of the target nucleic acid. In some embodiments of the compositions and methods described herein, the markers can include a first barcode sequence and a second barcode sequence, wherein the first barcode sequence can be paired with the second barcode sequence. In other embodiments, the markers can include a first host tag and a second host tag. In further embodiments, the markers can include a first barcode sequence with a first host tag and a second barcode sequence with a second host tag.

An exemplary embodiment of a method for sequencing template nucleic acids can include the steps of: (a) sequencing a first barcode sequence using a sequencing primer hybridized to a first primer site; and (b) sequencing a second barcode sequence using a sequencing primer hybridized to a second primer site. The result is two sequence reads that help link the template nucleic acid to its genomic neighbors. Given sufficiently long reads and sufficiently short library fragments, the two reads can be merged in silico into one long read covering the entire fragment. Using the barcode sequence reads and the 9-nucleotide repeat present at the insertion, the reads can then be linked to their genomic neighbors to form much longer “linked reads” in silico.
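The in silico merging of two overlapping reads into one longer read can be sketched as follows. This assumes error-free reads from the two ends of a fragment shorter than the combined read length, with read 2 reported as the reverse complement of the fragment's 3' end; `merge_pair` and the 5-nt minimum overlap are hypothetical choices, not part of the described method.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 5):
    """Merge read 1 with the reverse complement of read 2 when their ends overlap.

    Tries the longest possible exact overlap first; returns None if no overlap
    of at least min_overlap bases is found.
    """
    r2 = revcomp(read2)
    for olen in range(min(len(read1), len(r2)), max(min_overlap, 1) - 1, -1):
        if read1[-olen:] == r2[:olen]:
            return read1 + r2[olen:]
    return None


fragment = "ACGTTGCAAGGCTTAACCGT"             # 20-nt library fragment
read1 = fragment[:14]                        # forward read from the 5' end
read2 = revcomp(fragment[6:])                # reverse read from the 3' end
print(merge_pair(read1, read2) == fragment)  # True
```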

As will be understood, a library of template nucleic acids can include duplicate nucleic acid fragments. Sequencing duplicate fragments is advantageous in methods that include generating a consensus sequence for the duplicates. Such methods can improve the accuracy of the consensus sequence provided for the template nucleic acids and/or the library of template nucleic acids.
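One simple way to form a consensus from duplicate fragments is a column-wise majority vote. This sketch assumes the duplicate reads are already aligned to the same start and length, and reports N wherever no base has a strict majority; the function name is hypothetical.

```python
from collections import Counter


def consensus(reads):
    """Column-wise majority vote over equal-length reads from duplicate fragments."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be pre-aligned to the same length")
    out = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        # Report N when no base reaches a strict majority in this column
        out.append(base if count * 2 > len(column) else "N")
    return "".join(out)


print(consensus(["ACGTAC", "ACGTAC", "ACTTAC"]))  # ACGTAC
```

A single-read error at position 3 is outvoted by the two duplicates that agree.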

In some embodiments of the sequencing technologies described herein, real-time sequence analysis is performed. For example, real-time sequencing can be performed by acquiring and analyzing sequencing data simultaneously. In some embodiments, a sequencing method for acquiring sequencing data can be terminated at various points, including after at least a portion of the target nucleic acid sequence data has been acquired or before an entire nucleic acid read has been sequenced. Exemplary methods, systems, and further embodiments are provided in International Patent Publication No. WO 2010/062913, the disclosure of which is incorporated herein by reference in its entirety.

In an exemplary embodiment of a method for assembling short sequencing reads using a linked-read strategy, transposon sequences containing barcodes are inserted into genomic DNA, a library is prepared, and sequencing data are obtained for the library of template nucleic acids. Template blocks can be assembled by identifying paired barcodes, and larger contigs can then be assembled. In one embodiment, the assembled reads can be further assembled into larger contigs by using the code pairing of overlapping reads.
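The chaining of template blocks through paired barcodes can be illustrated with a small graph walk. This is a sketch under stated assumptions: each fragment carries one code at each end (None where there is no tagged neighbour), paired codes are modelled as identical strings, chains are linear, and sequences are already trimmed so that they do not overlap. The function and field names are hypothetical.

```python
def chain_fragments(fragments):
    """Join code-tagged fragments into contigs by matching junction codes.

    `fragments` maps fragment_id -> (left_code, right_code, sequence). A fragment
    whose right_code equals another fragment's left_code is treated as its
    genomic neighbour (simplifying assumption: paired codes are identical).
    """
    by_left = {left: fid for fid, (left, _, _) in fragments.items() if left is not None}
    right_codes = {right for _, right, _ in fragments.values() if right is not None}
    contigs = []
    for fid, (left, _, _) in fragments.items():
        if left in right_codes:
            continue  # another fragment precedes this one; not a chain start
        chain, current, seen = [], fid, set()
        while current is not None and current not in seen:
            seen.add(current)  # guard against malformed, cyclic input
            _, right, seq = fragments[current]
            chain.append(seq)
            current = by_left.get(right)
        contigs.append("".join(chain))
    return contigs


fragments = {
    "f2": ("c1", "c2", "GGGG"),
    "f1": (None, "c1", "AAAA"),
    "f3": ("c2", None, "TTTT"),
    "g1": (None, None, "CCCC"),
}
print(chain_fragments(fragments))  # ['AAAAGGGGTTTT', 'CCCC']
```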

Some embodiments of the sequencing technologies described herein include error detection and correction features. Examples of errors include base-calling errors during the sequencing process and errors in assembling fragments into larger contigs. As will be understood, error detection can include detecting the presence or likelihood of errors in a data set and, as such, may not require identifying the location or number of the errors. For error correction, information about the location and/or number of errors in the data set is useful. Methods for error correction are well known in the art. Examples include the use of Hamming distance and the use of checksum algorithms (see, e.g., U.S. Patent Application Publication No. 2010/0323348; U.S. Patent No. 7,574,305; and U.S. Patent No. 6,654,696, the disclosures of which are incorporated herein by reference in their entirety).
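As an illustration of Hamming-distance error correction, an observed barcode can be snapped to the unique known barcode within a small distance. This sketch assumes a predefined whitelist of equal-length barcodes; `correct_barcode` and the example sequences are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed: str, whitelist, max_dist: int = 1):
    """Return the unique whitelist barcode within max_dist of observed, else None."""
    hits = [bc for bc in whitelist if hamming(observed, bc) <= max_dist]
    return hits[0] if len(hits) == 1 else None  # ambiguous or no match -> None


whitelist = ["ACGTAC", "TTGCAA", "GGATCC"]
print(correct_barcode("ACGTAA", whitelist))  # ACGTAC (one substitution corrected)
print(correct_barcode("ACGAAA", whitelist))  # None (two substitutions: detected, not corrected)
```

If every pair of whitelist barcodes differs at 3 or more positions, any single substitution maps back to exactly one barcode, while two substitutions can still be detected.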

Nested libraries

Another approach combines the junction tagging method described above with the preparation of nested sequencing libraries. Nested sub-libraries are created from code-tagged DNA fragments. This can allow transposon tagging at a lower frequency across the genome. It can also create a greater diversity of (nested) sequencing reads. These factors can improve coverage and accuracy.

Subsampling and whole-genome amplification can create many copies of certain populations of the starting molecules. DNA fragments are then generated by transposon-specific fragmentation, in which each fragment receives a code that allows the fragment to be linked back to its original neighbor bearing a matching code (whether identical, complementary, or otherwise informationally linked). The tagged fragments are fragmented at least a second time by random or sequence-specific methods (e.g., enzymatic digestion, random shearing, transposon-based fragmentation, or other methods), generating a sub-library of code-tagged DNA fragments. In a useful variation of the foregoing methods, code-tagged fragments can be preferentially isolated by using transposons that contain biotin or another affinity moiety for downstream enrichment. Subsequent library preparation converts the nested DNA fragments into sequencing templates. Paired-end sequencing determines the sequence of each DNA fragment together with its code tag from the target DNA. Because nested libraries are created for the same code tag, long DNA fragments can be sequenced using short reads.

Sequencing methods

The methods and compositions described herein can be used in conjunction with a variety of sequencing techniques. In some embodiments, the method for determining the nucleotide sequence of a target nucleic acid can be an automated method.

Some embodiments of the sequencing methods described herein include sequencing-by-synthesis (SBS) techniques, such as pyrosequencing. Pyrosequencing detects the release of inorganic pyrophosphate (PPi) as particular nucleotides are incorporated into a nascent strand (Ronaghi et al., Analytical Biochemistry 242(1): 84-9 (1996); Ronaghi, M. Genome Res. 11(1): 3-11 (2001); Ronaghi et al., Science 281(5375): 363 (1998); U.S. Patent No. 6,210,891; U.S. Patent No. 6,258,568; and U.S. Patent No. 6,274,320, each of which is incorporated herein by reference in its entirety).
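Because each nucleotide species is delivered in turn and light is produced only when PPi is released, a pyrosequencing run can be read as a series of per-flow signals. The sketch below decodes an idealized flowgram, assuming signal is exactly proportional to the number of bases incorporated and ignoring noise and signal decay; the flow order "TACG" and the signal values are illustrative assumptions.

```python
def decode_flowgram(flow_order: str, signals) -> str:
    """Convert idealized pyrosequencing flow signals into a base sequence.

    Flow i delivers nucleotide flow_order[i % len(flow_order)]. The light from
    released PPi is assumed proportional to the number of bases incorporated,
    so each signal is rounded to a homopolymer length (0 means no incorporation).
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * max(0, round(signal)))
    return "".join(bases)


print(decode_flowgram("TACG", [1.02, 0.05, 2.1, 0.9, 0.0, 1.1]))  # TCCGA
```

The signal of about 2 on the C flow is read as a CC homopolymer, which is why homopolymer length calling depends on signal proportionality.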

In another exemplary type of SBS, cycle sequencing is accomplished by stepwise addition of reversible terminator nucleotides containing, for example, a cleavable or photobleachable dye label, as described in U.S. Patent Nos. 7,427,67, 7,414,163, and 7,057,026, each of which is incorporated herein by reference in its entirety. This approach, commercialized by Illumina Inc., is also described in International Patent Application Publication Nos. WO 91/06678 and WO 07/123744, each of which is incorporated herein by reference in its entirety. The availability of fluorescently labeled terminators, in which the termination can be reversed and the fluorescent label can be cleaved, facilitates efficient cyclic reversible termination (CRT) sequencing. Polymerases can also be co-engineered to efficiently incorporate, and extend from, these modified nucleotides.

Other exemplary SBS systems and methods that can be used with the methods and compositions described herein are described in U.S. Patent Application Publication No. 2007/0166705, U.S. Patent Application Publication No. 2006/0188901, U.S. Patent No. 7,057,026, U.S. Patent Application Publication No. 2006/0240439, U.S. Patent Application Publication No. 2006/0281109, PCT Publication No. WO 05/065814, U.S. Patent Application Publication No. 2005/0100900, PCT Publication No. WO 06/064199, and PCT Publication No. WO 07/010251, each of which is incorporated herein by reference in its entirety.

Some embodiments of the sequencing technologies described herein can use sequencing-by-ligation techniques. Such techniques use DNA ligase to incorporate nucleotides and identify the incorporation of those nucleotides. Exemplary sequencing-by-ligation systems and methods that can be used with the methods and compositions described herein are described in U.S. Patent Nos. 6,969,488, 6,172,218, and 6,306,597, each of which is incorporated herein by reference in its entirety.

The sequencing methods described herein can advantageously be carried out in multiplex formats, such that multiple different target nucleic acids are manipulated simultaneously. In particular embodiments, different target nucleic acids can be treated in a common reaction vessel or on the surface of a particular substrate. This allows convenient delivery of sequencing reagents, removal of unreacted reagents, and detection of incorporation events in a multiplexed manner. In embodiments using surface-bound target nucleic acids, the target nucleic acids can be in an array format. In an array format, the target nucleic acids are typically coupled to a surface in a spatially distinguishable manner. For example, the target nucleic acids can be bound by direct covalent attachment, by attachment to a bead or other particle, or by association with a polymerase or other molecule that is attached to the surface. The array can include a single copy of a target nucleic acid at each site (also called a feature), or multiple copies having the same sequence can be present at each site or feature. Multiple copies can be produced by amplification methods such as bridge amplification or emulsion PCR, as described in further detail herein.

The methods set forth herein can use arrays having features at any of a variety of densities, including, for example, at least about 10 features/cm², 100 features/cm², 500 features/cm², 1,000 features/cm², 5,000 features/cm², 10,000 features/cm², 100,000 features/cm², 5,000,000 features/cm², 10⁷ features/cm², 5×10⁷ features/cm², 10⁸ features/cm², 5×10⁸ features/cm², 10⁹ features/cm², 5×10⁹ features/cm², or higher.

Methods for reducing error rates in sequencing data

Some embodiments of the methods and compositions provided herein include reducing error rates in sequencing data. In some such embodiments, the sense and antisense strands of a double-stranded target nucleic acid are each associated with a different barcode. Each strand is amplified, sequence information is obtained from multiple copies of the amplified strands, and a consensus sequence representation of the target nucleic acid is generated from the redundant sequence information. Sequence information can thus be derived from, and attributed to, each strand. Sequence errors can therefore be identified and reduced where sequence information derived from one strand is inconsistent with sequence information derived from the other strand.
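The strand-comparison step can be sketched as follows, assuming a consensus has already been built separately for each strand and the antisense consensus has been reverse-complemented and aligned to the sense orientation. Positions where the two strands disagree are masked as N; the function name is hypothetical.

```python
def duplex_consensus(sense_consensus: str, antisense_consensus: str) -> str:
    """Keep bases where independent per-strand consensuses agree; mask others as N.

    Both inputs are assumed to be in the same orientation and aligned position
    by position.
    """
    if len(sense_consensus) != len(antisense_consensus):
        raise ValueError("strand consensuses must be aligned to equal length")
    return "".join(a if a == b else "N"
                   for a, b in zip(sense_consensus, antisense_consensus))


print(duplex_consensus("ACGTAC", "ACGAAC"))  # ACGNAC
```

An error that arose on only one strand (position 4 here) is flagged rather than reported as a variant.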

In some embodiments, the sense and antisense strands of the target nucleic acid are associated with different barcodes. Barcodes can be associated with a target nucleic acid by a variety of methods, including ligation of adaptors and insertion of transposon sequences. In some such embodiments, a Y-adaptor can be ligated to at least one end of the target nucleic acid. The Y-adaptor can include a double-stranded sequence and non-complementary strands, each strand containing a different barcode. The target nucleic acid with the ligated Y-adaptor can be amplified and sequenced, such that each barcode can be used to identify the original sense or antisense strand. A similar method is described in Kinde I et al. (2011) PNAS 108:9530-9535, the disclosure of which is incorporated herein by reference in its entirety. In some embodiments, the sense and antisense strands of the target nucleic acid are associated with different barcodes by insertion of the transposon sequences provided herein. In some such embodiments, the transposon sequences can comprise non-complementary barcodes.

Some embodiments of such methods include obtaining sequence information from the strands of a double-stranded target nucleic acid, comprising (a) obtaining sequence data from a template nucleic acid comprising a first sequencing adaptor and a second sequencing adaptor with at least a portion of the double-stranded target nucleic acid disposed therebetween, wherein: (i) the first sequencing adaptor comprises a double-stranded first barcode, a single-stranded first primer site, and a single-stranded second primer site, wherein the first and second primer sites are non-complementary; and (ii) the second sequencing adaptor comprises a double-stranded second barcode, a single-stranded third primer site, and a single-stranded fourth primer site, wherein the third and fourth primer sites are non-complementary. In some embodiments, the first primer site of the sense strand of the template nucleic acid and the third primer site of the antisense strand of the template nucleic acid comprise the same sequence. In some embodiments, each barcode is different. In some embodiments, the first sequencing adaptor comprises a single-stranded hairpin coupling the first primer site and the second primer site.

In another embodiment, each end of the target nucleic acid is associated with an adaptor comprising a different barcode, so that extension products from the sense and antisense strands of the nucleic acid can be distinguished from each other. In some embodiments, the primer site sequences and barcodes are selected such that extension of a primer annealed to the sense strand yields a product that can be distinguished from the product of extension of a primer annealed to the antisense strand. In one example, the 3' sense primer site is the same as the 3' antisense primer site but differs from both the 5' sense and 5' antisense primer sites. Extension of primers annealed to the 3' sense primer site and to the 3' antisense primer site yields the following products from each strand:

Sense strand: (5') barcode 2-[target sequence]-barcode 1 (3')

Antisense strand: (5') barcode 1-[target sequence]-barcode 2 (3')

Therefore, extension products from the sense and antisense strands of a nucleic acid can be distinguished from each other. An exemplary method is shown in Schmitt M.W., et al., PNAS (2012) 109:14508-13, the disclosure of which is incorporated herein by reference in its entirety. In some such methods, barcodes and primer sites can be associated with the target nucleic acid by a variety of methods, including ligation of adaptors and insertion of transposon sequences. In some embodiments, transposon sequences can be designed to provide an adaptor with a hairpin. The hairpin makes it possible to keep the sense and antisense strands of the target nucleic acid physically close to each other. Template nucleic acids containing a hairpin can be prepared using transposon sequences that contain a linker described herein. Examples of linkers include single-stranded nucleic acids.
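Given the product layout listed above (sense: barcode 2-target-barcode 1; antisense: barcode 1-target-barcode 2, both 5' to 3'), reads can be assigned to their strand of origin from barcode order alone. This sketch assumes the two barcodes are distinct, reads are exact, and the barcodes sit at the very ends of each read; the target portion is not examined, and the names and sequences are hypothetical.

```python
def strand_of_origin(read: str, barcode1: str, barcode2: str) -> str:
    """Assign a read to the original sense or antisense strand by barcode order.

    Sense-strand products read barcode2-target-barcode1 and antisense-strand
    products read barcode1-target-barcode2 (5' to 3').
    """
    if read.startswith(barcode2) and read.endswith(barcode1):
        return "sense"
    if read.startswith(barcode1) and read.endswith(barcode2):
        return "antisense"
    return "unassigned"


bc1, bc2, target = "AAAT", "CCCG", "GTTAGC"
print(strand_of_origin(bc2 + target + bc1, bc1, bc2))  # sense
print(strand_of_origin(bc1 + target + bc2, bc1, bc2))  # antisense
```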

Some embodiments of preparing a library of template nucleic acids for obtaining sequence information from each strand of a double-stranded target nucleic acid include (a) providing a population of transposomes, each transposome comprising a transposase and: (i) a first transposon sequence comprising a first transposase recognition site, a first primer site, and a first barcode; and (ii) a second transposon sequence comprising a second transposase recognition site, a second primer site, and a second barcode, wherein the first transposon sequence is non-contiguous with the second transposon sequence; and (b) contacting the transposomes with the double-stranded nucleic acid under conditions such that the first and second transposon sequences are inserted into the double-stranded target nucleic acid, thereby preparing a library of template nucleic acids for obtaining sequence information from each strand of the double-stranded target nucleic acid. In some embodiments, the transposome population further comprises transposomes comprising a transposase and a transposon sequence that comprises a third transposase recognition site and a fourth transposase recognition site with a barcode sequence disposed therebetween, the barcode sequence comprising a third barcode and a fourth barcode with a sequencing adaptor disposed therebetween, and the sequencing adaptor comprising a third primer site and a fourth primer site with a linker disposed therebetween. In some embodiments, the first primer site of the sense strand of the template nucleic acid and the third primer site of the antisense strand of the template nucleic acid comprise the same sequence. Some embodiments further include step (c), selecting template nucleic acids comprising a transposon sequence in which the first transposon sequence is non-contiguous with the second transposon sequence comprising the linker. In some embodiments, the linker comprises an affinity tag adapted to bind a capture probe. In some embodiments, the affinity tag is selected from the group consisting of His, biotin, and streptavidin. In some embodiments, each barcode is different. In some embodiments, the linker comprises a single-stranded nucleic acid. In some embodiments, the target nucleic acid comprises genomic DNA.

Methods for obtaining haplotype information

Some embodiments of the methods and compositions provided herein include methods for obtaining haplotype information from a target nucleic acid. Haplotype information can include determining the presence or absence of different sequences at particular loci in a target nucleic acid (e.g., a genome). For example, sequence information can be obtained for the maternal and paternal copies of an allele. In polyploid organisms, sequence information can be obtained for at least one haplotype. Such methods can also be used to reduce the error rate of sequence information obtained from a target nucleic acid.

Generally, methods for obtaining haplotype information include distributing nucleic acids into one or more compartments, such that each compartment contains an amount of nucleic acid equivalent to about one haplotype of the nucleic acid, or an amount equivalent to less than about one haplotype. Sequence information can then be obtained from each compartment, thereby yielding haplotype information. Distributing template nucleic acids into multiple vessels increases the likelihood that a single vessel contains a single copy of an allele or SNP, or that the consensus sequence information obtained from a single vessel reflects the sequence of that allele or SNP. As will be understood, in some such embodiments the template nucleic acids can be diluted before being partitioned into multiple vessels. For example, each vessel can contain an amount of target nucleic acid equal to about one haplotype equivalent of the target nucleic acid. In some embodiments, a vessel can contain less than about one haplotype equivalent of the target nucleic acid.
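Why loading about one haplotype equivalent or less per compartment helps can be estimated with a Poisson model. This sketch assumes copies of a given locus land in compartments independently, so the number of copies per compartment is Poisson-distributed with mean λ equal to the haploid genome equivalents loaded per compartment; the function name is hypothetical.

```python
import math


def fraction_single_copy(lam: float) -> float:
    """Among compartments holding >= 1 copy of a locus, the fraction holding exactly one.

    Assumes a Poisson(lam) copy count per compartment:
    P(1) / (1 - P(0)) = lam * exp(-lam) / (1 - exp(-lam)).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    p0 = math.exp(-lam)
    p1 = lam * p0
    return p1 / (1 - p0)


for lam in (2.0, 1.0, 0.5, 0.1):
    print(f"lambda={lam}: single-copy fraction={fraction_single_copy(lam):.3f}")
```

Under this model, about 58% of occupied compartments hold a single copy at λ = 1, rising to about 95% at λ = 0.1, at the cost of more empty compartments.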

Methods for determining haplotype information, methods for haplotype analysis using virtual compartments, and methods for preparing target nucleic acids for haplotype analysis are described in WIPO Publication WO/2014/142850, which is incorporated herein by reference.

Examples

Example 1: Maintaining Template Proximity

This example illustrates a method for maintaining proximity information of template nucleic acids within a CE. Template nucleic acids are prepared using transposomes comprising non-contiguous transposon sequences, in which the Tn5 transposase remains bound to the template DNA after transposition. The target nucleic acid is contacted with transposomes comprising Tn5 transposase and non-contiguous transposon sequences. Samples further treated with SDS may appear as a smear of various template nucleic acid fragments, whereas samples not treated with SDS may show retention of the presumed high-molecular-weight template nucleic acid. Thus, even though the nucleic acid can be fragmented, adjacent sequences can remain associated with each other through the transposase.

In yet another exemplary method, a template nucleic acid library is prepared from a target nucleic acid comprising a human chromosome, using transposomes comprising non-contiguous transposon sequences. The CE contains the target nucleic acid. In samples in which the transposase was removed with SDS after dilution, haplotype blocks of the DNA can be observed. Thus, by implementing the methods described herein, the target nucleic acid can maintain its integrity through transposition, dilution, and conversion into a sequencing library.

As used herein, the term “comprising” is synonymous with “including,” “containing,” or “characterized by,” and is inclusive or open-ended and does not exclude additional, unrecited elements or method steps.

All numbers used in the specification expressing quantities of ingredients, reaction conditions, and so forth are to be understood as being modified in all instances by the term "about." Accordingly, unless indicated to the contrary, the numerical parameters set forth herein are approximations that may vary depending upon the desired properties sought to be obtained. At the very least, and not as an attempt to limit the application of the doctrine of equivalents to the scope of any claims in any application claiming priority to the present application, each numerical parameter should be construed in light of the number of significant digits and ordinary rounding approaches.

The above description discloses several methods and materials of the present invention. This invention is susceptible to modifications in the methods and materials, as well as to alterations in the fabrication methods and equipment. Such modifications will become apparent to those skilled in the art from a consideration of this disclosure or practice of the invention disclosed herein. Consequently, it is not intended that this invention be limited to the specific embodiments disclosed herein, but that it cover all modifications and alternatives coming within the true scope and spirit of the invention.

All references cited herein, including but not limited to published and unpublished applications, patents, and literature references, are incorporated herein by reference in their entirety and are hereby made a part of this specification. To the extent that publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.

Example 2: Single-cell whole transcriptome sequencing

This example describes a method for barcoding cDNA uniformly along its entire length and using the barcodes to determine proximity information for the cDNA and to identify its cell of origin, i.e., the single cell associated with the mRNA.

This example illustrates a method for sequencing single-cell transcriptomes. In this example, droplet microfluidics is used to capture the transcriptomes of multiple single cells, each on an individual capture bead, and contiguity-preserving transposition and combinatorial indexing (CPT-seq) is then used to barcode the cDNA derived from the transcriptome of each single cell. In one embodiment, the method uses multiple barcoding steps to index single-cell cDNA, wherein a first barcode is added in a tagmentation reaction and a second barcode is added in a PCR amplification reaction.

In one example, poly-A+ RNA is captured from single cells, and the captured poly-A+ RNA is then processed in bulk to generate multiplexed sequencing libraries.

The method may include the following steps. In step 1, RNA from single cells is captured on capture beads. For example, many single cells (e.g., approximately 1000) are encapsulated in individual droplets, each containing lysis buffer and a capture bead (i.e., on average one cell and one bead per droplet). Immobilized on the surface of each capture bead are multiple capture probes that include a poly-dT capture sequence and a PCR primer sequence. The lysis buffer in the droplet disrupts the cell's plasma membrane, releasing cytoplasmic RNA. The released poly-A+ RNA is captured by hybridization of its poly-A tail to the oligo-dT capture sequences immobilized on the surface of the co-encapsulated capture bead. Each capture bead now carries poly-A+ RNA from the transcriptome of a single cell, and all poly-A+ RNA from that cell remains in close proximity on the bead.
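"On average one cell and one bead per droplet" implies random loading. As a rough sketch, assuming cells and beads enter droplets independently and each follows Poisson statistics (an assumption; the example does not state a loading model), the fraction of droplets holding exactly one cell and exactly one bead at a mean of one of each is e^-2, about 13.5%, and some bead-containing droplets receive two or more cells. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k arrivals under a Poisson model with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def loading_fractions(cells_per_droplet: float, beads_per_droplet: float) -> dict[str, float]:
    """Fraction of droplets in selected loading states.

    Assumption (not from the source): cells and beads load independently,
    each following Poisson statistics with the given mean.
    """
    p_no_cell = poisson_pmf(0, cells_per_droplet)
    p_one_cell = poisson_pmf(1, cells_per_droplet)
    p_multi_cell = 1.0 - p_no_cell - p_one_cell
    p_one_bead = poisson_pmf(1, beads_per_droplet)
    p_any_bead = 1.0 - poisson_pmf(0, beads_per_droplet)
    return {
        "one_cell_one_bead": p_one_cell * p_one_bead,
        # Bead-containing droplets with 2+ cells would merge transcriptomes on one bead.
        "multi_cell_with_bead": p_multi_cell * p_any_bead,
    }


for mean_cells in (1.0, 0.1):
    f = loading_fractions(mean_cells, 1.0)
    print(mean_cells, round(f["one_cell_one_bead"], 3), round(f["multi_cell_with_bead"], 3))
```

Lowering the mean cell loading reduces the multi-cell fraction, at the cost of more droplets that contain no cell.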

In step 2, capture beads carrying single-cell poly-A+ RNA are pooled from many droplets (e.g., approximately 1000 capture beads), and double-stranded cDNA is synthesized. For example, the capture beads are pooled and washed, and first-strand cDNA is synthesized using an RNase H-minus reverse transcriptase capable of strand switching. Including a strand-switching primer during first-strand cDNA synthesis places a universal primer site at the 3' end of the cDNA. Double-stranded cDNA is then prepared in a PCR reaction using the universal primers and a high-fidelity DNA polymerase (e.g., 1 to 2 PCR cycles). Each capture bead now carries cDNA reverse transcribed from the poly-A+ RNA of a single cell.

In step 3, the capture beads carrying double-stranded cDNA are distributed into the wells of a 96-well plate at approximately 10 capture beads per well.

In step 4, the double-stranded cDNA in each well is tagmented using one of 96 uniquely indexed transposome complexes. Tagmentation modifies the cDNA with adaptor and index sequences while preserving single-cell contiguity. Assembly of the 96 uniquely indexed transposome complexes used in the tagmentation reaction is described in more detail below. The tagmentation reaction adds the first part of a bipartite barcode to each future cDNA fragment. Each capture bead now carries tagmented cDNA from a single cell.

In step 5, the capture beads from all wells are collected, pooled, washed, and redistributed into the wells of a second 96-well plate at approximately 10 capture beads per well. The mRNA/cDNA from each single cell remains on the surface of its bead, and the transposase remains bound to the fragmented cDNA, keeping the fragments from dissociating.

In step 6, the transposase and tagmented cDNA are released from the capture beads. For example, an aliquot of SDS solution (1% SDS) is added to each well to release the bound transposase and tagmented cDNA from the capture beads.

In step 7, the tagmented cDNA in each well is amplified using PCR primers that include a P5 or P7 sequence and a unique barcode sequence. For example, one of 96 unique combinations of barcoded P5 and P7 PCR primers is added to each well, and the tagmented cDNA fragments are amplified. The PCR reaction adds the remaining part of the bipartite barcode to each cDNA fragment. Each cDNA fragment now includes four barcode sequences: two added during tagmentation and two added during PCR amplification. The mRNA/cDNA from a single cell is therefore identified by the combination of the tagmentation index and the PCR index added in the amplification step.
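Steps 3 through 7 form a two-round split-and-pool scheme: the roughly 10 cells that share a tagmentation well are told apart by the PCR well each one lands in after pooling. The minimal simulation below assumes each bead goes to a uniformly random well in each round (in the protocol, beads are distributed by pipetting), and shows how the combined (tagmentation well, PCR well) pair acts as the cell barcode and how collisions arise. All names are hypothetical.

```python
import random
from collections import defaultdict

Barcode = tuple[int, int]  # (tagmentation well index, PCR well index)


def split_pool(n_beads: int, n_wells_1: int = 96, n_wells_2: int = 96,
               seed: int = 0) -> dict[int, Barcode]:
    """Assign each bead (one cell per bead) a combined two-round barcode."""
    rng = random.Random(seed)
    barcodes = {}
    for bead in range(n_beads):
        tag_well = rng.randrange(n_wells_1)  # round 1: tagmentation index
        pcr_well = rng.randrange(n_wells_2)  # round 2: PCR index after pooling
        barcodes[bead] = (tag_well, pcr_well)
    return barcodes


def demultiplex(barcodes: dict[int, Barcode]) -> dict[Barcode, list[int]]:
    """Group beads by combined barcode; any group larger than 1 is a collision."""
    groups: dict[Barcode, list[int]] = defaultdict(list)
    for bead, code in barcodes.items():
        groups[code].append(bead)
    return dict(groups)


codes = split_pool(1000)
groups = demultiplex(codes)
collided = sum(len(beads) for beads in groups.values() if len(beads) > 1)
print(f"{len(groups)} distinct barcodes; {collided} of 1000 cells share one")
```

Fixing the seed makes a run reproducible; the number of colliding cells varies with the seed.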

In step 8, the barcoded cDNA fragments from all wells are pooled and sequenced.

In this example, 96×96 combinatorial indexing is used to barcode approximately 1000 single cells, with an approximately 5% chance of two cells sharing the same barcode. Throughput can readily be increased by increasing the number of “compartments.” For example, 384×384 combinatorial barcoding (147,456 virtual compartments) allows approximately 10,000 single cells to be barcoded in parallel, with an approximately 3% chance of two cells sharing the same barcode.
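The text does not say how the 5% and 3% figures were computed. One reading that reproduces both, assuming every cell receives one of M = 96² or 384² combinations uniformly and independently, is the expected number of barcode-sharing cell pairs divided by the number of cells, about n/(2M). This gives roughly 5.4% and 3.4%. Under the same assumption, the probability that a particular cell shares its barcode with at least one other cell is higher, about 10% and 6.6%. This interpretation is an assumption, and the function names are hypothetical.

```python
def expected_colliding_pairs(n_cells: int, n_barcodes: int) -> float:
    """Expected number of cell pairs sharing a barcode under uniform random assignment."""
    return n_cells * (n_cells - 1) / 2 / n_barcodes


def per_cell_collision_prob(n_cells: int, n_barcodes: int) -> float:
    """Probability that a given cell shares its barcode with at least one other cell."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


for n_cells, n_index in ((1000, 96), (10_000, 384)):
    m = n_index * n_index  # virtual compartments: 9,216 or 147,456
    pairs_per_cell = expected_colliding_pairs(n_cells, m) / n_cells
    print(n_index, m, round(pairs_per_cell, 3), round(per_cell_collision_prob(n_cells, m), 3))
```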

This example also describes a process for assembling 96 uniquely barcoded transposome complexes, which are used to add the first part of the bipartite barcode in the combinatorial barcoding scheme. The process includes, but is not limited to, the following steps.

In step A, 20 uniquely indexed transposons are formed by annealing individually indexed oligonucleotides, each containing a Tn5 mosaic end (ME) at its 3' end, to a universal 5'-phosphorylated ME-complementary oligonucleotide (pMENTS). In the first set of annealing reactions, for example, indexed oligonucleotide 1110, which includes a P5 sequence, a unique 8-base “i5” index sequence, the universal connector sequence A-C15, and the ME sequence, is annealed to ME-complementary sequence 1115. ME-complementary sequence 1115 is the universal 5'-phosphorylated oligonucleotide (pMENTS), which is complementary to the ME sequence in indexed oligonucleotide 1110. The universal connector sequence A-C15 is later used to anneal a custom index 2 sequencing primer.
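The annealing in step A relies on pMENTS being the reverse complement of the ME at the 3' end of each indexed oligonucleotide. The sketch below uses the widely used 19-bp Tn5 mosaic end sequence. The "[P5]" and "[A-C15]" strings and the index ACGTACGT are hypothetical placeholders, not the real sequences, and reading the listed components as a 5'-to-3' order is an assumption. Function names are hypothetical.

```python
# Tn5 mosaic end (ME), 19 bp, written 5'->3'.
ME = "AGATGTGTATAAGAGACAG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# pMENTS: ME-complementary oligo (its 5' phosphate is not modelled here).
PMENTS = reverse_complement(ME)


def indexed_oligo(adapter: str, index: str, connector: str) -> str:
    """Assemble an indexed transposon oligo 5'->3': adapter, index, connector, ME."""
    if len(index) != 8 or set(index) - set("ACGT"):
        raise ValueError("index must be 8 bases of A/C/G/T")
    return adapter + index + connector + ME


def forms_me_duplex(oligo: str, partner: str) -> bool:
    """True if partner is the exact reverse complement of the oligo's 3'-terminal ME."""
    tail = oligo[-len(ME):]
    return tail == ME and reverse_complement(partner) == tail


# Placeholder labels stand in for the real P5 and A-C15 sequences.
oligo_1110 = indexed_oligo("[P5]", "ACGTACGT", "[A-C15]")
print(PMENTS, forms_me_duplex(oligo_1110, PMENTS))
```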

A second set of annealing reactions (i.e., 12 separate annealing reactions) is performed to form a second set of 12 transposons, each containing a unique 8-base “i7” index sequence adjacent to the P7 sequence. For example, indexed oligonucleotide 1120, which includes the P7 sequence, a unique 8-base i7 index sequence, the universal connector sequence B-D15, and the ME sequence, is annealed to ME-complementary sequence 1115. The universal connector sequence B-D15 is later used to anneal a custom index 1 sequencing primer.

In step B, the annealed P5_i5 transposons 1125 (i.e., eight P5_i5 transposons 1125, each with a unique 8-base i5 index sequence) and the annealed P7_i7 transposons 1130 (i.e., twelve P7_i7 transposons 1130, each with a unique 8-base i7 index sequence) are each assembled with Tn5 transposase in separate reactions to form transposome complexes. For example, each annealed P5_i5 transposon 1125 is incubated with Tn5 transposase 1135 at approximately 37°C for approximately 1 hour to form P5_i5 transposome complex 1140. Similarly, each annealed P7_i7 transposon 1130 is incubated with Tn5 transposase 1135 at approximately 37°C for approximately 1 hour to form P7_i7 transposome complex 1145.

In step C, 96 unique transposome complex combinations are prepared by combining aliquots of the P5_i5 transposome complexes 1140 with aliquots of the P7_i7 transposome complexes 1145. For example, each of the eight P5_i5 transposome complexes 1140 is aliquoted into one of rows A through H of a 96-well plate, and each of the twelve P7_i7 transposome complexes 1145 is aliquoted into one of columns 1 through 12 of the same plate. Combining 8 P5_i5 transposome complexes 1140 with 12 P7_i7 transposome complexes 1145 yields 96 different index combinations.
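The row-by-column layout in step C amounts to a lookup table from well to index pair. The sketch assumes row A receives P5_i5_1 through row H receiving P5_i5_8, and column n receives P7_i7_n. The labels follow the P5_i5_1/P7_i7_1 naming used in the evaluation experiment, but this particular row and column assignment is hypothetical, as is the function name.

```python
import string

ROWS = string.ascii_uppercase[:8]  # "ABCDEFGH": one P5_i5 complex per row
COLUMNS = range(1, 13)             # 1..12: one P7_i7 complex per column


def plate_layout() -> dict[str, tuple[str, str]]:
    """Map each well name (e.g. "A1") to its (P5_i5, P7_i7) transposome pair."""
    return {
        f"{row}{col}": (f"P5_i5_{r + 1}", f"P7_i7_{col}")
        for r, row in enumerate(ROWS)
        for col in COLUMNS
    }


layout = plate_layout()
print(len(layout), layout["C7"])  # 96 ('P5_i5_3', 'P7_i7_7')
```

Because every well pairs a distinct row with a distinct column, the 96 index pairs are all different.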

To evaluate the assembled transposome complexes, a sequencing library was prepared from 10 single cells using a single tagmentation reaction and a single PCR reaction. Ten capture beads, each carrying cDNA from one of the 10 single cells, were pooled and tagmented using a P5_i5_1 + P7_i7_1 transposome mixture. The tagmented cDNA was then released from the capture beads and PCR-amplified using barcoded P5 and P7 primers to generate the sequencing library. The fragment size distribution of the library was then analyzed on a Bioanalyzer. In some embodiments, a SPRI cleanup is performed after PCR. In some embodiments, a second SPRI cleanup is performed after the first. In some embodiments, samples are diluted 10-fold before Bioanalyzer analysis.

In another example, two different transposome complex mixtures were used to prepare sequencing libraries from 100 single cells, using a split-and-pool experimental design to evaluate the transposome complexes. One hundred capture beads carrying cDNA from 100 single cells were distributed into two tagmentation reactions: one using a P5_i5_2 + P7_i7_2 transposome mixture and the other using a P5_i5_3 + P7_i7_3 transposome mixture. After tagmentation, the capture beads from both reactions were pooled and redistributed into two PCR amplifications, each using a different combination of barcoded P5 and P7 PCR primers, to produce two sequencing libraries. The fragment size distribution of each library was then analyzed on a Bioanalyzer after a single 0.7× SPRI cleanup step.

Claims

1. A method for analyzing one or more analytes of a plurality of contiguity-preserving elements (CEs), the method comprising:

(a) providing a plurality of CEs, wherein the CEs are physical entities that retain their analytes in close proximity;

(b) dividing the CEs into a plurality of first compartments, wherein each compartment contains a plurality of CEs;

(c) providing a set of first reporter portions to the analytes of each CE in each of the plurality of first compartments, wherein each first reporter portion in the set of first reporter portions provided to the analytes of each first compartment is different from the first reporter portions provided to the analytes of each of the other first compartments;

(d) combining the CEs comprising the first reporter portions;

(e) dividing the CEs comprising the first reporter portions into a plurality of second compartments;

(f) providing a set of second reporter portions to the analytes of each CE in each of the plurality of second compartments, wherein each second reporter portion in the set of second reporter portions provided to the analytes of each second compartment is different from the second reporter portions provided to the analytes of each of the other second compartments;

(g) combining the CEs comprising the first and second reporter portions; and

(h) analyzing the analytes comprising the reporter portions from each compartment,

wherein the analysis comprises detecting the single cell from which each analyte originates.

2. The method of claim 1, wherein the CE is a cell, an embedded cell, or a retained cell.

3. The method of claim 1, wherein each compartment in step (e) comprises a plurality of CEs.

4. The method of claim 1, wherein the analyte comprises nucleic acid, and the first reporter portion is introduced via tagmentation, ligation, or PCR.

5. The method of claim 1, wherein the analyte comprises nucleic acid, and the first reporter portion is introduced by transcribing RNA into cDNA.

6. The method of claim 1, wherein the analyte comprises nucleic acid, and the second reporter portion is introduced via tagmentation, ligation, or PCR.

7. The method of claim 1, wherein the first reporter portion comprises a nucleic acid barcode.

8. The method of claim 1, wherein the second reporter portion comprises a nucleic acid barcode.

9. The method of any one of claims 1-8, wherein prior to step (h), the method further comprises: dividing the CEs comprising the first and second reporter portions into a plurality of third compartments; providing a set of third reporter portions to the analytes of each CE in each of the plurality of third compartments, wherein each third reporter portion in the set of third reporter portions provided to the analytes of each third compartment is different from the third reporter portions provided to the analytes of each of the other third compartments; and combining the CEs comprising the first, second, and third reporter portions.

10. The method of claim 9, wherein the analyte comprises nucleic acid, and the third reporter portion is introduced via tagmentation, ligation, or PCR.

11. The method of claim 9, wherein the third reporter portion comprises a nucleic acid barcode.

12. The method of any one of claims 1-8, wherein steps (b)-(g) are repeated with one or more sets of additional reporter portions.

13. The method of claim 1, wherein the analyte comprises DNA, RNA or cDNA.

14. The method of claim 13, wherein the DNA comprises genomic DNA.

15. The method of any one of claims 1-8, 13 or 14, wherein the first reporter portion is used to identify the sample origin of each single cell.

16. The method of any one of claims 1-8, 13 or 14, wherein the detection of analytes from the same CE is performed simultaneously.

17. A composition comprising a plurality of compartments, each compartment containing a plurality of CEs,

wherein each CE contains an analyte comprising a first reporter portion and a second reporter portion, and

wherein the analytes of essentially all CEs comprise the first reporter portion and the second reporter portion in a unique combination, a unique location, or a combination thereof.

18. The composition of claim 17, wherein each CE comprises the first reporter portion and the second reporter portion in a unique combination, a unique location, or a combination thereof.

19. The composition of claim 17, wherein the compartment comprises a three-dimensional solid support.

20. The composition of claim 19, wherein the three-dimensional solid support comprises a multi-well plate.

21. The composition of claim 17, wherein the CE is a cell, an embedded cell, or a retained cell.

22. The composition of any one of claims 17-21, wherein the analyte comprises nucleic acid.

23. The composition of claim 22, wherein the analyte comprises DNA, RNA or cDNA.

24. The composition of claim 23, wherein the DNA comprises genomic DNA.

25. The composition of any one of claims 17-21, wherein the first reporter portion comprises a nucleic acid barcode.

26. The composition of any one of claims 17-21, wherein the second reporter portion comprises a nucleic acid barcode.

27. The composition of any one of claims 17-21, wherein the plurality of CEs comprises at least 100,000 CEs.

28. The composition of any one of claims 17-21, wherein the plurality of CEs comprises at least 1,000,000 CEs.

29. The composition of any one of claims 17-21, wherein the CEs comprise an additional set of reporter portions,

and wherein the analyte of each CE comprises the first reporter portion, the second reporter portion, and the additional set of reporter portions in a unique combination, a unique location, or a combination thereof.