HK1211310B

HK1211310B - Msp nanopores and related methods


Publication number: HK1211310B
Application number: HK15112202.2A
Authority: HK
Inventors: J. H. Gundlach; M. Niederweis; T. Z. Butler; M. Pavlenok; M. A. Troll; S. Sukumaran
Original assignee: University of Washington; UAB Research Foundation
Priority date: 2008-09-22
Filing date: 2015-12-10
Publication date: 2021-12-03

Description

MSP nanopores and related methods

This application is a divisional of Chinese invention patent application No. 200980142855.5, filed on September 22, 2009, entitled "MSP Nanopores and Related Methods".

Cross-Reference to Related Applications

This application claims the benefit of U.S. Provisional Application Serial No. 61/098,938, filed September 22, 2008, which is incorporated herein by reference in its entirety.

Statement of Government License Rights

This invention was made with government support under Grant No. 5R21HG004145 awarded by the National Institutes of Health. The government has certain rights in this invention.

Background

Established DNA sequencing technologies require substantial amounts of DNA and several lengthy steps to construct just a few tens of bases of the full sequence. This information must then be assembled "shotgun" style, an effort that depends nonlinearly on the size of the genome and on the length of the fragments from which the full genome is constructed. These steps are expensive and time-consuming, especially when sequencing mammalian genomes.

Summary

Provided herein is a method comprising applying an electric field to a Mycobacterium smegmatis porin (Msp) porin having a vestibule and a constriction zone that define a tunnel, wherein the Msp porin is positioned between a first conductive liquid medium and a second conductive liquid medium.

Also provided is a method of improving the conductance through the tunnel of an Msp porin, comprising removing, adding, or replacing at least one amino acid in the vestibule or the constriction zone of a wild-type Msp porin.

Also provided is a system comprising an Msp porin having a vestibule and a constriction zone that define a tunnel, wherein the tunnel is positioned between a first liquid medium and a second liquid medium, wherein at least one liquid medium comprises an analyte, and wherein the system is operative to detect a property of the analyte.

Also provided is a system comprising an Msp porin having a vestibule and a constriction zone that define a tunnel, wherein the tunnel is positioned in a lipid bilayer between a first liquid medium and a second liquid medium, and wherein the only point of liquid communication between the first and second liquid media occurs in the tunnel.

Also provided are mutant Msp porins. For example, a mutant Mycobacterium smegmatis porin A (MspA) porin is provided, comprising a vestibule and a constriction zone that define a tunnel, and at least a first mutant MspA monomer comprising a mutation at position 93 and a mutation at position 90, position 91, or both positions 90 and 91. Also provided is a mutant MspA porin comprising a vestibule having a length of about 2 to about 6 nm and a diameter of about 2 to about 6 nm, and a constriction zone having a length of about 0.3 to about 3 nm and a diameter of about 0.3 to about 3 nm, wherein the vestibule and constriction zone together define a tunnel, and further comprising at least a first mutant MspA paralog or homolog monomer. Also provided is a mutant MspA paralog or homolog comprising a vestibule having a length of about 2 to about 6 nm and a diameter of about 2 to about 6 nm, and a constriction zone having a length of about 0.3 to about 3 nm and a diameter of about 0.3 to about 3 nm, wherein the vestibule and constriction zone together define a tunnel.

Methods of making mutant Msp porins are described. For example, provided herein is a method of making a mutant MspA porin, comprising modifying a wild-type MspA monomer at position 93 and at position 90, position 91, or both positions 90 and 91. Also provided is a method of making a mutant MspA porin having a vestibule and a constriction zone that define a tunnel, comprising deleting, adding, or replacing any amino acid in the vestibule or the constriction zone of a wild-type MspA paralog or homolog monomer such that the resulting mutant MspA porin is capable of translocating an analyte through the tunnel upon application of an electric field.

Also provided is a method comprising translocating an analyte through the tunnel of a Mycobacterium smegmatis porin (Msp) porin without employing an electric field.

Nucleic acid sequences are provided herein. Optionally, a nucleic acid sequence may comprise a first and a second nucleotide sequence, wherein the first nucleotide sequence encodes a first Msp monomer sequence and the second nucleotide sequence encodes a second Msp monomer sequence. The nucleic acid sequence may further comprise a third nucleotide sequence encoding an amino acid linker sequence. Optionally, the nucleic acid sequence further comprises a third or further nucleotide sequence encoding a third or further Msp monomer sequence. For example, the nucleic acid sequence may further comprise third, fourth, fifth, sixth, seventh, and eighth nucleotide sequences. The first through eighth nucleotide sequences encode first through eighth Msp monomer sequences, respectively, and the nucleic acid sequence further comprises a ninth nucleotide sequence encoding an amino acid linker sequence. Also provided are Msp porins comprising two or more single-chain Msps.

Also provided are polypeptides encoded by the nucleic acids described herein. Also provided are vectors comprising the polypeptides described herein. Also provided are cultured cells, or progeny thereof, transfected with any of the vectors described herein, wherein the cells are capable of expressing an Msp porin or an Msp porin monomer. Also provided are Mycobacterium smegmatis strains comprising any of the vectors described herein.

Also provided is a mutant bacterial strain capable of inducible Msp monomer expression, the bacterial strain comprising: (a) a deletion of wild-type MspA; (b) a deletion of wild-type MspC; (c) a deletion of wild-type MspD; and (d) a vector comprising an inducible promoter operably linked to an Msp monomer nucleic acid sequence.

Also provided is a method of producing a single-chain Msp porin, the method comprising: (a) transforming a mutant bacterial strain with a vector comprising a nucleic acid sequence capable of encoding a single-chain Msp porin; and optionally (b) purifying the single-chain Msp porin from the bacteria. The mutant strain may comprise deletions of wild-type MspA, wild-type MspB, wild-type MspC, and wild-type MspD, and a vector comprising an inducible promoter operably linked to an Msp nucleic acid sequence. The mutant strain may be transformed with a vector comprising a nucleic acid sequence capable of encoding a single-chain Msp porin.

Also provided are methods of using Msp porins, such as single-chain Msp porins. For example, the method may comprise creating a lipid bilayer having a first side and a second side, adding an Msp porin, such as a purified single-chain Msp porin, to the first side of the lipid bilayer, applying a positive voltage to the second side of the bilayer, translocating an experimental nucleic acid or polypeptide sequence through the Msp porin, measuring the blockade current of the sequence translocating through the Msp porin, and comparing the experimental blockade current with a blockade current standard to determine the experimental sequence.
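The comparison step in the method above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each blockade current standard is a single reference level and that an experimental blockade is assigned to the nearest standard. The function name, the labels, and the numeric levels are hypothetical and are not taken from this application.

```python
def identify_by_standard(measured_current, standards):
    """Return the label of the standard whose blockade current is closest.

    measured_current: experimental blockade current (any consistent unit).
    standards: dict mapping a label to its reference blockade current.
    Both the nearest-level rule and the data layout are assumptions.
    """
    if not standards:
        raise ValueError("at least one blockade current standard is required")
    return min(standards, key=lambda label: abs(standards[label] - measured_current))


# Hypothetical reference levels, expressed as fractions of the open-pore current.
example_standards = {"poly-A": 0.20, "poly-C": 0.30}
best_match = identify_by_standard(0.22, example_standards)
```

A practical comparison would likely also use blockade duration and the spread of each standard; the nearest-level rule is only the simplest possible stand-in.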

Brief Description of the Drawings

The foregoing aspects and many of the attendant advantages will be more readily appreciated by reference to the following detailed description, taken in conjunction with the accompanying drawings.

Figure 1 shows the structure and charge distribution of the wild-type MspA (WTMspA) porin. At pH 8, acidic residues are expected to be predominantly negatively charged and basic residues predominantly positively charged. The locations and identities of the mutations are indicated by arrows and labels. See Faller et al., Science, 303:1189 (2004).

Figure 2 shows the results of assays of tunnel-forming activity and single-tunnel conductance for the WTMspA, mutant D90N/D91N/D93N (M1MspA, also called M1-NNN), and mutant D90N/D91N/D93N/D118R/E139K/D134R (M2MspA, also called M2-NNN) porins. The left panels show bilayer conductance over time when an MspA porin is present in the solution (1 M KCl, 20°C) bathing the bilayer. Stepwise increases in conductance are interpreted as insertions of MspA porins into the bilayer. On the right are histograms of the sizes of these conductance steps. The histograms for the WTMspA, M1MspA, and M2MspA porins summarize 40 insertions from 3 repeated experiments, 144 insertions from 3 repeated experiments, and 169 insertions from 5 repeated experiments, respectively.

Figures 3A and 3B show the spontaneous blockade behavior of the WTMspA porin. Figure 3A is a schematic diagram of the experiment. Figure 3B shows representative ionic current signals observed for the WTMspA porin at 60 mV (left) and 100 mV (right) in the absence of DNA. Intervals of negative current correspond to reversal of the applied voltage, which was often required to reestablish the unblocked ionic current level.

Figure 4 shows the expression of mutant MspA monomers in an electrophoretic gel. Crude extract (13 μL) was added to each lane. The gel was stained with Coomassie blue. Lane 1: protein molecular weight marker; lane 2: WTMspA; lane 3: no MspA; lane 4: mutant M1MspA; lane 5: mutant D90N/D91N/D93N/D118R; lane 6: mutant D90N/D91N/D93N/D118R/E139R; lane 7: mutant D90N/D91N/D93N/D118R/E139K; lane 8: mutant M2MspA. The mutants in lanes 5 to 7 were constructed, extracted, and assayed to ensure that expression and tunnel-forming activity were retained for each successive amino acid replacement. The diagram above the gel shows schematically the approximate location and polarity of the amino acids mutated in this experiment.

Figures 5A-5C show the detection of ssDNA hairpin constructs with the M1MspA porin. Figure 5A is a schematic diagram of the experiment. Figure 5B shows representative ionic current signals observed for the M1MspA porin at 180 and 140 mV in the absence of DNA and in the presence of 8 μM hp08 (SEQ ID NO:4) hairpin DNA. Figure 5C shows the numbered blockades from the traces in Figure 5B on an expanded time scale.

Figure 6 shows the characteristics of deep blockades from hairpin constructs in the M1MspA porin. The coordinates of each point give the duration and average current of one deep blockade. Black and gray data were acquired at 140 and 180 mV, respectively. The mode of the log10 of the deep blockade dwell times, tD, is indicated for each data set. The diagrams on the right show the sequence of each hairpin construct: hp08 (5'-GCTGTTGC TCTCTC GCAACAGC A₅₀-3') (SEQ ID NO:4), hp10 (5'-GCTCTGTTGC TCTCTC GCAACAGAGC A₅₀-3') (SEQ ID NO:5), and hp12 (5'-GCTGTCTGTTGC TCTCTC GCAACAGACAGC A₅₀-3') (SEQ ID NO:6).

Figure 7 is a graph showing the distributions of partial blockade dwell times for hp08 (SEQ ID NO:4) in the M1MspA porin. The distributions are well fit by single exponentials. Partial blockades at 180 mV have a time constant approximately 3 times longer than at 140 mV.

Figure 8 provides a detailed look at the dwell time distributions of hairpin construct deep blockades in the M1MspA porin. The left panels show dwell time histograms with logarithmic binning (stair plots) and the corresponding kernel-smoothed density estimates of the probability distribution of the log10 of the dwell times (x). The maximum of each smoothed density estimate, tD, is used to parameterize the dwell time distribution. Vertical lines mark the tD values. The right panels show survival probability curves derived from the dwell time data (solid lines) and single decaying exponentials with time constants set to the tD value of each data set (dashed lines). The data clearly deviate from simple exponential behavior. However, it is reasonable to make qualitative comparisons between the tD values and the exponential time constants used in other studies (Kasianowicz et al., Proc. Natl. Acad. Sci. USA, 93:13770 (1996)), because both parameters reflect similar aspects of the dwell time distributions.
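The analysis described for Figure 8 can be sketched in a few lines of Python. The sketch below is a minimal illustration under stated assumptions: it uses a fixed-bandwidth Gaussian kernel evaluated on a regular grid, which may differ from the smoothing actually used for the figure, and the function names, the bandwidth, and the sample dwell times are all hypothetical.

```python
import math


def dwell_time_mode(dwell_times, bandwidth=0.2, grid_points=400):
    """Estimate tD: the dwell time at the peak of a Gaussian kernel density
    estimate of log10(dwell time). Bandwidth and grid size are assumptions."""
    logs = [math.log10(t) for t in dwell_times]
    low = min(logs) - 3 * bandwidth
    high = max(logs) + 3 * bandwidth
    best_x, best_density = low, -1.0
    for i in range(grid_points + 1):
        x = low + (high - low) * i / grid_points
        # Unnormalized density: the peak location does not depend on scaling.
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in logs)
        if density > best_density:
            best_x, best_density = x, density
    return 10 ** best_x


def survival_probability(dwell_times, t):
    """Empirical probability that an event lasts longer than t."""
    return sum(1 for d in dwell_times if d > t) / len(dwell_times)


def exponential_survival(t, time_constant):
    """Survival curve of a single decaying exponential."""
    return math.exp(-t / time_constant)


# Hypothetical dwell times in seconds, for illustration only.
sample_dwells = [0.9e-3, 1.0e-3, 1.1e-3, 1.0e-3, 10e-3]
t_d = dwell_time_mode(sample_dwells)
```

Plotting `survival_probability` against `exponential_survival(t, t_d)` over a range of t gives the kind of solid-versus-dashed comparison described for the right panels.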

Figures 9A-9G show data obtained from transbilayer probe experiments. Figure 9A shows cartoons of the molecular configurations: (1) an unblocked pore; (2) threaded ssDNA with neutravidin (nA) arresting translocation of the nA-ssDNA complex; (3) target DNA hybridized to the nA-ssDNA dissociating at negative voltage; and (4) the nA-ssDNA complex exiting the pore at a voltage that depends on hybridization of the target DNA. Figure 9B is a time series of the applied voltage. A current blockade triggers a change from the 180 mV capture voltage to a holding voltage of 40 mV after a delay of about 200 ms. The holding voltage is maintained for 5 seconds to allow hybridization and is then ramped toward negative voltages. Figures 9C and 9D each show current time series demonstrating nA-ssDNA exit at negative and positive voltages, respectively. Large current spikes occur because of instantaneous voltage changes and spontaneous pore closure at large negative voltages. Figures 9E-9G are exit voltage (Vexit) histograms. Figure 9E shows an experiment in which the probe 5'-C₆A₅₄-CTCTATTCTTATCTC-3' (SEQ ID NO:7) was complementary to the target ssDNA molecule 5'-GAGATAAGAATAGAG-3' (SEQ ID NO:9). Figure 9F shows the same pore as in Figure 9E, but with the probe 5'-C₆A₅₄-CACACACACACACAC-3' (SEQ ID NO:8), which is not complementary to the target DNA. Figure 9G shows results from a separate control using the same probe (SEQ ID NO:7) as in Figure 9E, but without target DNA present in the trans compartment. A significant number of negative Vexit events is observed only in Figure 9E, where the probe (SEQ ID NO:7) is complementary to the target. The rare occurrence of negative Vexit events in Figures 9F and 9G rules out the possibility that the majority of negative Vexit events in Figure 9E are caused by nonspecific probe-target association or by binding of the probe to the pore.

Figures 10A-10C compare dT50 (SEQ ID NO:32) homopolymer blockades for the M1MspA and M2MspA porins. Figure 10A is a schematic diagram of the experiment. Figure 10B shows representative ionic current signals observed for the M1MspA porin with 8 μM dT50 (left) and the M2MspA porin with 2 μM dT50 (right). Figure 10C shows the numbered blockades from the traces in Figure 10B on an expanded time scale.

Figure 11 shows statistical characteristics of dT50 (SEQ ID NO:32) blockades in the M2MspA porin, comparing the average structure at the beginning and at the end of the blockades. The graphs were generated by overlaying the events in a data file aligned at the beginning of the event (left) and at the end of the event (right). Shown is the tendency of blockades to terminate with a brief downward deflection of the ionic current, along with the increase in this tendency with voltage.

Figure 12A shows histograms of blockade current levels in the M1MspA porin blocked by DNA constructs. The DNA constructs are, from top to bottom: 3'-A₄₇AAC-hp-5' (SEQ ID NO:14); 3'-A₄₇ACA-hp-5' (SEQ ID NO:33); 3'-A₄₇CAA-hp-5' (SEQ ID NO:13); 3'-C₅₀-hp-5' (SEQ ID NO:16); 3'-A₅₀-hp-5' (SEQ ID NO:10). Figure 12B shows a plot of the current levels, scaled to the difference between the poly-C (= 1.0) and poly-A (= 0.0) levels, versus the position of the single C. A Gaussian fit indicates that the recognition position for a single C is 1.7 ± 0.8 nucleotides (nt) from the end of the hairpin.
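The scaling used for Figure 12B, in which the poly-A level maps to 0.0 and the poly-C level maps to 1.0, is a linear rescaling. A minimal Python sketch follows; the function name and the numeric levels are hypothetical and serve only to illustrate the arithmetic.

```python
def scale_to_references(current, poly_a_level, poly_c_level):
    """Linearly rescale a blockade current so that the poly-A reference
    maps to 0.0 and the poly-C reference maps to 1.0."""
    span = poly_c_level - poly_a_level
    if span == 0:
        raise ValueError("poly-A and poly-C reference levels must differ")
    return (current - poly_a_level) / span


# Hypothetical levels (same arbitrary unit for all three arguments).
scaled_midpoint = scale_to_references(25.0, poly_a_level=20.0, poly_c_level=30.0)
```

On this scale, a construct whose blockade current sits halfway between the two references has a scaled value of 0.5, which makes constructs measured in different pores or sessions directly comparable.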

Figure 13 shows a series of current histograms for DNA blocking the M1-NNN MspA (also called M1MspA) porin. The DNA constructs are, from top to bottom: 3'-C₅₀-hp-5' (SEQ ID NO:16); 3'-A₅₀-hp-5' (SEQ ID NO:10); 3'-T₄₇TTT-hp-5' (SEQ ID NO:17); 3'-A₄₇AAT-hp-5' (SEQ ID NO:34); 3'-A₄₇ATA-hp-5' (SEQ ID NO:35); 3'-A₄₇TAA-hp-5' (SEQ ID NO:36); 3'-C₄₇CCA-hp-5' (SEQ ID NO:37); 3'-C₄₇CAC-hp-5' (SEQ ID NO:38); 3'-C₄₇ACC-hp-5' (SEQ ID NO:39). Each construct or mixture is indicated on the left. The number of events in each histogram is indicated on the right. Top panel: "calibration mixture" (poly-A-hp and poly-C-hp). Panels 2-5: poly-T-hp and single T bases in a poly-A background. Bottom three panels: single A bases in a poly-C background. Poly-A-hp is included in the mixture for reference (small peak at 19.5%). All data were acquired at 180 mV.

Figure 14 shows that the DNA tail does not affect the recognition properties. The legend is as for Figure 13. Two heterogeneous tails ('ran1' (SEQ ID NO:51) and 'ran2' (SEQ ID NO:52), each 47 bases) are attached to a trinucleotide and the hairpin. The middle panel shows the current histogram produced when a mixture of A50-hp DNA (SEQ ID NO:10) and ran1-C3-hp DNA is applied to the pore, and serves as the reference point for the other panels. The current levels are the same as those obtained with A50 or C50 tails. All data were acquired at 180 mV.

Figures 15A and 15B show characterization data for the M2-QQN porin, another mutant MspA porin. Figure 15A shows the expression level of this mutant. All proteins were expressed in M. smegmatis ML16. 10 μl of 0.5% octylpolyoxyethylene crude extract was loaded into each well. Lane 1: WTMspA; lane 2: background (pMS2, empty vector); lane 3: M2-QQN (pML866). Figure 15B shows current traces of the M2-QQN porin in a diphytanoylphosphatidylcholine lipid bilayer recorded in 1 M KCl. Approximately 70 pg of protein was added to the bilayer chamber. Approximately 100 pores from four membranes were analyzed in lipid bilayer experiments. The main conductance of the M2-QQN porin is 2.4 nanosiemens (nS).

Figure 16 shows blockade current histograms for three different mutant MspA porins exposed to hairpin DNAs hp-T50 (SEQ ID NO:17), hp-C50 (SEQ ID NO:16), and hp-A50 (SEQ ID NO:10). In each case, the currents are normalized to the open state current of the respective mutant, shown on the right. hp-C50 and hp-A50 were run as a mixture, while T50 was run separately.

Figure 17 is a graph showing the survival probability of deep current blockades for two mutant MspA porins. Shown is the probability of events lasting longer than t. Circles indicate the M2-QQN porin, and crosses indicate the M2-NNN porin. The voltages applied across the bilayer were 100, 120, and 140 mV. Data are normalized to the total number of events in each recording.

Figure 18 shows an alignment of the MspA, MspB, MspC, and MspD monomers of M. smegmatis. The first ATG or GTG codon of the open reading frame was taken as the putative start codon. The numbering of the protein starts with the first amino acid of the mature part. The MspA monomer amino acid sequence is SEQ ID NO:28, the MspB monomer amino acid sequence is SEQ ID NO:29, the MspC monomer amino acid sequence is SEQ ID NO:30, and the MspD monomer amino acid sequence is SEQ ID NO:31.

Figure 19 is an image of a gel showing the deletion of each porin gene in the M. smegmatis porin quadruple mutant ML59.

Figure 20 is a Western blot showing Msp porin expression in M. smegmatis and in M. smegmatis porin mutants. Lane 1 is a 1:10 dilution of protein extract from wild-type M. smegmatis, lane 2 is mutant MN01 (ΔmspA), lane 3 is mutant ML10 (ΔmspAC), lane 4 is mutant ML16 (ΔmspACD), and lane 5 is mutant ML180 (ΔmspABCD).

Figures 21A and 21B show plasmid maps for the construction of the quadruple porin mutant. Hyg: hygromycin resistance gene; ColE1: E. coli origin of replication. Figure 21A is the map of the integrative plasmid for MspA expression. AmiC, A, D, S are required for acetamide-inducible expression of MspA. attP: chromosomal attachment site of phage L5; int: L5 integrase; FRT: Flp recombinase site. Figure 21B is the plasmid map of the MspB deletion vector. MspBup, MspBdown: regions upstream and downstream of MspB; loxP: Cre recombination site; SacB: sucrose 6-fructosyltransferase; XylE: catechol-2,3-dioxygenase; Gfp2+: green fluorescent protein; tsPAL5000: temperature-sensitive origin of replication for mycobacteria.

Figure 22 is an image of a Coomassie blue-stained gel showing inducible expression of MspA monomers in M. smegmatis.

Figure 23 is an image showing the growth of the Msp quadruple mutant ML705 on Middlebrook 7H10 agar plates.

Figure 24 is a photograph showing the growth rate of ML705 in rich liquid medium.

Figure 25 is an image of a Western blot showing expression of MspA monomers in the quadruple mutant ML705 upon induction with acetamide. Lane 1 is wild-type M. smegmatis, lane 2 is the quadruple mutant strain ML705 with acetamide, lane 3 is the quadruple msp mutant strain ML705 without acetamide, and lane 4 is the triple mutant strain ML16. Proteins were detected using a polyclonal antibody against MspA.

Figures 26A-26D show the structure and tunnel activity of a single-chain MspA nanopore dimer. Figure 26A is an image of a molecular model of the single-chain MspA nanopore dimer. Figure 26B shows a schematic of the single-chain MspA nanopore dimer (scMspA) gene construct. The amino acid linker region (GGGGS)₃ (SEQ ID NO:3) is enlarged. The DNA sequence of the amino acid linker (5'-GGCGGTGGCGGTAGCGGCGGTGGCGGTAGCGGCGGTGGCGGTAGC-3') (SEQ ID NO:19) is also shown. Figure 26C is an image of a Western blot showing expression of the scMspA nanopore dimer in M. smegmatis. Lane 1 is the molecular weight marker (M), lane 2 is wild-type M. smegmatis (WT Msmeg), lane 3 is the ML16 strain without the scMspA gene construct (ML16), lane 4 is the ML16 strain with a wild-type MspA gene construct (WTMspA), and lane 5 is the ML16 strain with the scMspA nanopore dimer gene construct (scMspA). Figure 26D shows a current trace of the scMspA nanopore dimer.
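The correspondence between the linker DNA sequence (SEQ ID NO:19) and the (GGGGS)₃ amino acid linker (SEQ ID NO:3) can be checked with a short Python sketch. The codon table below is deliberately limited to the three codons that occur in this linker (GGC and GGT for glycine, AGC for serine, per the standard genetic code); the helper name is hypothetical.

```python
# Only the codons present in the linker are listed (standard genetic code).
LINKER_CODONS = {"GGC": "G", "GGT": "G", "AGC": "S"}

# Linker DNA sequence as given for Figure 26B (SEQ ID NO:19), 5' to 3'.
LINKER_DNA = "GGCGGTGGCGGTAGCGGCGGTGGCGGTAGCGGCGGTGGCGGTAGC"


def translate_linker(dna):
    """Translate a linker DNA sequence codon by codon.

    Raises KeyError for any codon outside the small table above, which
    flags sequences that are not glycine/serine linkers of this form.
    """
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(LINKER_CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


linker_peptide = translate_linker(LINKER_DNA)
```

The 45-nucleotide sequence yields 15 residues, three repeats of GGGGS, consistent with the linker shown in the figure.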

Figure 27 shows a schematic of dC58 (SEQ ID NO:40) ssDNA transport through the wild-type MspA porin. The DNA transport consists of the following steps: a) start of the simulation; b) and c) DNA conformations before and after the rapid advancement; and d) DNA adhering to the surface of the MspA porin.

Figure 28 is a graph showing the cumulative ionic current of the dC58 (SEQ ID NO:40) ssDNA transport of Figure 27. The transport was performed at a transmembrane bias of 1.2 V.

Figure 29 shows the design of the single-chain MspA (scMspA) nanopore octamer sequence. The scMspA octamer consists of a wild-type MspA gene monomer, an MspA1 monomer, an MspA2 monomer, an MspA3 monomer, an MspA4 monomer, an MspA5 monomer, an MspA6 monomer, and an MspA7 monomer. PacI and HindIII restriction sites flank the scMspA nanopore octamer sequence. X1-X14 are unique restriction sites flanking the individual monomer sequences. The black lines connecting the monomers represent the (GGGGS)₃ (SEQ ID NO:3) linker.

Figure 30 shows the constriction zone (rectangular box) of a wild-type MspA monomer and of a variety of MspA paralog and homolog monomers.

Figure 31 shows histograms of blockade current levels in M1MspA blocked by DNA constructs. The DNA constructs are, from top to bottom: 3'-A₄₀AAAAAAAAAA-hp-5' (SEQ ID NO:10); 3'-A₄₀CCCCAAAAAA-hp-5' (SEQ ID NO:11); 3'-A₄₀AAACCCCAAA-hp-5' (SEQ ID NO:12); 3'-A₄₀AAAAAAACAA-hp-5' (SEQ ID NO:13); 3'-A₄₀AAAAAAAAAC-hp-5' (SEQ ID NO:14); 3'-A₄₀AAAAAACCCC-hp-5' (SEQ ID NO:15); 3'-C₄₀CCCCCCCCCC-hp-5' (SEQ ID NO:16); 3'-T₄₀TTTTTTTTTT-hp-5' (SEQ ID NO:17); 3'-A₄₀AAAAAAAGGG-hp-5' (SEQ ID NO:18).

Detailed Description of the Invention

Provided herein are methods comprising applying an electric field to a Mycobacterium smegmatis porin (Msp) porin having a vestibule and a constriction zone that define a tunnel, wherein the Msp porin is positioned between a first conductive liquid medium and a second conductive liquid medium. Optionally, the first and second conductive liquid media are the same. Optionally, the first and second conductive liquid media are different. The Msp porin may be any Msp porin described herein. For example, the Msp porin may be selected from a wild-type MspA porin, a mutant MspA porin, a wild-type MspA paralog or homolog porin, and a mutant MspA paralog or homolog porin.

In any of the embodiments herein, the Msp porin may further comprise a molecular motor. The molecular motor may be capable of moving an analyte into or through the tunnel at a translocation velocity or average translocation velocity that is less than the translocation velocity or average translocation velocity at which the analyte electrophoretically translocates into or through the tunnel in the absence of the molecular motor. Accordingly, in any of the embodiments herein that comprise applying an electric field, the electric field may be sufficient to cause the analyte to electrophoretically translocate through the tunnel.

Any liquid medium described herein, such as a conductive liquid medium, may comprise an analyte. The analyte may be any analyte described herein. Embodiments herein may further comprise detecting the analyte, for example in a method comprising measuring an ionic current as the analyte interacts with the Msp porin tunnel to provide a current pattern, wherein the appearance of a blockade in the current pattern indicates the presence of the analyte.
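The detection step above can be sketched in code. The following is a minimal illustration only, assuming the current pattern is available as a list of sampled currents in picoamperes and that a blockade is any run of samples below a chosen fraction of the open-pore current. The function name, the threshold fraction, the minimum run length, and the sample values are all hypothetical and do not come from this document.

```python
from typing import List, Tuple


def find_blockades(
    current_pa: List[float],
    open_pore_pa: float,
    threshold_fraction: float = 0.5,  # assumed cutoff, not from the source
    min_samples: int = 2,             # assumed minimum event length
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for runs of samples
    that stay below threshold_fraction * open_pore_pa for at least
    min_samples consecutive samples."""
    threshold = threshold_fraction * open_pore_pa
    events: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(current_pa):
        if value < threshold:
            if start is None:
                start = i  # a candidate blockade begins here
        elif start is not None:
            if i - start >= min_samples:
                events.append((start, i))
            start = None
    # Close a blockade that is still open at the end of the trace.
    if start is not None and len(current_pa) - start >= min_samples:
        events.append((start, len(current_pa)))
    return events


# Invented trace: one three-sample blockade and one single-sample dip.
trace = [100.0, 101.0, 99.0, 30.0, 28.0, 31.0, 100.0, 102.0, 20.0, 100.0]
events = find_blockades(trace, open_pore_pa=100.0)
analyte_detected = len(events) > 0
print(events, analyte_detected)
```

The minimum run length is one simple way to keep single-sample noise from being counted as a blockade; a real analysis would choose both parameters from the noise level of the recording.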

Optionally, the Msp porin is a mutant MspA porin or a mutant MspA paralog or homolog porin, and the translocation velocity or average translocation velocity of the analyte through the porin tunnel is less than or greater than the translocation velocity or average translocation velocity of the analyte through the tunnel of a wild-type MspA porin or a wild-type MspA paralog or homolog porin.

In any of the embodiments herein, the analyte may have a translocation velocity or average translocation velocity through the tunnel of less than 0.5 nm/μs. Optionally, the analyte may have a translocation velocity or average translocation velocity through the tunnel of less than 0.05 nm/μs.

Any Msp porin described herein may be contained in a lipid bilayer. In such embodiments, or in any other embodiment herein, the Msp porin may have a cis side and a trans side. Optionally, the analyte translocates, electrophoretically or otherwise, from the cis side through the tunnel to the trans side. Optionally, the analyte translocates, electrophoretically or otherwise, from the trans side through the tunnel to the cis side. Optionally, the analyte is driven, electrophoretically or otherwise, from the cis side or the trans side into the tunnel and either stays in the tunnel or then retracts to the cis side or the trans side, respectively.

Any of the embodiments herein may further comprise identifying the analyte. Such methods may comprise comparing the current pattern obtained for an unknown analyte with a known current pattern obtained under the same conditions using a known analyte.
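The comparison described above can be illustrated with a small sketch. It assumes, purely for illustration, that each current pattern is summarized by its mean blockade current and that the closest reference mean is taken as the match; the document does not prescribe any particular comparison metric. The analyte labels and current values are invented.

```python
from statistics import mean
from typing import Dict, List


def identify_analyte(
    unknown_pattern: List[float],
    reference_patterns: Dict[str, List[float]],
) -> str:
    """Return the label of the reference pattern whose mean current is
    closest to the mean current of the unknown pattern.

    Assumption: all patterns were recorded under the same conditions,
    so their mean blockade currents are directly comparable."""
    if not reference_patterns:
        raise ValueError("at least one reference pattern is required")
    unknown_level = mean(unknown_pattern)
    return min(
        reference_patterns,
        key=lambda label: abs(mean(reference_patterns[label]) - unknown_level),
    )


# Hypothetical reference patterns (pA); not data from this document.
references = {
    "analyte_A": [30.0, 31.0, 29.0],
    "analyte_B": [60.0, 61.0, 59.0],
}
best_match = identify_analyte([58.0, 62.0, 60.0], references)
print(best_match)
```

A mean is the simplest possible summary; richer features such as blockade duration or noise could be compared in the same way.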

In any of the embodiments herein, the analyte may be a nucleotide, a nucleic acid, an amino acid, a peptide, a protein, a polymer, a drug, an ion, a pollutant, a nanoscopic object, or a biological warfare agent. Optionally, the analyte is a polymer, such as a protein, a peptide, or a nucleic acid. Optionally, the analyte is a nucleic acid. Optionally, the nucleic acid has an average translocation velocity through the tunnel of less than 1 nucleotide/μs. Optionally, the nucleic acid has a translocation velocity or average translocation velocity through the tunnel of less than 0.1 nucleotide/μs. The nucleic acid may be ssDNA, dsDNA, RNA, or a combination thereof.
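As a worked illustration of the velocity limits above, an average translocation velocity can be estimated as the number of units translocated divided by the duration of the translocation event. The sketch below assumes that simple definition; the 50-nucleotide strand and the 100 μs duration are invented numbers, not measurements from this document. The same function applies to nm/μs if the length is given in nanometers.

```python
def average_translocation_velocity(units_translocated: float, duration_us: float) -> float:
    """Average velocity in units per microsecond (for example nucleotides/us).

    Assumption: the whole strand passes the pore within one event of the
    given duration."""
    if duration_us <= 0:
        raise ValueError("duration must be positive")
    return units_translocated / duration_us


# Hypothetical event: a 50-nucleotide strand translocating in 100 us.
velocity = average_translocation_velocity(50, 100.0)
below_1_nt_per_us = velocity < 1.0    # first optional limit in the text
below_0_1_nt_per_us = velocity < 0.1  # second, stricter optional limit
print(velocity, below_1_nt_per_us, below_0_1_nt_per_us)
```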

Embodiments herein may comprise distinguishing at least a first unit within a polymer from at least a second unit within the polymer. The distinguishing may comprise measuring the ionic currents produced as the first and second units separately translocate through the tunnel, to produce first and second current patterns, respectively, wherein the first and second current patterns differ from each other.

Any of the embodiments herein may further comprise sequencing a polymer. Sequencing may comprise measuring the ionic current pattern or an optical signal as each unit of the polymer separately translocates through the tunnel, to provide a current pattern associated with each unit, and comparing each current pattern with the current pattern of a known unit obtained under the same conditions, so that the sequence of the polymer is determined.
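The per-unit comparison described above can be sketched as follows. This is an illustration under stated assumptions: each unit's current pattern is reduced to one current level, the reference levels for known units are invented, and a unit is left uncalled ("N") when no reference lies within a chosen tolerance. None of these values or names come from this document.

```python
from typing import Dict, List


def call_sequence(
    unit_levels_pa: List[float],
    reference_levels_pa: Dict[str, float],
    tolerance_pa: float,
) -> str:
    """Assign each measured level to the known unit with the nearest
    reference level; emit 'N' when the nearest reference is farther
    away than tolerance_pa."""
    called = []
    for level in unit_levels_pa:
        unit, reference = min(
            reference_levels_pa.items(),
            key=lambda item: abs(item[1] - level),
        )
        called.append(unit if abs(reference - level) <= tolerance_pa else "N")
    return "".join(called)


# Hypothetical reference levels for known units measured under the same conditions.
reference_levels = {"A": 40.0, "C": 55.0, "T": 25.0}
sequence = call_sequence([41.0, 54.0, 26.0, 70.0], reference_levels, tolerance_pa=5.0)
print(sequence)
```

The tolerance keeps an unfamiliar level from being forced onto the nearest known unit, which is one conservative design choice among several.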

Any of the embodiments herein may further comprise determining the concentration, size, molecular weight, shape, or orientation of the analyte, or any combination thereof. Any liquid medium described herein, such as a conductive liquid medium, may comprise a plurality of analytes. Any analyte described herein may comprise an optical bead or a magnetic bead.

Any Msp porin discussed herein may be further defined as a mutant MspA porin. A mutant MspA porin may comprise a vestibule and a constriction zone that define a tunnel, and at least a first mutant MspA monomer comprising a mutation at position 93, position 91, position 90, or any combination thereof. A mutant MspA porin may comprise mutations at positions 93 and 91; at positions 93 and 90; at positions 91 and 90; or at positions 93, 90, and 91. Optionally, the mutant MspA porin comprises one or more mutations at any of the following amino acid positions: 88, 105, 108, 118, 134, or 139, or any other mutation described herein.

In any of the embodiments herein, the diameter of the mutant MspA porin or mutant MspA paralog or homolog may be less than the diameter of the constriction zone of the corresponding wild-type MspA porin or wild-type MspA paralog or homolog. The mutant MspA porin or mutant MspA paralog or homolog may have a mutation in the vestibule or the constriction zone that permits an analyte to translocate, electrophoretically or otherwise, through the tunnel of the mutant MspA porin or mutant MspA paralog or homolog at a translocation velocity or average translocation velocity that is less than the translocation velocity or average translocation velocity at which the analyte translocates through the tunnel of a wild-type Msp porin or wild-type MspA paralog or homolog.

A mutant Msp porin, such as a mutant MspA porin or a mutant MspA paralog or homolog porin, may comprise a neutral constriction zone. A mutant Msp porin, such as a mutant MspA porin or a mutant MspA paralog or homolog porin, may have a conductance through the tunnel that is higher (for example, 2-fold higher) than the conductance through the tunnel of its corresponding wild-type Msp porin. A mutant Msp porin (for example, a mutant MspA porin or a mutant MspA paralog or mutant porin) may have a conductance through the tunnel that is lower than the conductance through the tunnel of its corresponding wild-type Msp porin.
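The conductance comparison above is a ratio of two conductances. As a minimal sketch, assuming conductance is estimated as the steady open-pore current divided by the applied voltage, the ratio can be computed as below. The currents and voltage are invented for illustration and are not measurements from this document.

```python
def conductance_ns(current_pa: float, voltage_mv: float) -> float:
    """Conductance in nanosiemens from current in pA and voltage in mV
    (1 pA / 1 mV = 1 nS)."""
    if voltage_mv == 0:
        raise ValueError("voltage must be non-zero")
    return current_pa / voltage_mv


def conductance_ratio(mutant_ns: float, wild_type_ns: float) -> float:
    """Ratio of mutant conductance to wild-type conductance."""
    if wild_type_ns <= 0:
        raise ValueError("wild-type conductance must be positive")
    return mutant_ns / wild_type_ns


# Hypothetical open-pore currents recorded at the same applied voltage.
mutant = conductance_ns(current_pa=480.0, voltage_mv=100.0)
wild_type = conductance_ns(current_pa=240.0, voltage_mv=100.0)
ratio = conductance_ratio(mutant, wild_type)
print(f"mutant/wild-type conductance ratio: {ratio:.2f}")
```

A ratio above 1 corresponds to the higher-conductance case in the text, and a ratio below 1 to the lower-conductance case.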

Any Msp porin discussed herein may comprise a vestibule having a length of about 2 to about 6 nm and a diameter of about 2 to about 6 nm, and a constriction zone having a length of about 0.3 to about 3 nm and a diameter of about 0.3 to about 3 nm, wherein the vestibule and the constriction zone together define a tunnel. Also provided herein is a mutant MspA porin comprising a vestibule having a length of about 2 to about 6 nm and a diameter of about 2 to about 6 nm, and a constriction zone having a length of about 0.3 to about 3 nm and a diameter of about 0.3 to about 3 nm, wherein the vestibule and the constriction zone together define a tunnel, and further comprising at least a first mutant MspA paralog or homolog monomer.

The diameter of the constriction zone of a mutant Msp porin (for example, a mutant MspA porin or a mutant MspA paralog or homolog) may be less than the diameter of the constriction zone of its corresponding wild-type Msp porin (for example, a wild-type MspA porin or a wild-type MspA paralog or homolog). A mutant Msp porin (for example, a mutant MspA porin or a mutant MspA paralog or homolog) may comprise a mutation in the vestibule or the constriction zone that permits an analyte to translocate, electrophoretically or otherwise, through the tunnel of the porin at a translocation velocity or average translocation velocity that is less than the translocation velocity or average translocation velocity at which the analyte translocates through the tunnel of its corresponding wild-type Msp porin (for example, a wild-type MspA porin or a wild-type MspA paralog or homolog).

Optionally, the Msp porin is encoded in whole or in part by a nucleic acid sequence encoding a partial or complete single-chain Msp porin, wherein the nucleic acid sequence comprises: (a) a first and a second nucleotide sequence, wherein the first nucleotide sequence encodes a first Msp monomer sequence and the second nucleotide sequence encodes a second Msp monomer sequence; and (b) a third nucleotide sequence encoding an amino acid linker sequence. The monomer sequences may be any monomer sequence described herein. Optionally, the first and second Msp monomer sequences are independently selected from a wild-type MspA monomer, a wild-type MspB monomer, a wild-type MspC monomer, a wild-type MspD monomer, and mutants thereof. Optionally, the first Msp monomer sequence comprises a wild-type MspA monomer or a mutant thereof. Optionally, the first Msp monomer sequence comprises a mutant MspA monomer.

In any of the embodiments herein, the Msp porin may be encoded in whole or in part by a nucleic acid sequence encoding a partial or complete single-chain Msp porin, wherein the nucleic acid sequence comprises: (a) first, second, third, fourth, fifth, sixth, seventh, and eighth nucleotide sequences or any subset thereof, wherein the first, second, third, fourth, fifth, sixth, seventh, and eighth nucleotide sequences encode first, second, third, fourth, fifth, sixth, seventh, and eighth Msp monomer sequences, respectively; and (b) a ninth nucleotide sequence encoding an amino acid linker sequence. Thus, the porin may comprise one or more partial single-chain Msp porins that hybridize, dimerize, trimerize, or the like with other Msp monomers or other partial single-chain Msp porins. Alternatively, a complete single-chain Msp porin may form a porin without needing to associate with other Msp elements. In any of the embodiments herein, for example, the Msp porin may be encoded by a nucleic acid sequence encoding a complete single-chain Msp porin, wherein the nucleic acid sequence comprises: (a) first, second, third, fourth, fifth, sixth, seventh, and eighth nucleotide sequences, wherein the first, second, third, fourth, fifth, sixth, seventh, and eighth nucleotide sequences encode first, second, third, fourth, fifth, sixth, seventh, and eighth Msp monomer sequences, respectively; and (b) a ninth nucleotide sequence encoding an amino acid linker sequence. Each Msp monomer may comprise a wild-type MspA monomer or a mutant thereof. Optionally, at least one Msp monomer comprises a wild-type MspA monomer or a mutant thereof. Thus, the porin may be encoded in its entirety.
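At the amino acid level, the single-chain arrangement described above can be sketched as monomer sequences joined by a linker. The sketch assumes, following the description of Figure 29, that one copy of the (GGGGS)₃ linker sits between each pair of adjacent monomers. The monomer strings are short placeholders, not real Msp sequences, and the function name is hypothetical.

```python
from typing import List

# (GGGGS)3 linker as named in the description of Figure 29.
LINKER = "GGGGS" * 3


def assemble_single_chain(monomers: List[str], linker: str = LINKER) -> str:
    """Join monomer amino acid sequences into one chain, placing one linker
    copy between each pair of adjacent monomers (an assumption of this sketch)."""
    if not monomers:
        raise ValueError("at least one monomer sequence is required")
    return linker.join(monomers)


# Placeholder monomers: eight 5-residue dummy sequences, NOT real MspA sequences.
placeholder_monomers = ["AAAAA"] * 8
chain = assemble_single_chain(placeholder_monomers)
print(len(chain), chain.count(LINKER))
```

Passing fewer than eight monomers yields the partial single-chain case, for example two monomers joined by a single linker.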

In any of the embodiments herein, the Msp monomer may be a wild-type MspA paralog or homolog, such as MspA/Msmeg0965, MspB/Msmeg0520, MspC/Msmeg5483, MspD/Msmeg6057, MppA, PorM1, PorM2, PorM1, Mmcs4296, Mmcs4297, Mmcs3857, Mmcs4382, Mmcs4383, Mjls3843, Mjls3857, Mjls3931, Mjls4674, Mjls4675, Mjls4677, Map3123c, Mav3943, Mvan1836, Mvan4117, Mvan4839, Mvan4840, Mvan5016, Mvan5017, Mvan5768, MUL_2391, Mflv1734, Mflv1735, Mflv2295, Mflv1891, MCH4691c, MCH4689c, MCH4690c, MAB1080, MAB1081, MAB2800, RHA1ro08561, RHA1ro04074, and RHA1ro03127.

Also provided herein are methods of modifying the conductance through the tunnel of an Msp porin, comprising removing, adding, or replacing at least one amino acid in the vestibule or the constriction zone of a wild-type Msp porin. For example, the method may comprise increasing the conductance. The method may comprise decreasing the conductance.

Also provided are methods comprising translocating an analyte through the tunnel of an Msp porin without applying an electric field. In this embodiment or any other embodiment herein, the Msp porin may further comprise a molecular motor. The Msp porin may be any Msp porin described herein, such as a wild-type MspA porin, a mutant MspA porin, a wild-type MspA paralog or homolog porin, or a mutant MspA paralog or homolog porin. The Msp porin may be encoded by a nucleic acid sequence encoding a single-chain Msp porin.

Also provided are systems comprising an Msp porin having a vestibule and a constriction zone that define a tunnel, wherein the tunnel is positioned between a first liquid medium and a second liquid medium, wherein at least one liquid medium comprises an analyte, and wherein the system is operative to detect a property of the analyte. A system may be operative to detect a property of any analyte, which comprises subjecting the Msp porin to an electric field such that the analyte interacts with the Msp porin. A system may be operative to detect a property of the analyte, which comprises subjecting the Msp porin to an electric field such that the analyte electrophoretically translocates through the tunnel of the Msp porin. Also provided are systems comprising an Msp porin having a vestibule and a constriction zone that define a tunnel, wherein the tunnel is positioned in a lipid bilayer between a first liquid medium and a second liquid medium, and wherein the only point of liquid communication between the first and second liquid media occurs in the tunnel. Moreover, any Msp porin described herein may be comprised in any system described herein.

The first and second liquid media may be the same or different, and either one or both may comprise one or more of a salt, a detergent, or a buffer. Indeed, any liquid medium described herein may comprise one or more of a salt, a detergent, or a buffer. Optionally, at least one liquid medium is conductive. Optionally, at least one liquid medium is not conductive. Any liquid medium described herein may comprise a viscosity-altering substance or a velocity-altering substance. The liquid medium may comprise any analyte described herein. The property of the analyte may be an electrical, chemical, or physical property.

The Msp porin may be contained in a lipid bilayer in a system or in any other embodiment described herein. A system may comprise a plurality of Msp porins.

A system may comprise any Msp porin described herein, such as a wild-type MspA porin, a mutant MspA porin, a wild-type MspA paralog or homolog porin, or a mutant MspA paralog or homolog porin. Optionally, the Msp porin is further defined as a mutant MspA porin. A system may comprise a mutant Msp porin comprising a vestibule and a constriction zone that define a tunnel, and at least a first mutant MspA monomer comprising a mutation at position 93 and a mutation at position 90, position 91, or both positions 90 and 91. A mutant Msp porin comprised in a system may comprise a vestibule having a length of about 2 to about 6 nm and a diameter of about 2 to about 6 nm, and a constriction zone having a length of about 0.3 to about 3 nm and a diameter of about 0.3 to about 3 nm, wherein the vestibule and the constriction zone together define a tunnel. The mutant MspA porin may further comprise at least a first mutant MspA paralog or homolog monomer. An Msp porin comprised in a system may be encoded by a nucleic acid sequence encoding a single-chain Msp porin.

An Msp porin comprised in a system may further comprise a molecular motor. The molecular motor in a system or in any other embodiment herein may be capable of moving an analyte into or through the tunnel at a translocation velocity or average translocation velocity that is less than the translocation velocity or average translocation velocity at which the analyte translocates into or through the tunnel in the absence of the molecular motor.

Any system described herein may further comprise a patch-clamp amplifier or a data acquisition device. A system may further comprise one or more temperature regulating devices in communication with the first liquid medium, the second liquid medium, or both.

Any system described herein may be operative to translocate an analyte through an Msp porin tunnel, electrophoretically or otherwise.

Also provided is an Msp porin comprising a vestibule having a length of about 2 to about 6 nm and a diameter of about 2 to about 6 nm, and a constriction zone having a length of about 0.3 to about 3 nm and a diameter of about 0.3 to about 3 nm, wherein the vestibule and the constriction zone together define a tunnel. Also provided is a mutant Msp porin comprising a vestibule having a length of about 2 to about 6 nm and a diameter of about 2 to about 6 nm, and a constriction zone having a length of about 0.3 to about 3 nm and a diameter of about 0.3 to about 3 nm, wherein the vestibule and the constriction zone together define a tunnel. Also provided is a mutant MspA porin comprising a vestibule having a length of about 2 to about 6 nm and a diameter of about 2 to about 6 nm, and a constriction zone having a length of about 0.3 to about 3 nm and a diameter of about 0.3 to about 3 nm, wherein the vestibule and the constriction zone together define a tunnel. Also provided is a mutant MspA paralog or homolog porin comprising a vestibule having a length of about 2 to about 6 nm and a diameter of about 2 to about 6 nm, and a constriction zone having a length of about 0.3 to about 3 nm and a diameter of about 0.3 to about 3 nm, wherein the vestibule and the constriction zone together define a tunnel. Any mutant MspA paralog or homolog described herein may further comprise at least a first mutant MspA paralog or homolog monomer. Also provided is a mutant MspA porin comprising a vestibule having a length of about 2 to about 6 nm and a diameter of about 2 to about 6 nm, and a constriction zone having a length of about 0.3 to about 3 nm and a diameter of about 0.3 to about 3 nm, wherein the vestibule and the constriction zone together define a tunnel, and further comprising at least a first mutant MspA paralog or homolog monomer. Any of these porins may be used in any embodiment herein.

Also provided is a mutant MspA porin comprising a vestibule and a constriction zone that define a tunnel, and at least a first mutant MspA monomer comprising a mutation at position 93 and a mutation at position 90, position 91, or both positions 90 and 91. This mutant MspA porin, and any other mutant Msp porin or MspA porin described herein, may be used in any embodiment described herein. The mutant MspA porin may comprise mutations at positions 93 and 90. The mutant MspA porin may comprise mutations at positions 93 and 91. The mutant MspA porin may comprise mutations at positions 93, 91, and 90. The mutant MspA porin may comprise any other mutation described herein.

The diameter of the constriction zone of the mutant MspA porin may be less than the diameter of the constriction zone of the corresponding wild-type MspA porin. The MspA porin may have a mutation in the vestibule or the constriction zone that permits an analyte to translocate, electrophoretically or otherwise, through the tunnel of the mutant at a translocation velocity or average translocation velocity that is less than the translocation velocity or average translocation velocity at which the analyte translocates through the tunnel of a wild-type Msp porin. The MspA porin may have a mutation in the vestibule or the constriction zone that permits an analyte to translocate electrophoretically through the tunnel at an average translocation velocity of, for example, less than 0.5 nm/μs or less than 0.05 nm/μs. The analyte may be selected from a nucleotide, a nucleic acid, an amino acid, a peptide, a protein, a polymer, a drug, an ion, a biological warfare agent, a pollutant, a nanoscopic object, or a combination or cluster thereof. Optionally, the analyte is further defined as a nucleic acid. The nucleic acid may translocate, electrophoretically or otherwise, through the tunnel at an average translocation velocity of less than 1 nucleotide/μs or less than 0.1 nucleotide/μs. The nucleic acid may be further defined as ssDNA, dsDNA, RNA, or a combination thereof.

The analyte in any embodiment herein may further comprise a magnetic bead. The magnetic bead may be further defined as a streptavidin-coated magnetic bead. The analyte may further comprise an optical bead. Any analyte described herein may be an ion or may be neutral. The analyte may comprise biotin.

Any Msp porin described herein, such as a mutant MspA porin, may comprise 2 to 15 Msp monomers that are the same or different. Optionally, the Msp porin, such as a mutant MspA porin, comprises 7 to 9 Msp monomers that are the same or different. Optionally, at least a second monomer is selected from a wild-type MspA monomer, a second mutant MspA monomer, a wild-type MspA paralog or homolog monomer, and a mutant MspA paralog or homolog monomer, wherein the second mutant MspA monomer may be the same as or different from the first mutant MspA monomer. Optionally, the second monomer is a wild-type MspA paralog or homolog monomer. The wild-type MspA paralog or homolog monomer may be a wild-type MspB monomer. An MspA monomer may comprise one or more mutations at any of the following amino acid positions: 88, 105, 108, 118, 134, or 139. An MspA monomer may comprise one or more of the following mutations: L88W, D90K/N/Q/R, D91N/Q, D93N, I105W, N108W, D118R, D134R, or E139K. An MspA monomer may comprise the following mutations: D90N/D91N/D93N. An MspA monomer may comprise the following mutations: D90N/D91N/D93N/D118R/D134R/E139K. An MspA monomer may comprise the following mutations: D90Q/D91Q/D93N. An MspA monomer may comprise the following mutations: D90Q/D91Q/D93N/D118R/D134R/E139K. An MspA monomer may comprise the following mutations: D90(K,R)/D91N/D93N. An MspA monomer may comprise the following mutations: (L88, I105)W/D91Q/D93N. An MspA monomer may comprise the following mutations: I105W/N108W. Moreover, an MspA monomer may comprise any other mutation described herein.
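The substitution notation above (wild-type residue, position, new residue, joined by slashes) lends itself to a short sketch that parses a mutation string, applies it to a sequence, and tallies the change in formal side-chain charge per monomer. Everything here is illustrative: the demo sequence is an invented poly-alanine placeholder and not the MspA sequence, positions are assumed to be 1-indexed into whatever sequence is supplied, only D, E, K, and R are treated as charged, and the compact forms such as D90K/N/Q/R or D90(K,R) are not handled.

```python
import re
from typing import List, Tuple

# Simplified formal charges; histidine and termini are ignored (assumption).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def parse_mutations(spec: str) -> List[Tuple[str, int, str]]:
    """Parse 'D90N/D91N/D93N' into [('D', 90, 'N'), ('D', 91, 'N'), ('D', 93, 'N')]."""
    mutations = []
    for token in spec.split("/"):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if match is None:
            raise ValueError(f"unsupported mutation token: {token!r}")
        mutations.append((match.group(1), int(match.group(2)), match.group(3)))
    return mutations


def apply_mutations(sequence: str, spec: str) -> str:
    """Apply substitutions to a 1-indexed sequence, checking each wild-type residue."""
    residues = list(sequence)
    for wild_type, position, replacement in parse_mutations(spec):
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)


def net_charge_change(spec: str) -> int:
    """Change in formal charge per monomer caused by the substitutions."""
    return sum(
        CHARGE.get(new, 0) - CHARGE.get(old, 0)
        for old, _, new in parse_mutations(spec)
    )


# Placeholder 100-residue sequence with D at positions 90, 91 and 93 only.
demo = list("A" * 100)
for pos in (90, 91, 93):
    demo[pos - 1] = "D"
demo_sequence = "".join(demo)

mutated = apply_mutations(demo_sequence, "D90N/D91N/D93N")
print(mutated[89:93], net_charge_change("D90N/D91N/D93N"))
```

Under these simplifications, replacing three aspartates with asparagines removes three negative charges per monomer, which illustrates how a set of substitutions can neutralize a negatively charged region.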

In any of the embodiments herein, a mutant Msp porin, such as a mutant MspA porin or a mutant MspA paralog or homolog, may comprise at least one additional positively charged amino acid compared to the vestibule or the constriction zone, respectively, of the wild-type Msp porin; at least one additional negatively charged amino acid compared to the vestibule or the constriction zone, respectively, of the wild-type MspA porin; at least one fewer positively charged amino acid compared to the vestibule or the constriction zone, respectively, of the wild-type MspA porin; or at least one fewer negatively charged amino acid compared to the vestibule or the constriction zone, respectively, of the wild-type MspA porin.

Optionally, each positively charged amino acid in the vestibule and the constriction zone of a wild-type Msp porin may be replaced with a negatively charged amino acid, wherein the negatively charged amino acids are the same or different; or each negatively charged amino acid in the vestibule and the constriction zone of a wild-type Msp porin may be replaced with a positively charged amino acid, wherein the positively charged amino acids are the same or different.

Optionally, the vestibule or the constriction zone of a mutant Msp porin comprises a greater number of positively charged residues than the vestibule or the constriction zone, respectively, of the wild-type Msp porin; or the vestibule or the constriction zone comprises a greater number of negatively charged residues than the vestibule or the constriction zone, respectively, of the wild-type Msp porin; or at least one positively charged amino acid in the vestibule or the constriction zone of a wild-type Msp porin (for example, a wild-type MspA porin or a wild-type MspA paralog or homolog porin) is deleted or replaced with a negatively charged amino acid; or at least one negatively charged amino acid in the vestibule or the constriction zone of the wild-type Msp porin is deleted or replaced with a positively charged amino acid.

野生型Msp孔蛋白(例如野生型MspA孔蛋白或野生型MspA旁系同源物或同系物孔蛋白)的前厅或缢缩区中至少一个氨基酸可被具有空间上更大的侧链的氨基酸、具有空间上更小的侧链的氨基酸、具有更大极性的侧链的氨基酸、具有更小极性的侧链的氨基酸或具有更大疏水性的侧链的氨基酸、具有更小疏水性的侧链的氨基酸置换。At least one amino acid in the vestibule or constriction region of a wild-type Msp porin (e.g., a wild-type MspA porin or a wild-type MspA paralog or homolog porin) can be replaced with an amino acid having a sterically larger side chain, a sterically smaller side chain, a more polar side chain, a less polar side chain, a more hydrophobic side chain, or a less hydrophobic side chain.

在本文中的任何实施方案中，突变的Msp孔蛋白的前厅或缢缩区中至少一个氨基酸可包括非天然氨基酸或化学修饰的氨基酸。In any of the embodiments herein, at least one amino acid in the vestibule or constriction region of the mutant Msp porin can include an unnatural amino acid or a chemically modified amino acid.

本文中描述的任何Msp孔蛋白可包含一个或多个周质环的缺失、添加或置换。Any of the Msp porins described herein may comprise deletions, additions, or substitutions of one or more periplasmic loops.

如本文中所描述的，任何Msp孔蛋白例如突变的MspA孔蛋白还可包含分子发动机。本文中描述的任何分子发动机可以能够以这样的转位速度或平均转位速度将分析物移入或通过通道，所述转位速度或平均转位速度小于分析物在分子发动机不存在的情况下转位进入或穿过通道时的转位速度或平均转位速度。在本文中的任何实施方案中，分子发动机可以是酶，例如聚合酶、外切核酸酶或Klenow片段。As described herein, any Msp porin, such as a mutant MspA porin, can also comprise a molecular motor. Any molecular motor described herein can be capable of moving an analyte into or through a channel at a translocation velocity or average translocation velocity that is less than the translocation velocity or average translocation velocity of the analyte when translocated into or through the channel in the absence of the molecular motor. In any of the embodiments herein, the molecular motor can be an enzyme, such as a polymerase, an exonuclease, or a Klenow fragment.

还提供了产生本文中描述的Msp孔蛋白的方法。因此，提供了产生包含至少一个突变MspA单体的突变MspA孔蛋白的方法，所述方法包括在位置93和位置90、位置91或位置90及91上修饰野生型MspA单体的方法。该方法可包括在位置93和90上修饰野生型MspA单体。该方法可包括在位置93和91上修饰野生型MspA单体。该方法可包括在位置93、91和90上修饰野生型MspA单体。该方法可包括进一步地或可变通地在下列位置：88、105、108、118、134或139的任何一个或多个位置上修饰野生型MspA单体，或进行本文中描述的任何其他修饰。可由本文中描述的方法产生的突变MspA孔蛋白可包括本文中描述的任何突变或孔蛋白性质。例如，突变MspA可包括中性缢缩区。突变MspA孔蛋白还可包括至少一个Msp单体，例如野生型MspA单体、突变的MspA单体、野生型MspA旁系同源物或同系物或第二突变的MspA旁系同源物或同系物单体。突变的MspA孔蛋白的通过通道的电导可比其相应的野生型MspA孔蛋白的通过通道的电导更高，例如高1倍。Also provided are methods for producing the Msp porins described herein. Thus, methods for producing mutant MspA porins comprising at least one mutant MspA monomer are provided, the methods comprising modifying a wild-type MspA monomer at position 93 and at position 90, position 91, or both positions 90 and 91. The methods may comprise modifying a wild-type MspA monomer at positions 93 and 90. The methods may comprise modifying a wild-type MspA monomer at positions 93 and 91. The methods may comprise modifying a wild-type MspA monomer at positions 93, 91, and 90. The methods may further or alternatively comprise modifying a wild-type MspA monomer at any one or more of the following positions: 88, 105, 108, 118, 134, or 139, or performing any other modifications described herein. The mutant MspA porins that can be produced by the methods described herein may include any of the mutations or porin properties described herein. For example, the mutant MspA may include a neutral constriction zone. The mutant MspA porin can also include at least one Msp monomer, such as a wild-type MspA monomer, a mutant MspA monomer, a wild-type MspA paralog or homolog, or a second mutant MspA paralog or homolog monomer. The conductance through the channel of the mutant MspA porin can be higher, such as 1-fold higher, than the conductance through the channel of its corresponding wild-type MspA porin.

本文中描述的任何突变Msp孔蛋白，例如突变的MspA孔蛋白或突变的MspA旁系同源物或同系物孔蛋白，可包含一个或多个突变的MspB、突变的MspC或突变的MspD单体或其组合。Any mutant Msp porin described herein, such as a mutant MspA porin or a mutant MspA paralog or homolog porin, may comprise one or more mutant MspB, mutant MspC, or mutant MspD monomers, or combinations thereof.

还提供了产生具有界定通道的前厅和缢缩区的突变MspA孔蛋白的方法，包括缺失、添加或置换野生型MspA旁系同源物或同系物单体的前厅和缢缩区内的任何氨基酸，以便所得的突变MspA孔蛋白能够在施加电场后使分析物转位通过通道。该突变的MspA孔蛋白可以是本文中描述的任何类型。Also provided are methods for generating mutant MspA porins having a vestibule and a constriction region defining a channel, comprising deleting, adding, or substituting any amino acid within the vestibule and constriction region of a wild-type MspA paralog or homolog monomer, such that the resulting mutant MspA porin is capable of translocating an analyte through the channel upon application of an electric field. The mutant MspA porin can be any type described herein.

还提供了编码本文中描述的Msp孔蛋白的核酸序列。例如，提供了编码突变MspA孔蛋白或者突变MspA旁系同源物或同系物的核酸序列。还涉及包含本文中描述的核酸序列的载体，例如包含编码突变MspA孔蛋白或突变MspA旁系同源物或同系物的核酸序列的载体。本文中描述的任何载体还可包含启动子序列。本文中描述的任何载体还可包含组成型启动子。组成型启动子可包括p_smyc启动子。启动子可包括诱导型启动子。诱导型启动子可包括乙酰胺诱导型启动子。Also provided are nucleic acid sequences encoding the Msp porins described herein. For example, nucleic acid sequences encoding mutant MspA porins or mutant MspA paralogs or homologs are provided. Also provided are vectors comprising the nucleic acid sequences described herein, for example, vectors comprising nucleic acid sequences encoding mutant MspA porins or mutant MspA paralogs or homologs. Any of the vectors described herein may further comprise a promoter sequence. Any of the vectors described herein may further comprise a constitutive promoter. The constitutive promoter may include the p_smyc promoter. The promoter may include an inducible promoter. The inducible promoter may include an acetamide-inducible promoter.

还提供了用本文中描述的任何载体转染的培养细胞，或其后代，其中所述细胞能够表达Msp孔蛋白，例如突变的MspA孔蛋白或突变的MspA旁系同源物或同系物。Also provided are cultured cells transfected with any of the vectors described herein, or progeny thereof, wherein the cells are capable of expressing an Msp porin, such as a mutant MspA porin or a mutant MspA paralog or homolog.

还提供了包含本文中描述的任何载体的耻垢分枝杆菌菌株。还涉及不含内源孔蛋白的耻垢分枝杆菌菌株，其可包含本文中描述的任何载体。“不含”意指当使用适当的Msp-特异性抗血清时不能在免疫印迹上检测到内源孔蛋白，或包含少于1％的内源孔蛋白。Also provided are Mycobacterium smegmatis strains comprising any of the vectors described herein. Also provided are Mycobacterium smegmatis strains that are free of endogenous porins, which may comprise any of the vectors described herein. "Free" means that endogenous porins cannot be detected on an immunoblot when an appropriate Msp-specific antiserum is used, or that the strain contains less than 1% endogenous porins.

还提供了包含编码野生型Msp单体的核酸序列的载体，其中所述核酸序列有效地受诱导型启动子控制。载体可以是整合载体。还提供了用该载体转染的培养细胞或其后代，其中所述细胞能够表达野生型Msp孔蛋白。还涉及包含该载体的耻垢分枝杆菌菌株。Also provided are vectors comprising a nucleic acid sequence encoding a wild-type Msp monomer, wherein the nucleic acid sequence is operably controlled by an inducible promoter. The vector can be an integrating vector. Also provided are cultured cells transfected with the vector, or progeny thereof, wherein the cells are capable of expressing the wild-type Msp porin. Also provided are Mycobacterium smegmatis strains comprising the vector.

还提供了编码本文中描述的部分或完整单链Msp孔蛋白的核酸序列。所述核酸序列可包括例如:(a)第一和第二核苷酸序列，其中所述第一核苷酸序列编码第一Msp单体序列并且第二核苷酸序列编码第二Msp单体序列；和(b)编码氨基酸连接体序列的第三核苷酸序列。所述第一和第二Msp单体序列可独立地选自野生型MspA单体、突变的MspA单体、野生型MspA旁系同源物或同系物单体以及突变的MspA旁系同源物或同系物单体。所述第一Msp单体序列包含野生型MspA单体或其突变体。任选地，所述第一Msp单体序列包含突变的MspA单体。所述第一Msp单体序列可包含一个或多个突变，所述突变选自氨基酸138上的A至P置换、氨基酸139上的E至A或K置换、氨基酸90上的D至K或R或Q置换、氨基酸91上的D至N或Q置换、氨基酸93上的D至N置换、氨基酸88上的L至W置换、氨基酸105上的I至W置换、氨基酸108上的N至W置换、氨基酸118上的D至R置换和氨基酸134上的D至R置换。事实上，本文中描述的任何Msp单体可包含任何此类置换。Also provided are nucleic acid sequences encoding a partial or complete single-chain Msp porin described herein. The nucleic acid sequence may comprise, for example: (a) a first and a second nucleotide sequence, wherein the first nucleotide sequence encodes a first Msp monomer sequence and the second nucleotide sequence encodes a second Msp monomer sequence; and (b) a third nucleotide sequence encoding an amino acid linker sequence. The first and second Msp monomer sequences may be independently selected from a wild-type MspA monomer, a mutant MspA monomer, a wild-type MspA paralog or homolog monomer, and a mutant MspA paralog or homolog monomer. The first Msp monomer sequence comprises a wild-type MspA monomer or a mutant thereof. Optionally, the first Msp monomer sequence comprises a mutant MspA monomer. The first Msp monomer sequence may comprise one or more mutations selected from the group consisting of an A to P substitution at amino acid 138, an E to A or K substitution at amino acid 139, a D to K or R or Q substitution at amino acid 90, a D to N or Q substitution at amino acid 91, a D to N substitution at amino acid 93, an L to W substitution at amino acid 88, an I to W substitution at amino acid 105, an N to W substitution at amino acid 108, a D to R substitution at amino acid 118, and a D to R substitution at amino acid 134. Indeed, any of the Msp monomers described herein may comprise any such substitutions.

任选地，所述突变的MspA单体包含氨基酸138上的A至P置换、氨基酸139上的E至A置换或其组合；氨基酸90上的D至K或R置换、氨基酸91上的D至N置换、氨基酸93上的D至N置换或其任何组合；氨基酸90上的D至Q置换、氨基酸91上的D至Q置换、氨基酸93上的D至N置换或其任何组合；氨基酸88上的L至W置换、氨基酸105上的I至W置换、氨基酸91上的D至Q置换、氨基酸93上的D至N置换或其任何组合；氨基酸105上的I至W置换、氨基酸108上的N至W置换或其组合；或氨基酸118上的D至R置换、氨基酸139上的E至K置换、氨基酸134上的D至R置换或其任何组合。Optionally, the mutant MspA monomer comprises an A to P substitution at amino acid 138, an E to A substitution at amino acid 139, or a combination thereof; a D to K or R substitution at amino acid 90, a D to N substitution at amino acid 91, a D to N substitution at amino acid 93, or any combination thereof; a D to Q substitution at amino acid 90, a D to Q substitution at amino acid 91, a D to N substitution at amino acid 93, or any combination thereof; an L to W substitution at amino acid 88, an I to W substitution at amino acid 105, a D to Q substitution at amino acid 91, a D to N substitution at amino acid 93, or any combination thereof; an I to W substitution at amino acid 105, an N to W substitution at amino acid 108, or a combination thereof; or a D to R substitution at amino acid 118, an E to K substitution at amino acid 139, a D to R substitution at amino acid 134, or any combination thereof.
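The substitutions listed above are often written in shorthand such as D90N (wild-type residue, 1-based position, new residue). The following Python sketch shows one way such shorthand could be applied to a sequence, assuming that notation. The placeholder monomer is hypothetical: it is not SEQ ID NO:1, and it only places the wild-type residues named in the text at the named positions so that the example can run.

```python
import re

# Wild-type residues at the positions named in the text (1-based numbering).
NAMED_POSITIONS = {
    88: "L", 90: "D", 91: "D", 93: "D", 105: "I",
    108: "N", 118: "D", 134: "D", 138: "A", 139: "E",
}


def make_placeholder_monomer(length: int = 150) -> str:
    """Build a hypothetical sequence (glycine filler); NOT the real MspA sequence."""
    residues = ["G"] * length
    for position, amino_acid in NAMED_POSITIONS.items():
        residues[position - 1] = amino_acid
    return "".join(residues)


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Apply substitutions such as 'D90N', checking the wild-type residue first."""
    residues = list(sequence)
    for substitution in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
        if match is None:
            raise ValueError(f"Unrecognized substitution: {substitution!r}")
        old, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"Position {position} is outside the sequence")
        if residues[position - 1] != old:
            raise ValueError(
                f"Expected {old} at position {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


wild_type = make_placeholder_monomer()
# One combination named in the text: D90N, D91N and D93N.
mutant = apply_substitutions(wild_type, ["D90N", "D91N", "D93N"])
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which matter here because the positions are stated relative to the mature monomer sequence.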

任何Msp孔蛋白可包含第一、第二或更多Msp单体序列，包括野生型MspA旁系同源物或其突变体，其中旁系同源物或其突变体是野生型MspB单体或其突变体。一个或多个Msp单体序列可包含SEQ ID NO:1、SEQ ID NO:2或其组合。任选地，第二Msp单体序列包括突变的MspB单体。任选地，第一Msp单体序列包括野生型MspA单体或其突变体并且第二Msp单体序列包括野生型MspB单体或其突变体。任选地，第一Msp单体序列包含SEQ ID NO:1并且第二Msp单体序列包含SEQ ID NO:2。Any Msp porin may comprise a first, second or more Msp monomer sequence, including a wild-type MspA paralog or a mutant thereof, wherein the paralog or mutant thereof is a wild-type MspB monomer or a mutant thereof. One or more Msp monomer sequences may comprise SEQ ID NO: 1, SEQ ID NO: 2 or a combination thereof. Optionally, the second Msp monomer sequence comprises a mutated MspB monomer. Optionally, the first Msp monomer sequence comprises a wild-type MspA monomer or a mutant thereof and the second Msp monomer sequence comprises a wild-type MspB monomer or a mutant thereof. Optionally, the first Msp monomer sequence comprises SEQ ID NO: 1 and the second Msp monomer sequence comprises SEQ ID NO: 2.

本文中描述了氨基酸连接体序列。在本文中的任何实施方案中，氨基酸连接体序列可以例如包含10至20个氨基酸。例如，氨基酸连接体包含15个氨基酸。任选地，氨基酸连接体序列包括(GGGGS)₃(SEQ ID NO:3)肽序列。Amino acid linker sequences are described herein. In any of the embodiments herein, the amino acid linker sequence can, for example, comprise 10 to 20 amino acids. For example, the amino acid linker comprises 15 amino acids. Optionally, the amino acid linker sequence comprises the (GGGGS)₃ (SEQ ID NO:3) peptide sequence.
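As a rough illustration of how a linker joins monomers into a single chain, the Python sketch below concatenates monomer sequences with the (GGGGS)₃ linker. Placing one linker between each adjacent pair of monomers is an assumption made for the example, and the five-residue "monomers" are hypothetical stand-ins, not Msp sequences.

```python
# (GGGGS) repeated three times: a 15-residue linker, within the 10 to 20 range.
LINKER = "GGGGS" * 3


def single_chain(monomers: list[str], linker: str = LINKER) -> str:
    """Join monomer sequences into one chain, one linker between neighbours.

    Assumption for illustration: n monomers are joined by n - 1 linkers.
    """
    if len(monomers) < 2:
        raise ValueError("A single-chain construct needs at least two monomers")
    return linker.join(monomers)


# Hypothetical toy peptides standing in for Msp monomer sequences.
monomer_a = "ACDEF"
monomer_b = "KLMNP"

dimer = single_chain([monomer_a, monomer_b])
octamer = single_chain([monomer_a] * 8)
```

With these toy inputs the dimer contains one linker and the octamer seven, so their lengths are 25 and 145 residues respectively.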

本发明还涵盖由本文中描述的任何核酸序列编码的多肽。The present invention also encompasses polypeptides encoded by any of the nucleic acid sequences described herein.

还提供了编码部分或完整单链Msp孔蛋白的核酸序列，其中所述核酸序列包含：(a)第1、第2、第3、第4、第5、第6、第7和第8核苷酸序列或其任何亚组，其中所述第1、第2、第3、第4、第5、第6、第7和第8核苷酸序列分别编码第1、第2、第3、第4、第5、第6、第7和第8Msp单体序列；和(b)编码氨基酸连接体序列的第9核苷酸序列。第一和第二Msp单体序列可独立地选自野生型Msp单体、突变的Msp单体、野生型MspA旁系同源物或同系物单体以及突变MspA旁系同源物或同系物单体。各Msp单体可包含野生型MspA单体或其突变体。任选地，至少一个Msp单体包含野生型MspA单体或其突变体。任选地，至少一个Msp单体包括突变的MspA单体。突变的Msp单体序列可包含本文中描述的任何突变。例如，一个或多个所述突变选自氨基酸138上的A至P置换、氨基酸139上的E至A或K置换、氨基酸90上的D至K或R或Q置换、氨基酸91上的D至N或Q置换、氨基酸93上的D至N置换、氨基酸88上的L至W置换、氨基酸105上的I至W置换、氨基酸108上的N至W置换、氨基酸118上的D至R置换和氨基酸134上的D至R置换。每一个Msp单体序列可包含SEQ ID NO:1。任选地，至少一个Msp单体序列包含SEQ ID NO:1。任选地，至少一个Msp单体序列包含野生型MspA旁系同源物或其突变体，其中所述MspA旁系同源物或其突变体是野生型MspB单体或其突变体。任选地，至少一个Msp单体序列包含SEQ ID NO:2。任选地，至少一个Msp单体序列包含突变的MspB单体。任选地，至少一个Msp单体序列包含野生型MspA单体或其突变体并且至少一个Msp单体序列包含野生型MspB单体或其突变体。任选地，至少一个Msp单体序列包含SEQ ID NO:1并且至少一个Msp单体序列包含SEQ ID NO:2。还提供了由前述核酸序列的任一个编码的多肽。还提供了包含任何前述核酸序列的载体。所述载体还可包含启动子序列。所述启动子可包括组成型启动子。该组成型启动子可包括p_smyc启动子。该启动子可包括诱导型启动子。该诱导型启动子可包括乙酰胺诱导型启动子。Also provided is a nucleic acid sequence encoding a partial or complete single-chain Msp porin, wherein the nucleic acid sequence comprises: (a) the 1st, 2nd, 3rd, 4th, 5th, 6th, 7th and 8th nucleotide sequences or any subset thereof, wherein the 1st, 2nd, 3rd, 4th, 5th, 6th, 7th and 8th nucleotide sequences encode the 1st, 2nd, 3rd, 4th, 5th, 6th, 7th and 8th Msp monomer sequences, respectively; and (b) a 9th nucleotide sequence encoding an amino acid linker sequence. The first and second Msp monomer sequences can be independently selected from wild-type Msp monomers, mutated Msp monomers, wild-type MspA paralog or homolog monomers, and mutated MspA paralog or homolog monomers. Each Msp monomer can comprise a wild-type MspA monomer or a mutant thereof. Optionally, at least one Msp monomer comprises a wild-type MspA monomer or a mutant thereof. Optionally, at least one Msp monomer comprises a mutated MspA monomer. The mutated Msp monomer sequence can comprise any mutation described herein. For example, one or more of the mutations are selected from an A to P substitution at amino acid 138, an E to A or K substitution at amino acid 139, a D to K or R or Q substitution at amino acid 90, a D to N or Q substitution at amino acid 91, a D to N substitution at amino acid 93, an L to W substitution at amino acid 88, an I to W substitution at amino acid 105, an N to W substitution at amino acid 108, a D to R substitution at amino acid 118, and a D to R substitution at amino acid 134. Each Msp monomer sequence may comprise SEQ ID NO:1. Optionally, at least one Msp monomer sequence comprises SEQ ID NO:1. Optionally, at least one Msp monomer sequence comprises a wild-type MspA paralog or a mutant thereof, wherein the MspA paralog or mutant thereof is a wild-type MspB monomer or a mutant thereof. Optionally, at least one Msp monomer sequence comprises SEQ ID NO:2. Optionally, at least one Msp monomer sequence comprises a mutated MspB monomer. Optionally, at least one Msp monomer sequence comprises a wild-type MspA monomer or a mutant thereof and at least one Msp monomer sequence comprises a wild-type MspB monomer or a mutant thereof. Optionally, at least one Msp monomer sequence comprises SEQ ID NO:1 and at least one Msp monomer sequence comprises SEQ ID NO:2. Also provided are polypeptides encoded by any of the aforementioned nucleic acid sequences. Also provided are vectors comprising any of the aforementioned nucleic acid sequences. The vectors may further comprise a promoter sequence. The promoter may comprise a constitutive promoter. The constitutive promoter may comprise a p_smyc promoter. The promoter may comprise an inducible promoter. The inducible promoter may comprise an acetamide-inducible promoter.

还提供了能够诱导型表达Msp的突变的细菌菌株，所述细菌菌株包含：(a)野生型MspA的缺失；(b)野生型MspC的缺失；(c)野生型MspD的缺失；和(d)包含有效地连接于Msp单体核酸序列的诱导型启动子的载体。该细菌菌株还可包括耻垢分枝杆菌菌株ML16。Msp核酸可编码野生型MspA单体或野生型MspA旁系同源物或同系物单体。Msp核酸可编码选自野生型MspA单体、野生型MspC单体和野生型MspD单体的Msp单体。任选地，Msp核酸编码野生型MspA单体。诱导型启动子可包含乙酰胺诱导型启动子。所述细菌菌株还可包含野生型MspB的缺失。细菌菌株还可包含本文中描述的载体，例如包含有效地连接于编码Msp孔蛋白或单体的核酸序列的组成型启动子的载体。Msp可以是野生型MspA孔蛋白或单体或者野生型MspA旁系同源物或同系物孔蛋白或单体。Msp孔蛋白或单体可选自野生型MspA孔蛋白或单体、野生型MspB孔蛋白或单体、野生型MspC孔蛋白或单体以及野生型MspD孔蛋白或单体。任选地，Msp孔蛋白或单体是野生型MspA孔蛋白或单体。Also provided are mutant bacterial strains capable of inducible expression of Msp, comprising: (a) a deletion of wild-type MspA; (b) a deletion of wild-type MspC; (c) a deletion of wild-type MspD; and (d) a vector comprising an inducible promoter operatively linked to an Msp monomer nucleic acid sequence. The bacterial strain may also include Mycobacterium smegmatis strain ML16. The Msp nucleic acid may encode a wild-type MspA monomer or a wild-type MspA paralog or homolog monomer. The Msp nucleic acid may encode an Msp monomer selected from a wild-type MspA monomer, a wild-type MspC monomer, and a wild-type MspD monomer. Optionally, the Msp nucleic acid encodes a wild-type MspA monomer. The inducible promoter may comprise an acetamide-inducible promoter. The bacterial strain may also comprise a deletion of wild-type MspB. The bacterial strain may also comprise a vector described herein, for example, a vector comprising a constitutive promoter operatively linked to a nucleic acid sequence encoding an Msp porin or monomer. Msp can be a wild-type MspA porin or monomer or a wild-type MspA paralog or homolog porin or monomer. The Msp porin or monomer can be selected from a wild-type MspA porin or monomer, a wild-type MspB porin or monomer, a wild-type MspC porin or monomer, and a wild-type MspD porin or monomer. Optionally, the Msp porin or monomer is a wild-type MspA porin or monomer.

细菌菌株还可包含含有编码完整或部分单链Msp孔蛋白的核酸的载体，其中所述核酸包含：(a)第一和第二核苷酸序列，其中所述第一核苷酸序列编码第一Msp单体序列并且第二核苷酸序列编码第二Msp单体序列；和(b)编码氨基酸连接体序列的第三核苷酸序列。细菌菌株还可包含含有编码完整或部分单链Msp孔蛋白的核酸的载体，其中所述核酸包含：(a)第1、第2、第3、第4、第5、第6、第7和第8核苷酸序列或其任何亚组，其中所述第1、第2、第3、第4、第5、第6、第7和第8核苷酸序列分别编码第1、第2、第3、第4、第5、第6、第7和第8Msp单体序列；和(b)编码氨基酸连接体序列的第9核苷酸序列。The bacterial strain may also comprise a vector comprising a nucleic acid encoding a complete or partial single-chain Msp porin, wherein the nucleic acid comprises: (a) a first and a second nucleotide sequence, wherein the first nucleotide sequence encodes a first Msp monomer sequence and the second nucleotide sequence encodes a second Msp monomer sequence; and (b) a third nucleotide sequence encoding an amino acid linker sequence. The bacterial strain may also comprise a vector comprising a nucleic acid encoding a complete or partial single-chain Msp porin, wherein the nucleic acid comprises: (a) the first, second, third, fourth, fifth, sixth, seventh, and eighth nucleotide sequences or any subset thereof, wherein the first, second, third, fourth, fifth, sixth, seventh, and eighth nucleotide sequences encode the first, second, third, fourth, fifth, sixth, seventh, and eighth Msp monomer sequences, respectively; and (b) a ninth nucleotide sequence encoding an amino acid linker sequence.

还提供了产生完整或部分单链Msp孔蛋白的方法，所述方法包括：(a)用含有能够编码完整或部分单链Msp孔蛋白的核酸序列的载体转化本文中描述的细菌菌株；和(b)从细菌纯化完整或部分单链Msp孔蛋白。所述载体可包含编码完整或部分单链Msp孔蛋白的核酸序列，其中所述核酸序列包含：(a)第一和第二核苷酸序列，其中所述第一核苷酸序列编码第一Msp单体序列并且第二核苷酸序列编码第二Msp单体序列；和(b)编码氨基酸连接体序列的第三核苷酸序列。所述载体可包含编码完整或部分单链Msp孔蛋白的核酸序列，其中所述核酸序列包含：(a)第1、第2、第3、第4、第5、第6、第7和第8核苷酸序列或其任意亚组，其中所述第1、第2、第3、第4、第5、第6、第7和第8核苷酸序列分别编码第1、第2、第3、第4、第5、第6、第7和第8Msp单体序列；和(b)编码氨基酸连接体序列的第9核苷酸序列。Msp单体序列可独立地选自野生型MspA单体、突变的MspA单体、野生型MspA旁系同源物或同系物单体以及突变的MspA旁系同源物或同系物单体。例如，Msp单体序列是野生型MspA单体。Also provided is a method for producing a complete or partial single-chain Msp porin, the method comprising: (a) transforming a bacterial strain described herein with a vector containing a nucleic acid sequence capable of encoding a complete or partial single-chain Msp porin; and (b) purifying the complete or partial single-chain Msp porin from the bacteria. The vector may comprise a nucleic acid sequence encoding a complete or partial single-chain Msp porin, wherein the nucleic acid sequence comprises: (a) a first and a second nucleotide sequence, wherein the first nucleotide sequence encodes a first Msp monomer sequence and the second nucleotide sequence encodes a second Msp monomer sequence; and (b) a third nucleotide sequence encoding an amino acid linker sequence. The vector may comprise a nucleic acid sequence encoding a complete or partial single-chain Msp porin, wherein the nucleic acid sequence comprises: (a) the 1st, 2nd, 3rd, 4th, 5th, 6th, 7th and 8th nucleotide sequences or any subset thereof, wherein the 1st, 2nd, 3rd, 4th, 5th, 6th, 7th and 8th nucleotide sequences encode the 1st, 2nd, 3rd, 4th, 5th, 6th, 7th and 8th Msp monomer sequences, respectively; and (b) a 9th nucleotide sequence encoding an amino acid linker sequence. The Msp monomer sequences can be independently selected from a wild-type MspA monomer, a mutated MspA monomer, a wild-type MspA paralog or homolog monomer, and a mutated MspA paralog or homolog monomer. For example, the Msp monomer sequence is a wild-type MspA monomer.

"耻垢分枝杆菌孔蛋白(Msp)"或"Msp孔蛋白"是指由两个或更多个Msp单体组成的多聚体复合物。Msp单体由耻垢分枝杆菌中的基因编码。耻垢分枝杆菌具有4个已鉴定的Msp基因，称为MspA、MspB、MspC和MspD。Msp孔蛋白可以例如由野生型MspA单体、突变的MspA单体、野生型MspA旁系同源物或同系物单体、或突变的MspA旁系同源物或同系物单体组成。任选地，Msp孔蛋白是单链Msp孔蛋白或是若干单链Msp孔蛋白的多聚体。单链Msp孔蛋白可以例如包括由两个或更多个Msp单体(例如，8个单体)通过一个或多个氨基酸连接体肽连接的多聚体。部分单链Msp孔蛋白是指必需二聚化、三聚化等来形成孔蛋白的单链多聚体复合物。完全单链Msp孔蛋白是指形成孔蛋白而无需二聚化、三聚化等以形成孔蛋白的单链多聚体复合物。"Mycobacterium smegmatis porin (Msp)" or "Msp porin" refers to a multimeric complex composed of two or more Msp monomers. Msp monomers are encoded by genes in Mycobacterium smegmatis. Mycobacterium smegmatis has four identified Msp genes, referred to as MspA, MspB, MspC and MspD. The Msp porin can, for example, be composed of a wild-type MspA monomer, a mutated MspA monomer, a wild-type MspA paralog or homolog monomer, or a mutated MspA paralog or homolog monomer. Optionally, the Msp porin is a single-chain Msp porin or a multimer of several single-chain Msp porins. A single-chain Msp porin can, for example, comprise a multimer of two or more Msp monomers (e.g., 8 monomers) connected by one or more amino acid linker peptides. A partially single-chain Msp porin refers to a single-chain multimeric complex that must dimerize, trimerize, etc. to form a porin. A completely single-chain Msp porin refers to a porin that forms without dimerization, trimerization, etc. to form a single-chain multimeric complex of the porin.

本文中任何实施方案中的Msp孔蛋白可以是本文中描述的任何Msp孔蛋白，例如野生型MspA孔蛋白、突变的MspA孔蛋白、野生型MspA旁系同源物或同系物孔蛋白或者突变的MspA旁系同源物或同系物孔蛋白。Msp孔蛋白可以由编码单链Msp孔蛋白的核酸序列编码。此处的任何Msp孔蛋白可包含本文中描述的任何Msp单体，例如突变的Msp单体。The Msp porin in any embodiment herein can be any Msp porin described herein, such as a wild-type MspA porin, a mutant MspA porin, a wild-type MspA paralog or homolog porin, or a mutant MspA paralog or homolog porin. The Msp porin can be encoded by a nucleic acid sequence encoding a single-chain Msp porin. Any Msp porin herein can comprise any Msp monomer described herein, such as a mutant Msp monomer.

营养物在分枝杆菌(mycobacteria)中通过野生型孔蛋白。野生型MspA孔蛋白、野生型MspB孔蛋白、野生型MspC孔蛋白和野生型MspD孔蛋白是野生型通道形成孔蛋白的实例。Msp孔蛋白可被进一步确定为本文中描述的任何Msp孔蛋白，包括旁系同源物、同系物、突变体和单链孔蛋白。Nutrients pass through wild-type porins in mycobacteria. Wild-type MspA porin, wild-type MspB porin, wild-type MspC porin, and wild-type MspD porin are examples of wild-type channel-forming porins. Msp porins can be further identified as any Msp porin described herein, including paralogs, homologs, mutants, and single-chain porins.

"突变的MspA孔蛋白"是与其相应的野生型MspA孔蛋白具有至少或至多70、75、80、85、90、95、98或99％或更多、或可来自其间的任何范围、但小于100％的同一性并且保持通道形成能力的多聚体复合物。突变的MspA孔蛋白可以是重组蛋白质。任选地，突变的MspA孔蛋白是在野生型MspA孔蛋白的缢缩区或前厅具有突变的孔蛋白。任选地，突变可存在于野生型MspA孔蛋白的周质环的边缘或外部。突变的MspA孔蛋白可用于本文中描述的任何实施方案。A "mutated MspA porin" is a multimeric complex that is at least or at most 70, 75, 80, 85, 90, 95, 98, or 99% or more, or any range therebetween, but less than 100%, identical to its corresponding wild-type MspA porin and retains channel-forming ability. The mutated MspA porin can be a recombinant protein. Optionally, the mutated MspA porin is a porin that has a mutation in the constriction or vestibule of the wild-type MspA porin. Optionally, the mutation can be present at the edge or exterior of the periplasmic loop of the wild-type MspA porin. The mutated MspA porin can be used in any of the embodiments described herein.

表1中提供了示例性野生型MspA旁系同源物和同源物。提供了野生型MspA旁系同源物，其包括野生型MspB、野生型MspC和野生型MspD。“旁系同源物”，如本文中定义的，是来自相同细菌种类的具有相似结构和功能的基因。"同源物"，如本文中定义的，是来自另一细菌种类的具有相似结构和进化来源的基因。例如，提供了野生型MspA同源物，其包括MppA、PorM1、PorM2、PorM1和Mmcs4296。Exemplary wild-type MspA paralogs and homologs are provided in Table 1. Wild-type MspA paralogs are provided, including wild-type MspB, wild-type MspC, and wild-type MspD. "Paralogs," as defined herein, are genes with similar structure and function from the same bacterial species. "Homologs," as defined herein, are genes with similar structure and evolutionary origin from another bacterial species. For example, wild-type MspA homologs are provided, including MppA, PorM1, PorM2, PorM1, and Mmcs4296.

"突变的MspA旁系同源物或同系物孔蛋白"是与其相应的野生型MspA旁系同源物或同系物孔蛋白具有至少或至多70、75、80、85、90、95、98或99％或更多、或来自其间的任何范围、但小于100％的同一性并且保持通道形成能力的多聚复合物。突变的MspA旁系同源物或同系物孔蛋白可以是重组蛋白质。任选地，突变的MspA旁系同源物或同系物孔蛋白是在野生型MspA旁系同源物或同系物孔蛋白的缢缩区或前厅中具有突变的孔蛋白。任选地，突变可存在于野生型MspA旁系同源物或同系物孔蛋白的周质环的边缘或外部。任何突变的MspA旁系同源物或同系物孔蛋白可用于本文中描述的任何实施方案，并且可包含本文中描述的任何突变。A "mutated MspA paralog or homolog porin" is a multimeric complex that has at least or at most 70, 75, 80, 85, 90, 95, 98, or 99% or more, or any range therebetween, but less than 100%, identity to its corresponding wild-type MspA paralog or homolog porin and retains channel-forming ability. The mutated MspA paralog or homolog porin can be a recombinant protein. Optionally, the mutated MspA paralog or homolog porin is a porin that has a mutation in the constriction or vestibule of the wild-type MspA paralog or homolog porin. Optionally, the mutation may be present at the edge or exterior of the periplasmic loop of the wild-type MspA paralog or homolog porin. Any mutated MspA paralog or homolog porin can be used in any embodiment described herein and can comprise any mutation described herein.

Msp孔蛋白可包含两个或更多个Msp单体。"Msp单体"是为野生型MspA单体、突变的MspA单体、野生型MspA旁系同源物或同系物单体或者突变的MspA旁系同源物或同系物单体的蛋白质单体，并且当与一个或多个其他Msp单体结合时保持通道形成能力。本文中描述的任何Msp孔蛋白可包含一个或多个本文中描述的任何Msp单体。任何Msp孔蛋白可以包含例如2至15个Msp单体，其中每一个单体可以是相同的或不同的。The Msp porin may comprise two or more Msp monomers. An "Msp monomer" is a protein monomer that is a wild-type MspA monomer, a mutant MspA monomer, a wild-type MspA paralog or homolog monomer, or a mutant MspA paralog or homolog monomer, and that retains channel-forming ability when combined with one or more other Msp monomers. Any Msp porin described herein may comprise one or more of any of the Msp monomers described herein. Any Msp porin may comprise, for example, 2 to 15 Msp monomers, each of which may be the same or different.

"突变的MspA单体"是指与野生型MspA单体具有至少或至多70、75、80、85、90、95、98或99％或更多、或可来自其间的任何范围、但小于100％的同一性并且当与一个或多个其他Msp单体结合时保持形成通道的能力的Msp单体。任选地，突变的MspA单体被进一步确定为在促进完全形成的通道形成孔蛋白的前厅或缢缩区的形成的序列部分中包含突变。突变的Msp单体可以是例如重组蛋白。突变的MspA单体可包含本文中描述的任何突变。"MspA monomer of mutation" refers to an Msp monomer that has at least or at most 70, 75, 80, 85, 90, 95, 98 or 99% or more, or any range therebetween, but is less than 100% identity to a wild-type MspA monomer and that maintains the ability to form a channel when combined with one or more other Msp monomers. Optionally, the MspA monomer of mutation is further defined as comprising a mutation in the sequence portion that promotes the formation of the vestibule or constriction zone of a fully formed channel-forming porin. The Msp monomer of mutation can be, for example, a recombinant protein. The MspA monomer of mutation can comprise any mutation described herein.

"突变的MspA旁系同源物或同系物单体"是指与野生型MspA旁系同源物或同系物单体具有至少或至多70、75、80、85、90、95、98或99％或更多，或可来自其间的任何范围，但小于100％的同一性并且保持通道形成能力的MspA旁系同源物或同系物单体。任选地，突变的MspA旁系同源物或同系物单体被进一步确定为在序列的该部分包含突变，所述部分促进完全形成的通道形成孔蛋白的前厅和/或缢缩区的形成。突变的MspA旁系同源物或同系物单体可以例如是重组蛋白质。任何突变的MspA旁系同源物或同系物单体可以任选地用于本文中的任何实施方案。" the MspA paralogue of sudden change or homologue monomer " refer to and wild-type MspA paralogue or homologue monomer have at least or at the most 70,75,80,85,90,95,98 or 99% or more, or can be from any scope therebetween, but be less than 100% homogeneity and keep the MspA paralogue or homologue monomer of passage forming ability.Optionally, the MspA paralogue of sudden change or homologue monomer are further defined as comprising sudden change at this part of sequence, and described part promotes the formation of the vestibule and/or constriction zone of fully formed passage formation porin. The MspA paralogue of sudden change or homologue monomer can for example be recombinant protein. The MspA paralogue of any sudden change or homologue monomer can optionally be used for any embodiment herein.

Msp孔蛋白可表达为两个或更多个野生型MspA单体、突变的MspA单体、野生型MspA旁系同源物或同系物单体或突变的MspA旁系同源物或同系物单体的组合。这样，Msp孔蛋白可以是或可包括二聚体、三聚体、四聚体、五聚体、六聚体、七聚体、八聚体、九聚体等。例如，Msp孔蛋白可包括野生型MspA单体和野生型MspB单体的组合。Msp孔蛋白可包括1至15个单体，其中每一个单体是相同的或不同的。事实上，本文中描述的任何Msp孔蛋白可包含至少或至多1、2、3、4、5、6、7、8、9、10、11、12、13、14或15个或可来自其间的任何范围内的个数的单体，其中每一个单体是相同的或不同的。例如，Msp孔蛋白可包含一个或多个相同的或不同的突变MspA单体。作为另一个实例，Msp孔蛋白可包含至少一个突变的MspA单体和至少一个MspA旁系同源物或同系物单体。The Msp porin can be expressed as a combination of two or more wild-type MspA monomers, mutant MspA monomers, wild-type MspA paralogs or homologs monomers or mutant MspA paralogs or homologs monomers. Thus, the Msp porin can be or can include a dimer, trimer, tetramer, pentamer, hexamer, heptamer, octamer, nonamer, etc. For example, the Msp porin can include a combination of a wild-type MspA monomer and a wild-type MspB monomer. The Msp porin can include 1 to 15 monomers, each of which is the same or different. In fact, any Msp porin described herein can include at least or at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 monomers, or a number that can be from any range therebetween, each of which is the same or different. For example, the Msp porin can include one or more identical or different mutant MspA monomers. As another example, an Msp porin can comprise at least one mutant MspA monomer and at least one MspA paralog or homolog monomer.

如上文中定义的，单链Msp孔蛋白包含两个或更多个通过一个或多个氨基酸连接体肽连接的Msp单体。包含2个Msp单体的单链Msp孔蛋白(其中所述Msp单体由氨基酸连接体序列连接)可称为单链Msp孔蛋白二聚体。包含8个Msp单体(其中所述Msp单体由氨基酸连接体序列连接)的单链Msp孔蛋白可称为单链Msp孔蛋白八聚体。单链Msp孔蛋白可包含通过氨基酸连接体序列连接的2、3、4、5、6、7、8、9、10、11、12、13、14、15或更多个Msp单体或可来自其间的任何范围内的Msp单体。任选地，单链Msp孔蛋白可以例如包括两个或更多个单链Msp孔蛋白二聚体、两个或更多个单链Msp孔蛋白三聚体、两个或更多个单链Msp孔蛋白四聚体、两个或更多个单链Msp孔蛋白五聚体、一个或多个单链Msp孔蛋白六聚体、一个或多个单链Msp孔蛋白七聚体、一个或多个单链Msp孔蛋白八聚体或其组合。例如，单链Msp孔蛋白可包括一个单链Msp孔蛋白二聚体和两个单链Msp孔蛋白三聚体。作为另一个实例，单链Msp孔蛋白可包括一个单链Msp孔蛋白四聚体和两个单链Msp孔蛋白二聚体。As defined above, a single-chain Msp porin comprises two or more Msp monomers connected by one or more amino acid linker peptides. A single-chain Msp porin comprising two Msp monomers (wherein the Msp monomers are connected by an amino acid linker sequence) can be referred to as a single-chain Msp porin dimer. A single-chain Msp porin comprising eight Msp monomers (wherein the Msp monomers are connected by an amino acid linker sequence) can be referred to as a single-chain Msp porin octamer. A single-chain Msp porin can comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or more Msp monomers, or any number of Msp monomers within a range derivable therein, connected by amino acid linker sequences. Optionally, the single-chain Msp porin may, for example, include two or more single-chain Msp porin dimers, two or more single-chain Msp porin trimers, two or more single-chain Msp porin tetramers, two or more single-chain Msp porin pentamers, one or more single-chain Msp porin hexamers, one or more single-chain Msp porin heptamers, one or more single-chain Msp porin octamers, or a combination thereof. For example, a single-chain Msp porin may include one single-chain Msp porin dimer and two single-chain Msp porin trimers. As another example, a single-chain Msp porin may include one single-chain Msp porin tetramer and two single-chain Msp porin dimers.
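Both worked examples above (one dimer plus two trimers, and one tetramer plus two dimers) add up to eight monomers. The short Python sketch below checks that arithmetic for an arbitrary combination of single-chain units. The default target of eight monomers is an assumption taken from those examples; the text allows other monomer counts.

```python
def total_monomers(chain_sizes: list[int]) -> int:
    """Total number of Msp monomers contributed by a set of single-chain units.

    Each entry is the number of linked monomers in one chain
    (2 = dimer, 3 = trimer, 4 = tetramer, and so on).
    """
    if any(size < 2 for size in chain_sizes):
        raise ValueError("Each single-chain unit links at least two monomers")
    return sum(chain_sizes)


def assembles_to(chain_sizes: list[int], target: int = 8) -> bool:
    """True if the units together supply exactly `target` monomers.

    The default of 8 is an assumption drawn from the two examples in the text.
    """
    return total_monomers(chain_sizes) == target


dimer_plus_two_trimers = assembles_to([2, 3, 3])
tetramer_plus_two_dimers = assembles_to([4, 2, 2])
tetramer_plus_trimer = assembles_to([4, 3])
```

The first two combinations reach eight monomers, while a tetramer plus a trimer supplies only seven.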

野生型单链Msp孔蛋白由野生型Msp单体组成。任选地，单链Msp孔蛋白中的一个或多个突变存在于单链Msp孔蛋白的前厅或缢缩区中。突变的单链Msp孔蛋白例如与野生型单链Msp相比较在周质环、前厅或缢缩区的氨基酸序列中具有至少一个突变(例如，缺失、置换或添加)。单链的多聚体也可形成孔蛋白，其中每一个单链包括2、3、4、5、6、7或更多个Msp单体。The wild-type single-chain Msp porin is composed of wild-type Msp monomers. Optionally, one or more mutations in the single-chain Msp porin are present in the vestibule or constriction region of the single-chain Msp porin. The mutated single-chain Msp porin has, for example, at least one mutation (e.g., a deletion, substitution, or addition) in the amino acid sequence of the periplasmic loop, vestibule, or constriction region compared with the wild-type single-chain Msp. Multimers of single chains can also form porins, wherein each single chain comprises 2, 3, 4, 5, 6, 7, or more Msp monomers.

Provided herein are nucleic acid sequences encoding Msp monomer sequences and mutants thereof. For the mutant MspA monomer sequences listed above, the reference MspA sequence is the mature wild-type MspA monomer sequence (SEQ ID NO:1). Each nucleotide sequence among the nucleic acid sequences provided herein can, for example, comprise a mutant MspA monomer sequence. Non-limiting examples of mutant MspA sequences are provided in Table 7. Optionally, the mutant MspA comprises an A-to-P substitution at amino acid 138, an E-to-A substitution at amino acid 139, or a combination thereof. Optionally, the mutant MspA comprises a D-to-K or D-to-R substitution at amino acid 90, a D-to-N substitution at amino acid 91, a D-to-N substitution at amino acid 93, or any combination thereof. Optionally, the mutant MspA comprises a D-to-Q substitution at amino acid 90, a D-to-Q substitution at amino acid 91, a D-to-N substitution at amino acid 93, or any combination thereof. Optionally, the mutant MspA comprises an L-to-W substitution at amino acid 88, an I-to-W substitution at amino acid 105, a D-to-Q substitution at amino acid 91, a D-to-N substitution at amino acid 93, or any combination thereof. Optionally, the mutant MspA comprises an I-to-W substitution at amino acid 105, an N-to-W substitution at amino acid 108, or a combination thereof. Optionally, the mutant MspA comprises a D-to-R substitution at amino acid 118, an E-to-K substitution at amino acid 139, a D-to-R substitution at amino acid 134, or any combination thereof. For the mutant MspB monomer sequences listed below, the reference MspB sequence is the mature wild-type MspB monomer sequence (SEQ ID NO:2). Optionally, the mutant MspB comprises a D-to-K or D-to-R substitution at amino acid 90, a D-to-N substitution at amino acid 91, a D-to-N substitution at amino acid 93, or any combination thereof.
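Substitutions of the kind listed above are written as an original residue, a 1-based position in the reference sequence, and a replacement residue. The following is a minimal Python sketch of applying such substitutions while checking that the reference actually carries the stated original residue. The function name `apply_substitutions` and the placeholder sequence are hypothetical. The placeholder is NOT SEQ ID NO:1; it is a toy string built only so that positions 90, 91, and 93 hold aspartate (D).

```python
def apply_substitutions(sequence, substitutions):
    """Apply (original, position, replacement) substitutions; positions are 1-based."""
    residues = list(sequence)
    for original, position, replacement in substitutions:
        index = position - 1
        if not 0 <= index < len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[index] != original:
            raise ValueError(
                f"expected {original} at position {position}, found {residues[index]}"
            )
        residues[index] = replacement
    return "".join(residues)

# Hypothetical 100-residue placeholder (not the real MspA sequence):
# all alanine, with aspartate placed at positions 90, 91 and 93.
toy = ["A"] * 100
for pos in (90, 91, 93):
    toy[pos - 1] = "D"
toy_sequence = "".join(toy)

# One combination named in the text: D to N at amino acids 90, 91 and 93.
mutant = apply_substitutions(
    toy_sequence, [("D", 90, "N"), ("D", 91, "N"), ("D", 93, "N")]
)
print(mutant[87:95])
```

Checking the original residue before replacing it guards against numbering the positions against the wrong reference, for example a precursor rather than the mature monomer sequence.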

Sequences of the wild-type Msp monomers discussed herein are disclosed in GenBank, located on the World Wide Web at pubmed.gov, and these sequences and others, as well as the individual subsequences or fragments contained therein, are incorporated herein by reference in their entirety. For example, the nucleotide and amino acid sequences of a wild-type MspA monomer can be found at GenBank Accession Nos. AJ001442 and CAB56052, respectively. The nucleotide and amino acid sequences of a wild-type MspB monomer can be found, for example, at GenBank Accession Nos. NC_008596.1 (from nucleotide 600086 to 600730) and YP_884932.1, respectively. The nucleotide and amino acid sequences of a wild-type MspC monomer can be found, for example, at GenBank Accession Nos. AJ299735 and CAC82509, respectively. The nucleotide and amino acid sequences of a wild-type MspD monomer can be found, for example, at GenBank Accession Nos. AJ300774 and CAC83628, respectively. Thus provided are nucleotide sequences of MspA, MspB, MspC, and MspD monomers comprising a nucleotide sequence at least about 70, 75, 80, 85, 90, 95, 98, 99 percent or more, or any range derivable therein, identical to the nucleotide sequences of the aforementioned nucleotide GenBank Accession Numbers. Also provided are amino acid sequences of MspA, MspB, MspC, and MspD monomers (FIG. 18) comprising an amino acid sequence at least about 70, 75, 80, 85, 90, 95, 98, 99 percent or more, or any range derivable therein, identical to the sequences of the aforementioned amino acid GenBank Accession Numbers.

Also provided are amino acid sequences of MspA paralog and homolog monomers comprising an amino acid sequence at least about 70, 75, 80, 85, 90, 95, 98, 99 percent or more, or any range derivable therein, identical to a wild-type MspA paralog or homolog monomer. Wild-type MspA paralog and homolog monomers are well known in the art. Table 1 provides a non-limiting list of such paralogs and homologs:

Table 1. Wild-type MspA and wild-type MspA paralog and homolog monomers

Only proteins with significant amino acid similarity over the full length of the protein were included. Data were obtained with the PSI-Blast algorithm (BLOSUM62 matrix) using the NIH GenBank database on the World Wide Web at ncbi.nlm.nih.gov/blast/Blast.cgi.

n.d.: "not determined"

*Stahl et al., Mol. Microbiol. 40:451 (2001)

**Dorner et al., Biochim. Biophys. Acta. 1667:47-55 (2004)

The peptides, polypeptides, monomers, multimers, proteins, etc. described herein can be further modified and varied so long as the desired function is maintained or enhanced. It is understood that one way to define any known modifications and derivatives of the genes and proteins disclosed herein, or those that might arise, is to define them in terms of identity to specific known sequences. Specifically disclosed are polypeptides that have at least 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99 percent identity to a wild-type MspA and wild-type MspA paralogs or homologs (e.g., wild-type MspB, wild-type MspC, wild-type MspD, MppA, PorM1, Mmcs4296) and to the mutants provided herein.

Those of skill in the art readily understand how to determine the identity of two polypeptides. For example, the identity can be calculated after aligning the two sequences so that the identity is at its highest level. For example, to determine the "percent identity" of two amino acid sequences or of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in a first amino acid or nucleic acid sequence for optimal alignment with a second amino acid or nucleic acid sequence). The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., percent identity = number of identical positions / total number of positions (e.g., overlapping positions) x 100). In one embodiment, the two sequences are the same length.
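The calculation just described (percent identity = number of identical positions / total number of positions x 100) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a prescribed implementation. It assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it takes the total number of positions to be the number of alignment columns. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumption: the denominator is the number of alignment columns.
    A column counts as identical only if both residues match and are not gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * identical / len(aligned_a)

# Toy alignment: 4 identical columns out of 6.
example = percent_identity("ACGT-A", "ACCTTA")
print(f"{example:.1f}")
```

Other conventions for the denominator exist (for example, the length of the shorter sequence), which is one reason different tools can report slightly different values for the same pair.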

Several methods are available for determining percent identity. One method may determine percent identity in the following manner. A target nucleic acid or amino acid sequence is compared to the identified nucleic acid or amino acid sequence using the BLAST 2 Sequences (Bl2seq) program from the stand-alone version of BLASTZ containing BLASTN version 2.0.14 and BLASTP version 2.0.14. This stand-alone version of BLASTZ can be obtained from the U.S. government's National Center for Biotechnology Information web site (World Wide Web at ncbi.nlm.nih.gov). Instructions explaining how to use the Bl2seq program can be found in the help file accompanying BLASTZ.

Bl2seq performs a comparison between two sequences using either the BLASTN or BLASTP algorithm. BLASTN is used to compare nucleic acid sequences, while BLASTP is used to compare amino acid sequences. To compare two nucleic acid sequences, the options may be set as follows: -i is set to a file containing the first nucleic acid sequence to be compared (e.g., C:\seq1.txt); -j is set to a file containing the second nucleic acid sequence to be compared (e.g., C:\seq2.txt); -p is set to blastn; -o is set to any desired file name (e.g., C:\output.txt); -q is set to -1; -r is set to 2; and all other options are left at their default settings. The following command will generate an output file containing a comparison between two sequences: C:\Bl2seq -i c:\seq1.txt -j c:\seq2.txt -p blastn -o c:\output.txt -q -1 -r 2. If the target sequence shares homology with any portion of the identified sequence, then the designated output file will present those regions of homology as aligned sequences. If the target sequence does not share homology with any portion of the identified sequence, then the designated output file will not present aligned sequences.
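For readers who script such comparisons, the option settings listed above can be assembled programmatically. The following Python sketch only builds the argument list and the resulting command string; it does not run the program. The helper name `bl2seq_arguments` is hypothetical, and the executable path and file names are simply the examples given in the text.

```python
def bl2seq_arguments(first_file, second_file, output_file, executable=r"C:\Bl2seq"):
    """Build the argument list for the nucleic acid comparison described in the text."""
    return [
        executable,
        "-i", first_file,    # file with the first nucleic acid sequence
        "-j", second_file,   # file with the second nucleic acid sequence
        "-p", "blastn",      # compare nucleic acid sequences
        "-o", output_file,   # output file name
        "-q", "-1",          # value stated in the text
        "-r", "2",           # value stated in the text
    ]

args = bl2seq_arguments(r"c:\seq1.txt", r"c:\seq2.txt", r"c:\output.txt")
command_line = " ".join(args)
print(command_line)
```

Keeping the arguments as a list rather than a single string avoids quoting problems if a file path contains spaces.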

Once aligned, a length is determined by counting the number of consecutive nucleotides from the target sequence presented in alignment with sequence from the identified sequence, starting with any matched position and ending with any other matched position. A matched position is any position where an identical nucleotide is presented in both the target sequence and the identified sequence. Gaps presented in the target sequence are not counted, since gaps are not nucleotides. Likewise, gaps presented in the identified sequence are not counted, since target sequence nucleotides are counted, not nucleotides from the identified sequence.

The percent identity over a particular length may be determined by counting the number of matched positions over that length, dividing that number by the length, and multiplying the resulting value by 100. For example, if (1) a 50 nucleotide target sequence is compared to the sequence encoding wild-type MspA, (2) the Bl2seq program presents 45 nucleotides from the target sequence aligned with a region of the sequence encoding wild-type MspA, where the first and last nucleotides of that 45 nucleotide region are matches, and (3) the number of matches over those 45 aligned nucleotides is 40, then the 50 nucleotide target sequence contains a length of 45 and the percent identity over that length is 89 (i.e., 40/45 x 100 = 88.9, which rounds to 89).
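The length rule and the worked example above can be sketched in Python. This is a minimal illustration under stated assumptions: the alignment is supplied as two equal-length strings with "-" for gaps, the length runs from the first matched column to the last matched column, and only target nucleotides (not gaps) are counted. The function name `identity_over_aligned_length` and the toy sequences are hypothetical. The toy sequences are not MspA sequences and merely reproduce the 40-of-45 case.

```python
def identity_over_aligned_length(aligned_target, aligned_identified, gap="-"):
    """Return (length, percent identity) following the counting rule in the text."""
    if len(aligned_target) != len(aligned_identified):
        raise ValueError("aligned sequences must have the same number of columns")
    # Columns where both sequences present the same nucleotide.
    matched_columns = [
        i for i, (t, s) in enumerate(zip(aligned_target, aligned_identified))
        if t == s and t != gap
    ]
    if not matched_columns:
        return 0, 0.0
    first, last = matched_columns[0], matched_columns[-1]
    # Count target nucleotides between the first and last match; gaps are skipped.
    length = sum(1 for t in aligned_target[first:last + 1] if t != gap)
    return length, 100.0 * len(matched_columns) / length

# Toy alignment reproducing the worked example: 45 aligned nucleotides, 40 matches.
toy_target = "A" * 45
toy_identified = "A" * 20 + "C" * 5 + "A" * 20
length, percent = identity_over_aligned_length(toy_target, toy_identified)
print(length, round(percent))  # prints: 45 89
```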

Another way of calculating identity can be performed by published algorithms. Optimal alignment of sequences for comparison may be conducted by the local identity algorithm of Smith and Waterman, Adv. Appl. Math 2:482 (1981); by the identity alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970); by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988); by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI); or by inspection.

The same types of identity can be obtained for nucleic acids by, for example, the algorithms disclosed in Zuker, Science 244:48-52 (1989); Jaeger et al., Proc. Natl. Acad. Sci. USA 86:7706-10 (1989); and Jaeger et al., Methods Enzymol. 183:281-306 (1989), which are incorporated herein by reference for at least the material related to nucleic acid alignment. It is understood that any of the methods typically can be used and that in certain instances the results of these various methods may differ, but the skilled artisan understands that if identity is found with at least one of these methods, the sequences would be said to have the stated identity and to be disclosed herein.

Also disclosed are nucleic acids that encode the protein sequences disclosed herein, as well as variants and fragments thereof. These sequences include all degenerate sequences related to a specific protein sequence, i.e., all nucleic acids having a sequence that encodes one particular protein sequence, as well as all nucleic acids, including degenerate nucleic acids, encoding the disclosed variants and derivatives of the protein sequences. Thus, while each particular nucleic acid sequence may not be written out herein, it is understood that each and every sequence is in fact disclosed and described herein through the disclosed protein sequences.
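To give a sense of why the degenerate sequences are not written out individually, the number of distinct coding sequences for a given protein sequence can be estimated by multiplying the number of codons available for each residue. The Python sketch below is illustrative only. It assumes the standard genetic code, ignores start and stop codons, and uses hypothetical names (`CODONS_PER_RESIDUE`, `count_encoding_sequences`) and toy peptides that are not taken from this disclosure.

```python
# Number of codons for each amino acid in the standard genetic code (61 sense codons).
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}

def count_encoding_sequences(protein):
    """Count the nucleic acid sequences that encode the given protein sequence."""
    count = 1
    for residue in protein:
        if residue not in CODONS_PER_RESIDUE:
            raise ValueError(f"unknown residue: {residue}")
        count *= CODONS_PER_RESIDUE[residue]
    return count

# Toy peptides (hypothetical, for illustration only).
print(count_encoding_sequences("MDW"))  # 1 * 2 * 1
print(count_encoding_sequences("LSR"))  # 6 * 6 * 6
```

Because the count grows multiplicatively with length, a monomer of well over a hundred residues has an astronomically large number of possible coding sequences.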

Fragments and partial sequences of an Msp porin or monomer may be useful in the methods described herein. As with all peptides, polypeptides, and proteins, including fragments thereof, it is understood that additional modifications in the amino acid sequence of the Msp polypeptides disclosed herein can occur that do not alter the nature or function of the peptides, polypeptides, and proteins. It will be appreciated that the only limitation on these modifications is practical: they must include the functional elements (e.g., channel-forming capability) necessary for use in the relevant embodiment. Such modifications include conservative amino acid substitutions and are discussed in greater detail below.

Methods of determining whether a protein is a channel-forming protein are well known in the art. One may determine whether an Msp forms a channel by determining whether the protein inserts into a bilayer, as described in Example 2 below: if the protein inserts into the bilayer, then the porin is a channel-forming protein. Typically, channel formation is detected by observing a discrete change in conductivity. See, e.g., FIG. 2, Example 2, and Niederweis et al., Mol. Microbiol. 33:933 (1999). Bilayers are described herein.

As suggested above, an Msp porin will typically be able to be inserted into a lipid bilayer or other thin film, each of which is well known in the art. An example of inserting a mutant MspA porin into a lipid bilayer is explained herein; this technique may be applied to other Msp porins as well. In addition, U.S. Patent No. 6,746,594, incorporated herein by reference, describes a variety of lipid bilayers and thin films, including inorganic materials, that may be employed with the Msp porins described herein. The methods, apparatuses, and techniques described in U.S. Patent No. 6,267,872, incorporated herein by reference in its entirety, are also employable with the Msp porins described herein.

Moreover, more than one Msp porin may be comprised in a lipid bilayer. For example, 2, 3, 4, 5, 10, 20, 200, 2000, or more Msp porins may be comprised in a lipid bilayer. Optionally, anywhere from 2 to 10^10 Msp porins may be employed in the methods described herein. Such a plurality of Msp porins may be in the form of clusters of Msp porins. Clusters may be randomly assembled or may adopt a pattern. As used herein, a "cluster" refers to molecules that are grouped together and move as a unit, but are not covalently bound to one another.

Optionally, Msp porins do not gate spontaneously. "To gate" or "gating" refers to a spontaneous change in the electrical conductance through the channel of the protein that is usually temporary (e.g., lasting from as little as 1-10 milliseconds up to seconds). Long-lasting gating events can often be reversed by changing the polarity. Under most circumstances, the probability of gating increases with the application of higher voltages. Gating and the degree of conductance through the channel are highly variable among Msp porins, depending on, for example, the make-up of the vestibule and constriction zone as well as the properties of the liquid medium in which the protein is immersed. Typically, the protein becomes less conductive during gating, and conductance may permanently stop as a result (i.e., the channel may permanently shut), such that the process is irreversible. Optionally, gating refers to the conductance through the channel of the protein spontaneously changing to less than 75% of its open-state current.
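The optional 75% criterion above lends itself to a simple screening of a recorded current trace. The following Python sketch is one possible way to flag candidate gating events, not a method taken from this disclosure. It assumes a uniformly sampled trace of positive current values, a known open-state current, and a hypothetical function name `find_gating_events`. The numbers in the example are invented for illustration.

```python
def find_gating_events(current_trace, open_state_current, sample_interval_s,
                       threshold_fraction=0.75):
    """Return (start_time_s, duration_s) for each run of samples below the threshold.

    Assumption: currents are positive and sampled at a fixed interval.
    """
    threshold = threshold_fraction * open_state_current
    events = []
    start = None
    for index, value in enumerate(current_trace):
        below = value < threshold
        if below and start is None:
            start = index                      # a candidate event begins
        elif not below and start is not None:
            events.append((start * sample_interval_s,
                           (index - start) * sample_interval_s))
            start = None                       # the event has ended
    if start is not None:                      # trace ended while still below threshold
        events.append((start * sample_interval_s,
                       (len(current_trace) - start) * sample_interval_s))
    return events

# Invented trace in arbitrary current units, sampled every 1 ms; open state is 100.
trace = [100, 100, 60, 55, 100, 100, 70, 100]
events = find_gating_events(trace, open_state_current=100.0, sample_interval_s=0.001)
print(events)
```

In practice one would also need to distinguish spontaneous gating from blockades caused by an analyte, which this threshold test alone cannot do.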

Various conditions, such as light and the liquid medium that contacts an Msp porin (including its pH, buffer composition, detergent composition, and temperature), may affect the behavior of the Msp porin temporarily or permanently, particularly with respect to its conductance through the channel and the movement of an analyte with respect to the channel.

Of particular relevance is the geometry of the channel of Msp porins, particularly the MspA porin. The geometry of Msp porins may provide improved spatial resolution. Further, the wild-type MspA porin is very robust and retains channel-forming activity after exposure to any pH and after extraction at extreme temperatures (e.g., up to 100 °C for up to 30 minutes and incubation at up to 80 °C for up to 15 minutes). The polypeptides may be tested for their desired activity using the in vitro assays described herein.

Regarding the MspA porin in particular, optionally, the MspA porin is an octamer that consists of eight 184-amino acid MspA monomers. One or more mutations may be made in one or more of the MspA monomers of a wild-type MspA porin to yield a mutant MspA porin. In addition, an MspA porin may have fewer or more than eight monomers, any one or more of which may comprise a mutation.

Moreover, the wild-type MspA porin comprises a periplasmic loop that consists of thirteen amino acids and is directly adjacent to the constriction zone. See Huff et al., J. Biol. Chem. 284:10223 (2009). Wild-type MspB, C, and D porins also contain a periplasmic loop. One or more mutations may occur in the periplasmic loop of a wild-type Msp porin to generate a mutant Msp porin. For example, deletions of up to all thirteen amino acids may occur in the periplasmic loop of the wild-type MspA porin. Typically, deletions in the periplasmic loop do not affect the channel-forming ability of an Msp porin.

An Msp porin or Msp monomer may also be chemically or biologically modified. For example, one may modify an Msp porin or Msp monomer with chemicals to produce disulfide bridges, as is known to those of skill in the art.

An Msp porin may comprise a nucleotide binding site. As used herein, a "nucleotide binding site" refers to a site in an Msp porin where a nucleotide stays in contact with, or resides at, an amino acid for a period of time that is longer than attributable to diffusional movement, such as greater than one picosecond or one nanosecond. Molecular dynamics calculations may be employed to estimate such brief resting times.

"Vestibule" refers to the cone-shaped portion of the interior of an Msp porin whose diameter generally decreases from one end to the other along a central axis, where the narrowest portion of the vestibule is connected to the constriction zone. A vestibule may also be referred to as a "goblet." See FIG. 1 for an example of the vestibule of a wild-type MspA porin. The vestibule and the constriction zone together define the channel of an Msp porin.

When referring to the diameter of the vestibule, it is understood that because the vestibule is cone-like in shape, the diameter changes along the path of the central axis, being larger at one end than at the other. The diameter may range from about 2 nm to about 6 nm. Optionally, the diameter is about, at least about, or at most about 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, or 6.0 nm, or any range derivable therein. The length of the central axis may range from about 2 nm to about 6 nm. Optionally, the length is about, at least about, or at most about 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, or 6.0 nm, or any range derivable therein. When referring to "diameter" herein, one may determine a diameter by measuring center-to-center distances or atomic surface-to-surface distances.

"Constriction zone" refers to the narrowest portion of the channel of an Msp porin, in terms of diameter, that is connected to the vestibule. The constriction zone of a wild-type MspA porin is shown in FIG. 1 (labeled "inner constriction"). The length of the constriction zone may range from about 0.3 nm to about 2 nm. Optionally, the length is about, at most about, or at least about 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, or 3 nm, or any range derivable therein. The diameter of the constriction zone may range from about 0.3 nm to about 2 nm. Optionally, the diameter is about, at most about, or at least about 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, or 3 nm, or any range derivable therein.

"Neutral constriction zone" refers to a constriction zone comprising amino acid side chains that cumulatively exhibit no net electrical charge when immersed in an aqueous solution. The pH of the liquid medium (e.g., a buffered aqueous solution) in contact with the constriction zone may affect whether the constriction zone is characterized as neutral.

"Channel" refers to the central, hollow portion of an Msp that is defined by the vestibule and the constriction zone, through which a gas, liquid, ion, or analyte may pass.

As used herein, "front side" refers to the side of an Msp channel through which an analyte enters the channel or across the face of which the analyte moves.

As used herein, "reverse side" refers to the side of an Msp channel through which an analyte (or fragments thereof) exits the channel or across the face of which the analyte does not move.

As used herein, "electrophoretically translocating an analyte," and grammatical variants thereof, refers to applying an electric field to an Msp porin that is in contact with one or more solutions (e.g., immersed in a solution), such that current flows through the Msp porin channel. The electric field moves an analyte such that it interacts with the channel. By "interacts," it is meant that the analyte moves into and, optionally, through the channel, where "through the Msp channel" (or "translocates") means to enter one side of the channel and move to and out of the other side of the channel.

It is specifically contemplated that any analyte discussed herein may translocate through an Msp porin channel, either electrophoretically or otherwise, in any embodiment discussed herein. In this regard, it is specifically contemplated that any embodiment herein comprising translocation may refer to electrophoretic translocation or non-electrophoretic translocation, unless specifically noted. Optionally, methods that do not employ electrophoretic translocation are contemplated.

A "liquid medium" includes aqueous, organic-aqueous, and organic-only liquid media. Organic media include, e.g., methanol, ethanol, dimethylsulfoxide, and mixtures thereof. Liquids employable in the methods described herein are well known in the art. Descriptions and examples of such media, including conductive liquid media, are provided in U.S. Patent No. 7,189,503, for example, which is incorporated herein by reference in its entirety. Salts, detergents, or buffers may be added to such media. Such agents may be employed to alter the pH or ionic strength of the liquid medium. Viscosity-altering substances, such as glycerol or various polymers (e.g., polyvinylpyrrolidone, polyethylene glycol, polyvinyl alcohol, cellulose polymers), and mixtures thereof, may be included in liquid media. Methods of measuring viscosity are well known in the art. Any agent that may be added to a liquid medium may also alter the velocity of an analyte that is being studied. As such, a velocity-altering agent may be a salt, a detergent, a buffer, a viscosity-altering substance, or any other agent added to a liquid medium that increases or decreases the velocity of an analyte.

In general, an analyte used herein is soluble or partially soluble in at least one liquid medium that is in contact with an Msp described herein. Any analyte may be used herein, including, for example, a nucleotide, a nucleic acid, an amino acid, a peptide, a protein, a polymer, a drug, an ion, a biological warfare agent, a pollutant, a nanoscopic object, or any other molecule comprising one of these analytes or a combination thereof. An analyte may be a cluster of molecules, in that the cluster as a whole is considered an analyte. Typically, an analyte's size will not be so great that it cannot enter the channel of an Msp: in other words, a typical analyte will be smaller in size than the opening of the channel of an Msp. However, an analyte larger than the opening of the channel may be employed, and the methods described herein may be used to determine whether the analyte is too large to enter the channel. Optionally, the molecular weight of the analyte is less than one million Da. Optionally, the molecular weight of the analyte is about, at most about, or at least about 1,000,000, 950,000, 900,000, 850,000, 800,000, 750,000, 700,000, 650,000, 600,000, 550,000, 500,000, 450,000, 400,000, 350,000, 300,000, 250,000, 200,000, 150,000, 100,000, 75,000, 50,000, 25,000, 20,000, 15,000, 10,000, 7,500, 5,000, 2,500, 2,000, 1,500, 1,000, or 500 Da or less, or any range derivable therein.

Protein modifications include amino acid sequence modifications. Modifications in amino acid sequence can arise naturally as allelic variations (e.g., due to genetic polymorphism), can arise due to environmental influence (e.g., due to exposure to ultraviolet radiation), or can be produced by human intervention (e.g., by mutagenesis of cloned DNA sequences), such as induced point, deletion, insertion, and substitution mutants. These modifications can result in changes in the amino acid sequence, provide silent mutations, modify a restriction site, or provide other specific mutations. Amino acid sequence modifications typically fall into one or more of three classes: substitutional, insertional, or deletional modifications. Insertions include amino- and/or carboxyl-terminal fusions as well as intrasequence insertions of single or multiple amino acid residues. Insertions will ordinarily be smaller than amino- or carboxyl-terminal fusions, for example, on the order of 1 to 4 residues. Deletions are characterized by the removal of one or more amino acid residues from the protein sequence. Typically, no more than about 2 to 6 residues are deleted at any one site within the protein molecule. Amino acid substitutions are typically of single residues but can occur at a number of different locations at once; insertions will usually be on the order of about 1 to 10 amino acid residues; and deletions can range from 1 to 30 residues. Deletions or insertions are preferably made in adjacent pairs, i.e., a deletion of 2 residues or an insertion of 2 residues. Substitutions, deletions, insertions, or any combination thereof can be combined to arrive at a final construct. The mutations may or may not place the sequence out of reading frame and may or may not create complementary regions that could produce mRNA secondary structure. Substitutional modifications are those in which at least one residue has been removed and a different residue inserted in its place.

Modifications, including specific amino acid substitutions, can be made by known methods. For example, modifications can be made by site-directed mutagenesis of nucleotides in the DNA encoding the protein, thereby producing DNA encoding the modification, and thereafter expressing that DNA in recombinant cell culture. Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known, for example, M13 primer mutagenesis and PCR mutagenesis.

One or more mutations in an Msp porin can be present in the vestibule or the constriction zone of the protein. Optionally, a mutant Msp porin has at least one difference (e.g., a deletion, substitution, or addition) in the amino acid sequence of its periplasmic loop, vestibule, or constriction zone compared with the wild-type Msp porin.

As used herein, an "amino acid" refers to any of the 20 naturally occurring amino acids found in proteins, D-stereoisomers of the naturally occurring amino acids (e.g., D-threonine), unnatural amino acids, and chemically modified amino acids. These types of amino acids are not mutually exclusive. α-Amino acids comprise a carbon atom bonded to an amino group, a carboxyl group, a hydrogen atom, and a distinctive group referred to as a "side chain." The side chains of naturally occurring amino acids are well known in the art and include, for example, hydrogen (e.g., as in glycine), alkyl (e.g., as in alanine, valine, leucine, isoleucine, and proline), substituted alkyl (e.g., as in threonine, serine, methionine, cysteine, aspartic acid, asparagine, glutamic acid, glutamine, arginine, and lysine), arylalkyl (e.g., as in phenylalanine and tryptophan), substituted arylalkyl (e.g., as in tyrosine), and heteroarylalkyl (e.g., as in histidine).

The following abbreviations are used for the 20 naturally occurring amino acids: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V).
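
The abbreviation list above can be written as a lookup table, which is convenient when converting between the three-letter and one-letter notations. The following is a minimal sketch; the names `THREE_TO_ONE`, `ONE_TO_THREE`, and `to_one_letter` are illustrative and do not come from the specification.

```python
# Three-letter -> one-letter codes for the 20 naturally occurring amino acids,
# transcribed from the abbreviation list in the text above.
THREE_TO_ONE = {
    "Ala": "A", "Asn": "N", "Asp": "D", "Arg": "R", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Reverse mapping (one-letter -> three-letter).
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Convert an iterable of three-letter codes into a one-letter string.

    Raises KeyError for any code that is not one of the 20 listed above.
    """
    return "".join(THREE_TO_ONE[residue] for residue in residues)
```

For example, `to_one_letter(["Met", "Ala", "Trp"])` yields `"MAW"`. Unnatural amino acids such as those in Table 2 are deliberately not covered by this sketch.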

Unnatural amino acids (i.e., amino acids that are not naturally found in proteins) are also known in the art, as set forth in, for example, Williams et al., Mol. Cell. Biol. 9:2574 (1989); Evans et al., J. Amer. Chem. Soc. 112:4011-4030 (1990); Pu et al., J. Amer. Chem. Soc. 56:1280-1283 (1991); Williams et al., J. Amer. Chem. Soc. 113:9276-9286 (1991); and all references cited therein. β- and γ-amino acids are known in the art and are also contemplated herein as unnatural amino acids. The following table shows non-limiting examples of unnatural amino acids contemplated herein.

Table 2. Exemplary unnatural amino acids

| Abbreviation | Amino acid | Abbreviation | Amino acid |
|---|---|---|---|
| Aad | 2-Aminoadipic acid | EtAsn | N-Ethylasparagine |
| Baad | 3-Aminoadipic acid | Hyl | Hydroxylysine |
| Bala | β-Alanine, β-aminopropionic acid | AHyl | allo-Hydroxylysine |
| Abu | 2-Aminobutyric acid | 3Hyp | 3-Hydroxyproline |
| 4Abu | 4-Aminobutyric acid, piperidinic acid | 4Hyp | 4-Hydroxyproline |
| Acp | 6-Aminohexanoic acid | Ide | Isodesmosine |
| Ahe | 2-Aminoheptanoic acid | AIle | allo-Isoleucine |
| Aib | 2-Aminoisobutyric acid | MeGly | N-Methylglycine, sarcosine |
| Baib | 3-Aminoisobutyric acid | MeIle | N-Methylisoleucine |
| Apm | 2-Aminopimelic acid | MeLys | 6-N-Methyllysine |
| Dbu | 2,4-Diaminobutyric acid | MeVal | N-Methylvaline |
| Des | Desmosine | Nva | Norvaline |
| Dpm | 2,2'-Diaminopimelic acid | Nle | Norleucine |
| Dpr | 2,3-Diaminopropionic acid | Orn | Ornithine |
| EtGly | N-Ethylglycine | | |

As used herein, a "chemically modified amino acid" refers to an amino acid whose side chain has been chemically modified. For example, a side chain can be modified to comprise a signaling moiety, such as a fluorophore or a radiolabel. A side chain can be modified to comprise a new functional group, such as a sulfhydryl, carboxylic acid, or amino group. Post-translationally modified amino acids are also included in the definition of chemically modified amino acids.

Amino acids, and more specifically their side chains, can be characterized by their chemical characteristics. For example, amino acid side chains can be positively charged, negatively charged, or neutral. The pH of a solution affects the charged nature of certain side chains, as is known to those skilled in the art. Non-limiting examples of side chains that can be positively charged include those of histidine, arginine, and lysine. Non-limiting examples of side chains that can be negatively charged include those of aspartic acid and glutamic acid. Non-limiting examples of side chains that can be characterized as neutral include those of glycine, alanine, phenylalanine, valine, leucine, isoleucine, cysteine, asparagine, glutamine, serine, threonine, tyrosine, methionine, proline, and tryptophan.

The sterics of side chains can also be used to characterize an amino acid. Tables of atom diameters can help one determine whether one side chain is larger than another. Computer models can also help with this determination.

Amino acids can be characterized by the polarity of their side chains. Polar side chains, which are typically more hydrophilic than non-polar side chains, include, for example, those of serine, threonine, tyrosine, cysteine, asparagine, and glutamine. Non-polar side chains, which are typically more hydrophobic than polar side chains, include, for example, those of glycine, alanine, valine, leucine, isoleucine, proline, methionine, phenylalanine, and tryptophan. The polarity of a side chain can be determined using conventional techniques known in the art involving determinations of atomic electronegativity and estimates of the three-dimensional structure of the side chain. The hydrophobicity or hydrophilicity of side chains can also be compared using conventional techniques known in the art, such as comparing the octanol/water partition coefficient of each amino acid. See Sangster, In: Octanol-Water Partition Coefficients: Fundamentals and Physical Chemistry, Wiley Series in Solution Chemistry, Chichester: John Wiley & Sons Ltd., 2:178 pages (1997).

The following table provides non-limiting examples of amino acid properties that can help a person skilled in the art determine how to select amino acids for modification of the Msp porins or monomers described herein.

Table 3. Amino acid properties

a This column represents the tendency of an amino acid to be buried in the interior of a protein (defined as <5% of the residue accessible to solvent) and is based on the structures of nine proteins (a total of approximately 2000 individual residues studied, of which 587 (29%) were buried). Values indicate how often each amino acid was found buried relative to the total number of residues of that amino acid found in the proteins. Values in parentheses indicate the number of buried residues of that amino acid relative to all buried residues in the proteins. Data from Schien, BioTechnology 8:308 (1990); for other calculation methods with similar results, see Janin, Nature 277:491 (1979); and Rose et al., Science 229:834 (1985).

b Average volume (Vr) of buried residues, calculated from the surface area of the side chain. Richards, Annu. Rev. Biophys. Bioeng. 6:151 (1977); Baumann, Protein Eng. 2:329 (1989).

c Data from Darby N.J. and Creighton T.E., Protein structure. In: In focus (ed. D. Rickwood), p. 4. IRL Press, Oxford, United Kingdom (1993).

d Total accessible surface area (ASA) of the amino acid side chain of residue X in a Gly-X-Gly tripeptide with the main chain in an extended conformation. Miller et al., J. Mol. Biol. 196:641 (1987).

e Values shown represent the mean ranking of amino acids according to the frequency of their occurrence at each sequence rank for 38 published hydrophobicity scales. Trinquier and Sanejouand, Protein Eng. 11:153 (1998). Although the majority of these hydrophobicity scales are derived from experimental measurements of the chemical behavior or physicochemical properties of isolated amino acids (e.g., solubility in water, partition between water and organic solvent, chromatographic migration, or effects on surface tension), several "operational" hydrophobicity scales based on the known environmental characteristics of amino acids in proteins, such as their solvent accessibility or their tendency to occupy the protein core (based on the position of residues in tertiary structures as observed by x-ray crystallography or NMR), are also included. Lower rankings represent the most hydrophobic amino acids, and higher values indicate the most hydrophilic amino acids. For comparison, the hydrophobicity scale of Radzicka and Wolfenden, Biochem. 27:1664 (1988) is shown in parentheses. That scale was derived from the measured hydration potential of amino acids, based on their free energies of transfer from the vapor phase to cyclohexane, 1-octanol, and neutral aqueous solution.

Alternatively, the hydropathic index of amino acids can be considered. Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and/or charge characteristics. These indices are: isoleucine (+4.5); valine (+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (-0.4); threonine (-0.7); serine (-0.8); tryptophan (-0.9); tyrosine (-1.3); proline (-1.6); histidine (-3.2); glutamic acid (-3.5); glutamine (-3.5); aspartic acid (-3.5); asparagine (-3.5); lysine (-3.9); and arginine (-4.5). The importance of the hydropathic amino acid index in conferring interactive biological function on a protein is generally understood in the art. It is known that certain amino acids can be substituted for other amino acids having a similar hydropathic index and/or score and still retain a similar biological function. In making changes based on the hydropathic index, the hydropathic index of the substituting amino acid can be within ±2, within ±1, or within ±0.5 of that of the original amino acid.
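
The ±2, ±1, and ±0.5 windows described above can be checked mechanically. The following is a minimal sketch using the hydropathic indices listed in the preceding paragraph; the function names are illustrative, and treating the window as inclusive of its bounds is an assumption, since the text does not say whether the limits are inclusive.

```python
# Hydropathic indices as listed in the text above, keyed by one-letter code.
HYDROPATHIC_INDEX = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def within_hydropathic_window(original, replacement, window=2.0):
    """Return True if the two residues' indices differ by at most `window`.

    Assumption: the window is inclusive; a tiny tolerance absorbs
    floating-point rounding at the boundary.
    """
    difference = abs(HYDROPATHIC_INDEX[original] - HYDROPATHIC_INDEX[replacement])
    return difference <= window + 1e-9


def candidate_substitutions(original, window=2.0):
    """List residues (sorted by one-letter code) inside the window."""
    return sorted(
        residue
        for residue in HYDROPATHIC_INDEX
        if residue != original
        and within_hydropathic_window(original, residue, window)
    )
```

With the tightest window, `candidate_substitutions("K", window=0.5)` returns the four residues whose index is -3.5 (D, E, N, Q); arginine at -4.5 falls just outside. A hydropathic match alone does not establish that a substitution preserves function, so this is only a first filter.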

It is also understood in the art that substitution of like amino acids can be made effectively on the basis of hydrophilicity. As described in U.S. Patent 4,554,101 (incorporated herein by reference), the following hydrophilicity values have been assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartic acid (+3.0±1); glutamic acid (+3.0±1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (-0.4); proline (-0.5±1); alanine (-0.5); histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5); leucine (-1.8); isoleucine (-1.8); tyrosine (-2.3); phenylalanine (-2.5); and tryptophan (-3.4). In making changes based on similar hydrophilicity values, substitution of amino acids whose hydrophilicity values are within ±2, within ±1, or within ±0.5 is contemplated.
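
One way to use the hydrophilicity values above is to rank possible replacements for a residue by how close their values are to the original. The sketch below does this with the central values from the preceding paragraph; the ±1 tolerances quoted for aspartic acid, glutamic acid, and proline are ignored for simplicity, and the name `rank_by_hydrophilicity` is illustrative.

```python
# Hydrophilicity values as listed in the text above (central values only;
# the quoted +/-1 tolerances for D, E, and P are not modeled here).
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def rank_by_hydrophilicity(original):
    """Return all other residues, closest hydrophilicity value first.

    Ties are broken alphabetically by one-letter code so the order is
    deterministic.
    """
    base = HYDROPHILICITY[original]
    others = [residue for residue in HYDROPHILICITY if residue != original]
    return sorted(
        others,
        key=lambda residue: (round(abs(HYDROPHILICITY[residue] - base), 6), residue),
    )
```

For serine, the closest residues by this measure are asparagine and glutamine, followed by glycine. The ranking says nothing about charge or size, which the text treats as separate considerations.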

Any mutant Msp porin or monomer can comprise a conservative amino acid substitution as compared with a wild-type Msp porin or monomer. Any substitution mutation is conservative if it minimally disrupts the biochemical properties of the protein. Non-limiting examples of conservative substitutions include: a positively charged residue (e.g., H, K, or R) substituted with another positively charged residue; a negatively charged residue (e.g., D or E) substituted with another negatively charged residue; a neutral polar residue (e.g., C, G, N, Q, S, T, or Y) substituted with another neutral polar residue; and a neutral non-polar residue (e.g., A, F, I, L, M, P, V, or W) substituted with another neutral non-polar residue. Conservative substitutions can be made in accordance with Table 4 below. Non-conservative substitutions can be made as well (e.g., proline for glycine).

Table 4. Exemplary amino acid substitutions

| Amino acid | Substitutions |
|---|---|
| Ala | Ser, Gly, Cys |
| Arg | Lys, Gln, Met, Ile |
| Asn | Gln, His, Glu, Asp |
| Asp | Glu, Asn, Gln |
| Cys | Ser, Met, Thr |
| Gln | Asn, Lys, Glu, Asp |
| Glu | Asp, Asn, Gln |
| Gly | Pro, Ala |
| His | Asn, Gln |
| Ile | Leu, Val, Met |
| Leu | Ile, Val, Met |
| Lys | Arg, Gln, Met, Ile |
| Met | Leu, Ile, Val |
| Phe | Met, Leu, Tyr, Trp, His |
| Ser | Thr, Met, Cys |
| Thr | Ser, Met, Val |
| Trp | Tyr, Phe |
| Tyr | Trp, Phe, His |
| Val | Ile, Leu, Met |
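
The charge and polarity groupings given in the text and the entries of Table 4 can both be encoded as simple lookups. The sketch below is one possible encoding; the names `RESIDUE_CLASSES`, `TABLE_4`, `same_class`, and `in_table_4` are illustrative, and the two checks are independent of each other.

```python
# Charge/polarity classes as grouped in the text (one-letter codes).
RESIDUE_CLASSES = {
    "positive": set("HKR"),
    "negative": set("DE"),
    "neutral_polar": set("CGNQSTY"),
    "neutral_nonpolar": set("AFILMPVW"),
}

# Table 4 transcribed as: original residue -> exemplary replacements.
TABLE_4 = {
    "Ala": ["Ser", "Gly", "Cys"],
    "Arg": ["Lys", "Gln", "Met", "Ile"],
    "Asn": ["Gln", "His", "Glu", "Asp"],
    "Asp": ["Glu", "Asn", "Gln"],
    "Cys": ["Ser", "Met", "Thr"],
    "Gln": ["Asn", "Lys", "Glu", "Asp"],
    "Glu": ["Asp", "Asn", "Gln"],
    "Gly": ["Pro", "Ala"],
    "His": ["Asn", "Gln"],
    "Ile": ["Leu", "Val", "Met"],
    "Leu": ["Ile", "Val", "Met"],
    "Lys": ["Arg", "Gln", "Met", "Ile"],
    "Met": ["Leu", "Ile", "Val"],
    "Phe": ["Met", "Leu", "Tyr", "Trp", "His"],
    "Ser": ["Thr", "Met", "Cys"],
    "Thr": ["Ser", "Met", "Val"],
    "Trp": ["Tyr", "Phe"],
    "Tyr": ["Trp", "Phe", "His"],
    "Val": ["Ile", "Leu", "Met"],
}


def residue_class(one_letter):
    """Return the class name for a one-letter residue code."""
    for name, members in RESIDUE_CLASSES.items():
        if one_letter in members:
            return name
    raise KeyError(f"unknown residue: {one_letter!r}")


def same_class(original, replacement):
    """True if both residues fall in the same charge/polarity class."""
    return residue_class(original) == residue_class(replacement)


def in_table_4(original, replacement):
    """True if Table 4 lists `replacement` for `original` (three-letter codes)."""
    return replacement in TABLE_4.get(original, [])
```

The table is directional as transcribed: Cys is listed as a replacement for Ala, but Ala is not listed for Cys. The two checks also need not agree; for example, Table 4 lists Pro for Gly, while the class grouping places G among the neutral polar residues and P among the neutral non-polar ones.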

As used herein, a "peptide" refers to two or more amino acids joined together by an amide bond (that is, a "peptide bond"). Peptides comprise up to and including 50 amino acids. Peptides can be linear or cyclic. Peptides can be α, β, γ, δ, or higher, or mixed. Peptides can comprise any mixture of amino acids as defined herein, such as any combination of D, L, α, β, γ, δ, or higher amino acids.

As used herein, a "protein" refers to an amino acid sequence having 51 or more amino acids.

As used herein, a "polymer" refers to a molecule that comprises two or more linear units (also known as "mers"), where each unit can be the same or different. Non-limiting examples of polymers include nucleic acids, peptides, and proteins, as well as a variety of hydrocarbon polymers (e.g., polyethylene, polystyrene) and functionalized hydrocarbon polymers in which the backbone of the polymer comprises a carbon chain (e.g., polyvinyl chloride, polymethacrylates). Polymers include copolymers, block copolymers, and branched polymers such as star polymers and dendrimers.

Methods of sequencing polymers using Msp porins are described herein. In addition, sequencing methods can be performed in a manner analogous to the methods described in U.S. Patent 7,189,503, incorporated herein by reference in its entirety. See also U.S. Patent 6,015,714, incorporated herein by reference in its entirety. More than one read can be performed in such sequencing methods to improve accuracy. Methods of analyzing characteristics of polymers (e.g., size, length, concentration, identity) and identifying discrete units (or "mers") of polymers are also discussed in the '503 patent and can be employed with the present Msp porins. Indeed, an Msp porin can be employed with any method discussed in the '503 patent.

At present, several types of observable signals are being explored as readout mechanisms in nanopore sequencing and analyte detection. The originally proposed, most straightforward, and most explored readout method relies on an ionic "blockade current" or "co-passing current" that is uniquely determined by the identity of the nucleotide or other analyte occupying the narrowest constriction in the pore. This method is referred to as "blockade current nanopore sequencing," or BCNX. Blockade current detection and characterization of nucleic acids has been demonstrated in the protein pore α-hemolysin (αHL) and in solid-state nanopores. Blockade current detection and characterization has been shown to provide a wealth of information about the structure of DNA passing through, or held in, a nanopore in various contexts.

In general, a "blockade" is evidenced by a change in ion current that is clearly distinguishable from noise fluctuations and is usually associated with the presence of an analyte molecule at the central opening of the pore. The strength of the blockade depends on the type of analyte that is present. More specifically, a "blockade" refers to an interval in which the ion current drops below a threshold of about 5-100% of the unblocked current level, remains there for at least 1.0 μs, and then returns spontaneously to the unblocked level. For example, the threshold can be about, at least about, or at most about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100% of the unblocked current level, or within any range derivable therein. A blockade is rejected if the unblocked signal directly preceding or following it has an average current that deviates from the typical unblocked level by more than twice the root-mean-square (rms) noise of the unblocked signal. A "deep blockade" is identified as an interval in which the ion current drops to less than 50% of the unblocked level. An interval in which the current remains between 80% and 50% of the unblocked level is identified as a "partial blockade."
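
The blockade definition above can be illustrated with a simple event detector. The following is a minimal sketch, not the procedure used in the specification: it assumes a uniformly sampled trace with a positive, known unblocked level, classifies each event by its mean current, and omits the rms-noise rejection rule. The names `Blockade` and `find_blockades` and the default threshold of 80% are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Blockade:
    start: int            # index of the first sample below the threshold
    end: int              # index one past the last sample below the threshold
    mean_fraction: float  # mean event current divided by the unblocked level
    kind: str             # "deep", "partial", or "other"


def find_blockades(current, dt, open_level, threshold=0.8, min_duration=1.0e-6):
    """Find intervals where the current stays below threshold * open_level.

    current      : sequence of current samples (any consistent unit)
    dt           : sampling interval in seconds
    open_level   : unblocked current level (same unit as `current`, positive)
    min_duration : minimum event duration in seconds (1.0 us in the text)

    Only events that both start after and end before an unblocked sample are
    kept, since a blockade must return to the unblocked level.
    """
    events = []
    start = None
    for i, value in enumerate(current):
        below = value < threshold * open_level
        if below and start is None:
            start = i
        elif not below and start is not None:
            long_enough = (i - start) * dt >= min_duration
            if start > 0 and long_enough:
                segment = current[start:i]
                fraction = sum(segment) / len(segment) / open_level
                if fraction < 0.5:
                    kind = "deep"
                elif fraction <= 0.8:
                    kind = "partial"
                else:
                    kind = "other"
                events.append(Blockade(start, i, fraction, kind))
            start = None
    return events
```

A short usage example with a synthetic trace sampled every 0.5 μs:

```python
trace = [100] * 5 + [30] * 10 + [100] * 5 + [70] * 10 + [100] * 5 + [20] + [100] * 5
events = find_blockades(trace, dt=0.5e-6, open_level=100)
# Two events are reported: a deep one (10 samples at 30% of the open level)
# and a partial one (10 samples at 70%). The single-sample dip lasts 0.5 us
# and is discarded as shorter than the 1.0 us minimum.
```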

As used herein, the term "subject" refers to a living mammalian organism, such as a human, monkey, cow, sheep, goat, dog, cat, mouse, rat, guinea pig, or transgenic species thereof. Optionally, the patient or subject is a primate. Non-limiting examples of human subjects are adults, juveniles, infants, and fetuses.

The term "nucleic acid" refers to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form and, unless otherwise limited, encompasses known analogs of natural nucleotides that hybridize to nucleic acids in a manner similar to naturally occurring nucleotides, such as peptide nucleic acids (PNAs) and phosphorothioate DNA. Unless otherwise indicated, a particular nucleic acid sequence includes its complementary sequence. Nucleotides include, but are not limited to, ATP, dATP, CTP, dCTP, GTP, dGTP, UTP, TTP, dUTP, 5-methyl-CTP, 5-methyl-dCTP, ITP, dITP, 2-amino-adenosine-TP, 2-amino-deoxyadenosine-TP, 2-thiothymidine triphosphate, pyrrolo-pyrimidine triphosphate, and 2-thiocytidine, as well as the α-thiotriphosphates of all of the above nucleotides and the 2'-O-methyl-ribonucleoside triphosphates of all of the above bases. Modified bases include, but are not limited to, 5-Br-UTP, 5-Br-dUTP, 5-F-UTP, 5-F-dUTP, 5-propynyl-dCTP, and 5-propynyl-dUTP.

As used herein, a "drug" refers to any substance that can alter a biological process in a subject. Drugs can be designed for or used in the diagnosis, treatment, or prevention of a disease, disorder, syndrome, or other health affliction of a subject. Drugs can be recreational in nature, that is, used simply to alter a biological process and not used in the diagnosis, treatment, or prevention of a disease, disorder, syndrome, or other health affliction of a subject. Biologics, meaning substances produced by biological mechanisms involving recombinant DNA technology, are also encompassed by the term "drug." Drugs include, for example, antibacterials, anti-inflammatories, anticoagulants, antivirals, antihypertensives, antidepressants, antimicrobials, analgesics, anesthetics, beta-blockers, bisphosphonates, chemotherapeutics, contrast agents, fertility medications, hallucinogens, hormones, narcotics, opiates, sedatives, statins, steroids, and vasodilators. Non-limiting examples of drugs can also be found in the Merck Index. Antibacterial drugs used in the treatment of tuberculosis include, for example, isoniazid, rifampicin, pyrazinamide, and ethambutol.

Methods employing a drug as an analyte can further comprise drug screening. For example, uptake of a drug into a cell or an organism can be investigated using an Msp porin by observing ion current blockades. Specific Msp porin constriction zones and/or vestibules with various sizes, electrostatic properties, and chemical properties can be constructed to closely emulate the desired pathway for drugs to enter or exit a cell or organism. Such methods could greatly accelerate drug screening as well as drug design. Such studies have been performed with other porins, such as those described by Pagel et al., J. Bacteriology 189:8593 (2007).

As used herein, a "biological warfare agent" refers to any organism, or any naturally occurring, bioengineered, or synthesized component of any such microorganism, that is capable of causing death or disease in plants or animals (including humans), or degradation of food or water supplies, or degradation of the environment. Non-limiting examples include Ebola virus, Marburg virus, Bacillus anthracis and Clostridium botulinum, Variola major, Variola minor, anthrax, and ricin.

As used herein, a "pollutant" refers to a material that pollutes air, water, or soil. Non-limiting examples of pollutants include fertilizers, pesticides, insecticides, detergents, petroleum hydrocarbons, smoke, and heavy-metal-containing substances, such as those containing zinc, copper, or mercury (e.g., methylmercury).

An analyte can be a "nanoscale object," which is an object that is smaller than 100 nm in two of its dimensions.

Beads that can be employed include magnetic beads and optical beads. For example, streptavidin-coated magnetic beads can be used to apply a force opposing the electrostatic force that pulls DNA through the tunnel of an Msp porin. In this technique, a magnetic bead is attached to biotinylated DNA, and a force comparable to the electrostatic driving force (about 10 pN) can be applied using a strong magnetic field gradient. See Gosse and Croquette, Biophys. J. 82:3314 (2002). In this way, the blockade current readout is unaffected, but the forces on the DNA can be controlled independently. Tens or hundreds of complete, independent reads of each DNA can then be correlated and assembled to reconstruct an accurate DNA sequence.

Optical beads manipulated by "optical tweezers" are also known in the art, and such methods can be applied to the Msp porins described herein. Optical tweezers are a common tool used to exert a force on a nanoscale object. One end of the analyte is attached to a bead, while the other end can be inserted into the tunnel of the porin. The position and force of the bead are controlled and measured with the optical tweezers. Such methods control the passage of the analyte into the tunnel and allow more control over the reading of the analyte, such as the reading of the units of a polymer. For a description of such methods in the context of artificial nanopores, see, for example, Trepagnier et al., Nano Lett. 7:2824 (2007). U.S. Patent 5,795,782, incorporated herein by reference, also discusses the use of optical tweezers.

Fluorescence resonance energy transfer (FRET) is a well-known technique that can be used in the analytical methods described herein. For example, a fluorescent FRET-acceptor or FRET-donor molecule can be incorporated into an Msp porin. The analyte is then labeled with a matching FRET-donor or FRET-acceptor. When the matching FRET-donor comes within range of the FRET-acceptor, energy transfer is likely to occur. The resulting signal can be used for analytical purposes instead of, or in addition to, the ionic current methods described herein. Thus, methods of detection, identification, or sequencing can include FRET technology.

Other optical methods that can be used include introducing optically active molecules into the interior of an Msp porin (e.g., the vestibule or the constriction zone). External light can be applied to affect the interior of the protein: such methods can be used to affect the translocation velocity of an analyte, or can allow an analyte to enter or exit the channel, thereby providing controlled passage of the analyte. Alternatively, optical pulses focused on the pore can be used to heat the pore and thereby affect how it interacts with the analyte. Such control can be very fast because heat from the small volume of a focal point dissipates rapidly. Methods of controlling the translocation velocity of an analyte can therefore employ such optically active molecules or optical pulses.

Translocation velocity can also be manipulated by attaching an object to one end of an analyte while the other end of the analyte interacts with the Msp porin. The object can be a bead (e.g., a polystyrene bead), a cell, a large molecule such as streptavidin, neutravidin, DNA, etc., or a nanoscale object. The object can then be placed in a fluid flow, where it is subjected to passive viscous drag.

"Molecular motors" are well known in the art and refer to molecules (e.g., enzymes) that physically interact with an analyte, such as a polymer (e.g., a polynucleotide), and are capable of physically moving the analyte relative to a fixed location, such as the vestibule, constriction zone, or channel of an Msp porin. Without wishing to be bound by theory, molecular motors use chemical energy to generate mechanical force. A molecular motor can interact with each unit (or "mer") of a polymer in a sequential manner. Non-limiting examples of molecular motors include DNA polymerases, RNA polymerases, helicases, ribosomes, and exonucleases. Non-enzymatic motors are also known, such as viral motors that package DNA. See Smith et al., Nature 413:748 (2001). A variety of molecular motors and the desirable properties of such motors are described in U.S. Patent 7,238,485, which is incorporated herein by reference in its entirety. A molecular motor can be placed on the cis side or the trans side of an Msp porin and optionally can be immobilized, for example as described in the '485 patent. A molecular motor can be incorporated into an Msp porin using the methods described in the '485 patent. Likewise, the systems and devices described in the '485 patent can be used with the Msp porins described herein. Indeed, any embodiment discussed in the '485 patent can be practiced using an Msp porin as described herein. Molecular motors are also discussed in, for example, Cockroft et al., J. Amer. Chem. Soc. 130:818 (2008); Benner et al., Nature Nanotech. 2:718 (2007); and Gyarfas et al., ACS Nano 3:1457 (2009).

A molecular motor is typically used to regulate the rate or translocation velocity at which an analyte interacts with an Msp porin. Any Msp protein described herein may comprise a molecular motor. Optionally, a molecular motor is used to decrease the rate at which an analyte enters the channel of an Msp porin, or to decrease the translocation velocity at which an analyte translocates through the channel. Optionally, the translocation velocity or average translocation velocity is less than 0.5 nm/μs. Optionally, the translocation velocity or average translocation velocity is less than 0.05 nm/μs. Optionally, the translocation velocity or average translocation velocity is less than 1 nucleotide/μs. Optionally, the translocation velocity or average translocation velocity is less than 0.1 nucleotide/μs. Optionally, the rate of movement of the analyte ranges from greater than 0 Hz to 2000 Hz. Here, rate refers to the number of subunits (or "mers") of a regular polymer that advance in one second (Hz). Optionally, the range is between about 50 and 1500 Hz, 100 and 1500 Hz, or 350 and 1500 Hz. Optionally, the rate of movement is about, at most about, or at least about 25, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 Hz, or any range derivable therein. During characterization, the rate can be controlled, at least for a period of time, by using a molecular motor that moves the analyte at a substantially constant rate. In addition, the range of the rate of movement can depend on the molecular motor. For example, for an RNA polymerase the range can be 350 to 1500 Hz; for a DNA polymerase the range can be 75 to 1500 Hz; and for ribosomes, helicases, and exonucleases the range can be 50 to 1500 Hz.
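The relationship between the two rate units used above (nucleotides per microsecond and mers per second, Hz) and the motor-specific ranges can be sketched as follows. This is only an illustrative helper: the names `MOTOR_RATE_RANGES_HZ`, `nt_per_us_to_hz`, and `rate_within_range` are hypothetical and do not come from the specification; only the numeric ranges are taken from the paragraph above.

```python
# Rate ranges (Hz = mers advanced per second) quoted in the paragraph above.
# The dictionary and function names are hypothetical, chosen for illustration.
MOTOR_RATE_RANGES_HZ = {
    "RNA polymerase": (350.0, 1500.0),
    "DNA polymerase": (75.0, 1500.0),
    "ribosome": (50.0, 1500.0),
    "helicase": (50.0, 1500.0),
    "exonuclease": (50.0, 1500.0),
}


def nt_per_us_to_hz(rate_nt_per_us: float) -> float:
    """Convert nucleotides per microsecond to mers per second (Hz)."""
    return rate_nt_per_us * 1e6


def rate_within_range(motor: str, rate_hz: float) -> bool:
    """Return True if rate_hz lies inside the range quoted for this motor."""
    low, high = MOTOR_RATE_RANGES_HZ[motor]
    return low <= rate_hz <= high
```

Under this conversion, the optional upper bound of 1 nucleotide/μs corresponds to 10^6 Hz, several orders of magnitude above the 2000 Hz ceiling quoted for motor-controlled movement.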

Recording and detection techniques that can be used with the methods described herein. In addition, U.S. Patents 5,795,782 and 7,189,503 (incorporated herein by reference in their entirety) also describe recording methods and instrumentation that can be used with Msp porins, as well as methods for optimizing conductance readings. U.S. Patent 6,746,594 (incorporated herein by reference in its entirety) describes supports for thin films containing nanopores, and methods of using such supports, that can be used with the Msp porins described herein.

Also provided are vectors comprising any of the nucleic acids described herein. As used herein, a vector can comprise a nucleic acid molecule encoding a single-chain Msp nanopore (e.g., a single-chain Msp dimer or a single-chain Msp octamer), wherein the nucleic acid molecule is operably linked to an expression control sequence. Suitable vector backbones include, for example, those routinely used in the art, such as plasmids, artificial chromosomes, BACs, or PACs. Numerous vectors and expression systems are commercially available from companies such as Novagen (Madison, WI), Clontech (Palo Alto, CA), Stratagene (La Jolla, CA), and Invitrogen/Life Technologies (Carlsbad, CA). Vectors typically contain one or more regulatory regions. Regulatory regions include, without limitation, promoter sequences, enhancer sequences, response elements, protein recognition sites, inducible elements, protein binding sequences, 5' and 3' untranslated regions (UTRs), transcriptional start sites, termination sequences, polyadenylation sequences, and introns.

In another aspect, a cultured cell transfected with a vector comprising a nucleic acid described herein is provided. In this regard, a cell is successfully transfected with a vector when the transcription machinery of the intact cell has access to the nucleic acid template for the production of mRNA. Protocols to facilitate transfection of vectors into cells are well known in the art.

Provided herein are the progeny of a cultured cell that was stably transfected with the vector described above. Such progeny contain copies of the vector without having undergone the transfection protocol and are capable of transcribing the nucleic acids contained in the vector under the control of an expression control sequence. Techniques for using cultured cells transfected with expression vectors to produce large quantities of polypeptides are well known in the art. See, for example, Wang, H., et al., J. Virology 81:12785 (2007).

Also provided herein is a mutant bacterial strain capable of inducible Msp expression. The mutant bacterial strain comprises a deletion of wild-type MspA, a deletion of wild-type MspC, a deletion of wild-type MspD, and a vector comprising an inducible promoter operably linked to an Msp monomer nucleic acid sequence. Optionally, the mutant bacterial strain comprises M. smegmatis strain ML16. Optionally, the Msp monomer nucleic acid sequence encodes an Msp monomer selected from a wild-type MspA monomer, a wild-type MspC monomer, a wild-type MspD monomer, and mutant monomers thereof. Optionally, the inducible promoter comprises an acetamide-inducible promoter.

Optionally, the mutant bacterial strain further comprises a deletion of wild-type MspB. A mutant bacterial strain comprising a deletion of wild-type MspB can further comprise a vector with a constitutive promoter operably linked to a nucleic acid sequence that encodes an Msp porin or monomer. Optionally, the Msp porin or monomer is selected from wild-type MspA, wild-type MspC, wild-type MspD, and mutants thereof. Optionally, the vector comprises any of the nucleic acids described herein.

Also provided is a method of producing a full or partial single-chain Msp porin. The method comprises transforming a mutant bacterial strain. The mutant strain comprises deletions of wild-type MspA, wild-type MspB, wild-type MspC, and wild-type MspD, and a vector comprising an inducible promoter operably linked to an Msp monomer nucleic acid sequence. The mutant strain is transformed with a vector comprising a nucleic acid sequence capable of encoding a single-chain Msp porin. The single-chain Msp porin is then purified from the bacteria. Optionally, the single-chain Msp porin comprises a single-chain MspA porin. Optionally, the vector comprises any of the nucleic acids described herein.

Also provided is a method of sequencing nucleic acids or polypeptides using a single-chain Msp porin. The method comprises creating a lipid bilayer comprising a first and a second side, adding a purified Msp porin to the first side of the lipid bilayer, applying positive voltage to the second side of the lipid bilayer, translocating an experimental nucleic acid or polypeptide sequence through the single-chain Msp porin, comparing the experimental blockade current with a blockade current standard, and determining the experimental sequence. Optionally, the single-chain Msp porin comprises a wild-type MspA monomer or a mutant monomer thereof. Optionally, the Msp monomer comprises an MspA paralog or homolog monomer selected from Table 1.

The term "or" is used in the claims to mean "and/or" unless it is explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the present disclosure supports a definition that refers to alternatives only as well as to "and/or."

Throughout this application, the term "about" is used to indicate that a value includes the standard deviation of error for the device or method being employed to determine the value. In any embodiment discussed in the context of a numerical value used in conjunction with the term "about," it is specifically contemplated that the term "about" can be omitted.

Following long-standing patent law, the words "a" and "an," when used in conjunction with the word "comprising" in the claims or specification, denote one or more, unless specifically noted otherwise.

Disclosed are materials, compositions, and components that can be used for, can be used in conjunction with, can be used in preparation for, or are products of the methods and compositions disclosed herein. These and other materials are disclosed herein, and it is understood that when combinations, subsets, interactions, groups, etc. of these materials are disclosed, each of the various individual and collective combinations and permutations is specifically contemplated and described herein, even if not each is explicitly disclosed. For example, if a method is disclosed and discussed, and a number of modifications that can be made to a number of molecules included in the method are discussed, each and every combination and permutation of the method and the possible modifications is specifically contemplated unless specifically indicated to the contrary. Likewise, any subset or combination of these is also specifically contemplated and disclosed. This concept applies to all aspects of this disclosure, including, but not limited to, steps in methods using the disclosed compositions. Thus, if there are a variety of additional steps that can be performed, it is understood that each of these additional steps can be performed with any specific method step or combination of method steps of the disclosed methods, and that each such combination or subset of combinations is specifically contemplated and should be considered disclosed. It is therefore contemplated that any embodiment discussed in this specification can be implemented with respect to any method, compound, protein, porin, peptide, polypeptide, multimer, monomer, nucleic acid, vector, strain, cultured cell, system, or composition, etc., disclosed herein, and vice versa. For example, any protein described herein can be used in any method disclosed herein.

The publications cited herein and the material for which they are cited are hereby incorporated by reference in their entireties.

The following examples are provided to illustrate, not to limit, the materials disclosed herein.

Examples

Example 1: Materials and Methods for Examples 1 to 7

Homogeneous ssDNA oligonucleotides dA₅₀, dC₅₀, and dT₅₀ (SEQ ID NO: 10, SEQ ID NO: 16, and SEQ ID NO: 17, respectively) and the hairpin constructs hp08 (5′ GCTGTTGC TCTCTC GCAACAGC A₅₀ 3′) (SEQ ID NO: 4), hp10 (5′ GCTCTGTTGC TCTCTC GCAACAGAGC A₅₀ 3′) (SEQ ID NO: 5), and hp12 (5′ GCTGTCTGTTGC TCTCTC GCAACAGACAGC A₅₀ 3′) (SEQ ID NO: 6) were synthesized by Integrated DNA Technologies (IDT; Coralville, IA).
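The structure of the three hairpin constructs can be checked directly from the sequences listed above. The following sketch is illustrative only: it assumes the spaces in each listed sequence separate the 5′ stem, the loop, the complementary 3′ stem, and the A₅₀ tail, and the names `HAIRPINS`, `reverse_complement`, and `describe` are hypothetical.

```python
# Hairpin constructs as listed above: 5' stem, loop, complementary 3' stem.
# Each construct ends with a 50-nt poly-dA tail (A50).
HAIRPINS = {
    "hp08": ("GCTGTTGC", "TCTCTC", "GCAACAGC"),
    "hp10": ("GCTCTGTTGC", "TCTCTC", "GCAACAGAGC"),
    "hp12": ("GCTGTCTGTTGC", "TCTCTC", "GCAACAGACAGC"),
}
TAIL = "A" * 50

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def describe(name: str) -> dict:
    """Summarize the duplex length, loop length, and stem pairing of a construct."""
    stem5, loop, stem3 = HAIRPINS[name]
    return {
        "duplex_bp": len(stem5),
        "loop_nt": len(loop),
        "total_nt": len(stem5) + len(loop) + len(stem3) + len(TAIL),
        # The 3' stem should be the reverse complement of the 5' stem.
        "stem_pairs": reverse_complement(stem5) == stem3,
    }


if __name__ == "__main__":
    for construct in HAIRPINS:
        print(construct, describe(construct))
```

For the sequences as listed, the two stem segments of each construct are exact reverse complements, giving duplex regions of 8, 10, and 12 bp for hp08, hp10, and hp12, each closed by a 6-nt loop.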

Bacterial strains and growth conditions. All bacterial strains used in this study are listed in Table 5. Mycobacteria were grown at 37°C in Middlebrook 7H9 liquid medium (Difco) supplemented with 0.2% glycerol and 0.05% Tween, or on Middlebrook 7H10 agar (Difco) supplemented with 0.2% glycerol. Escherichia coli DH5α was used for all cloning experiments and was routinely grown in Luria-Bertani (LB) medium at 37°C. Hygromycin was used at concentrations of 200 μg/mL for E. coli and 50 μg/mL for M. smegmatis.

Table 5. Strains and plasmids.

The annotation HygR indicates resistance to hygromycin. mspA, mspC, and mspD are porin genes of M. smegmatis.

Site-directed mutagenesis of mspA. The M1MspA and M2MspA mutant monomers were constructed in a stepwise fashion by site-directed mutagenesis using the combined chain reaction (CCR) as described by Bi and Stambrook, Nucl. Acids Res. 25:2949 (1997). Plasmid pMN016 carries a psmyc-mspA transcriptional fusion (Stephan et al., Mol. Microbiol. 58:714 (2005)) and was used as the template. The oligonucleotides psmyc1 and pMS-seq1 (Table 6) were used as the forward and reverse primers, respectively, in CCR. Three subsequent mutations were introduced into mspA to construct the m1mspA gene. Three further mutations were introduced into m1mspA to yield m2mspA. All plasmids were verified by sequencing the entire mspA gene before they were transformed into the triple porin mutant M. smegmatis ML16 (Stephan et al., Mol. Microbiol. 58:714 (2005)) for protein production.

Table 6. Oligonucleotides.

The codons that were altered to introduce the MspA mutations are underlined.

Single-channel experiments. Bilayers were made with diphytanoyl-PA and diphytanoyl-PC lipids prepared in equal or unequal proportions and were formed across a horizontal, approximately 20-μm-diameter aperture in Teflon as described (Akeson et al., Biophys. J. 77:3227 (1999)). MspA porin was added to one side of the bilayer (the cis side) at a concentration of approximately 2.5 ng/mL. The cis side was grounded, and positive voltage was applied to the trans side of the bilayer. An Axopatch-1B patch-clamp amplifier (Axon Instruments) was used to apply voltage across the bilayer and to measure the ionic current flowing through the pore. The analog signal was low-pass filtered at 50 kHz with a 4-pole Bessel filter, and the filtered signal was digitized at 250 kHz. Data acquisition was controlled with custom software written in LabWindows/CVI (National Instruments). All experiments were performed at 21 ± 2°C in 1 M KCl, 10 mM Hepes/KOH buffered at pH 8.

Data analysis. Data analysis was performed with custom software written in Matlab (The MathWorks; Natick, MA). Blockades were identified as intervals in which the ionic current dropped below a threshold of 80% of the unblocked current level, remained there for at least 12 μs, and returned spontaneously to the unblocked level. Blockades were rejected if the mean current of the unblocked signal directly preceding or following them deviated from the typical unblocked level by more than twice the root-mean-square (rms) noise of the unblocked signal. Blockades were also rejected if they occurred within 26 μs of another blockade. Deep blockades were identified as intervals in which the ionic current dropped below 50% of the unblocked level. Intervals in which the current remained between 80% and 50% of the unblocked level were identified as partial blockades. Each event was parameterized by the dwell times and mean currents of its constituent partial and deep subintervals.
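The threshold criteria above can be sketched in code. This is not the authors' Matlab software; it is a minimal Python illustration that assumes a uniformly sampled trace (4 μs per sample, matching the 250 kHz digitization rate) and a known unblocked current level. It applies only the 80% threshold, the 12 μs minimum duration, and the 50% deep-blockade threshold. The baseline-deviation and 26 μs proximity rejection rules are omitted, and an event still in progress at the end of the trace is not treated specially. The function name `find_blockades` and its parameters are hypothetical.

```python
import numpy as np


def find_blockades(current, open_level, dt_us=4.0, min_dwell_us=12.0,
                   partial_frac=0.8, deep_frac=0.5):
    """Find intervals where the current stays below partial_frac * open_level.

    Returns one dict per event with its start time, total dwell time, mean
    current, and the time spent in the deep (< deep_frac) and partial
    (between deep_frac and partial_frac) sub-ranges, all in microseconds.
    """
    current = np.asarray(current, dtype=float)
    below = current < partial_frac * open_level
    # Pad with False so every run of blocked samples has a start and an end edge.
    padded = np.concatenate(([False], below, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    events = []
    for start, stop in zip(edges[0::2], edges[1::2]):
        dwell_us = float((stop - start) * dt_us)
        if dwell_us < min_dwell_us:
            continue  # too short to count as a resolved blockade
        segment = current[start:stop]
        deep = segment < deep_frac * open_level
        events.append({
            "start_us": float(start * dt_us),
            "dwell_us": dwell_us,
            "mean_current": float(segment.mean()),
            "deep_us": float(deep.sum() * dt_us),
            "partial_us": float((~deep).sum() * dt_us),
        })
    return events
```

Splitting each event into `deep_us` and `partial_us` mirrors the parameterization by partial and deep subintervals described in the text, although the per-subinterval mean currents are not computed in this sketch.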

The tD values used to parameterize the hairpin deep blockade dwell time distributions were estimated as the peak of the probability density distribution of the base-10 logarithm of the dwell times (FIG. 8). This distribution was estimated with the Matlab kernel smoothing density estimator using a normal kernel function and a width of 0.15. Trans-bilayer data were analyzed by detecting abrupt changes in conductance from less than 1 nS to greater than 1 nS. The voltages at which these changes occurred were recorded and then summarized in the histograms shown in FIGS. 9E-9G.
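The tD estimate described above can be approximated outside Matlab. The sketch below is a plain NumPy stand-in, not the authors' code: it assumes that the quoted width of 0.15 is the standard deviation (bandwidth) of the normal kernel in log10 units, and it locates the density peak on a fixed grid. The function name `log_dwell_mode` and the `grid_points` parameter are hypothetical.

```python
import numpy as np


def log_dwell_mode(dwell_times, bandwidth=0.15, grid_points=2001):
    """Estimate tD as the peak of a Gaussian kernel density of log10(dwell time).

    dwell_times must be positive and share one unit; the result is in that unit.
    """
    logs = np.log10(np.asarray(dwell_times, dtype=float))
    grid = np.linspace(logs.min() - 3 * bandwidth,
                       logs.max() + 3 * bandwidth, grid_points)
    # Sum one normal kernel per observation at every grid position.
    z = (grid[:, None] - logs[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1)
    density /= logs.size * bandwidth * np.sqrt(2.0 * np.pi)
    # Convert the location of the density peak back from log10 units.
    return float(10.0 ** grid[np.argmax(density)])
```

Working in log10 units suits distributions that span several decades, which is the situation described for the hairpin deep blockades.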

In all experiments, the pores were oriented such that the "entrance" (FIG. 1) was exposed to the cis compartment of the apparatus.

All of the hairpin data displayed in FIGS. 5-8 were derived from data taken on the same long-lived M1MspA porin. The homopolymer data presented in Example 5 were obtained with a different long-lived M1MspA porin from the one used for the hairpin data, but there is quantitative agreement between the extended hairpin data sets taken on the two pores.

Example 2: Blockade characteristics of wild-type MspA (WTMspA) porin with and without analyte

Purification of MspA porins. MspA porins were selectively extracted from M. smegmatis and then purified by anion-exchange and gel-filtration chromatography as described (Heinz and Niederweis, Anal. Biochem. 285:113 (2000); Heinz et al., Methods Mol. Biol. 228:139 (2003)).

Consistent with previous results (Niederweis et al., Mol. Microbiol. 33:933 (1999)), the purified protein showed high channel-forming activity, with a most frequent conductance of 4.9 nS in 1.0 M KCl at approximately 20°C (FIG. 2). The cis compartment was held at ground and positive voltage was applied to the trans compartment (FIG. 3A). Above approximately 60 mV, the WTMspA porin showed frequent spontaneous blockades of the ionic current in the absence of ssDNA (FIG. 3). Some spontaneous blockades were transient, whereas others required a reversal of the voltage to reestablish the unblocked current level. Despite this behavior, intervals of steady, unblocked signal lasting tens of seconds remained for voltages up to approximately 100 mV (FIG. 3). Adding approximately 2–8 μM dC₅₀ (SEQ ID NO: 48) ssDNA to the cis compartment did not lead to noticeable enhancement or alteration of these blockade characteristics. Above approximately 100 mV, the spontaneous blockades were so frequent that ssDNA detection experiments were impractical.

One explanation for the apparent absence of ssDNA interactions with the WTMspA porin is the high density of negative charge in the pore (FIG. 1). Electrostatic interaction with the negatively charged channel interior likely inhibits the entry of DNA into the pore. To address this issue, the aspartate residues in the constriction zone were replaced with asparagines (FIG. 1). The resulting MspA mutant D90N/D91N/D93N (M1MspA) porin is discussed in Example 3.

Example 3: Blockade characteristics of the MspA mutant M1MspA porin with and without analyte

Experiment

As noted in Example 2, electrostatic interactions between ssDNA and the channel of the WTMspA porin may affect translocation of ssDNA through the pore. The MspA mutant D90N/D91N/D93N (M1MspA, also called M1-NNN) was designed to test this theory. The M1MspA porin was expressed in, and purified from, the M. smegmatis strain ML16, which lacks most endogenous porins (Stephan et al., Mol. Microbiol. 58:714 (2005)). The expression level of the M1MspA porin (FIG. 4) and its channel-forming activity were similar to those of the WTMspA porin, whereas its conductance was reduced by a factor of 2 to 3 (FIG. 2). Further, the frequency of spontaneous blockades was dramatically reduced in the M1MspA porin, making it possible to conduct DNA detection experiments at voltages up to and above 180 mV (FIG. 5).

ssDNA hairpin constructs were used to investigate the interaction of DNA with the M1MspA porin. Each construct had a 50-nt poly-dA overhang on the 3′ end, a dsDNA duplex region of variable length (8, 10, and 12 bp for constructs hp08 (SEQ ID NO: 4), hp10 (SEQ ID NO: 5), and hp12 (SEQ ID NO: 6), respectively), and a 6-nt loop (FIG. 6). At 180 mV, the addition of approximately 8 μM hp08 ssDNA to the cis compartment caused the rate of transient ionic current blockades to increase from 0.1–0.6 blockades per second to 20–50 blockades per second (FIG. 5). Blockade rates were proportional to DNA concentration and were strongly voltage dependent, decreasing approximately threefold for a 20-mV decrease in the applied voltage. Blockades long enough to be well resolved were either partial blockades, in which the ionic current was reduced to between 80% and 50% of the unblocked level, or deep blockades, in which the ionic current was reduced to less than 50% of the unblocked level (FIG. 5C). Blockades exhibiting both partial and deep subsegments were very rare. Partial blockades lasted tens to hundreds of microseconds, and their dwell times increased with increasing voltage (FIGS. 5C and 7). Deep blockades lasted hundreds of microseconds to hundreds of milliseconds, and their dwell times decreased with increasing voltage (FIGS. 6 and 7). These trends were observed in experiments with all three hairpins.

Analysis

In analogy to similar signals observed with αHL (Butler et al., Biophys. J. 93:3229-40 (2007)), the partial blockades are interpreted as entry of DNA into the vestibule of the M1MspA porin without threading of the single-stranded segment through the channel constriction. A moderate reduction of the ionic current is expected for this mechanism. Without wishing to be bound by theory, the increase in dwell time with voltage (FIG. 7) most likely results from an increasing electrostatic barrier against escape of a DNA molecule from the vestibule back into the cis compartment. This explanation of the dwell time can be understood within a kinetic framework in which decay of the polymer from the vestibule occurs through two first-order processes: escape against the applied voltage gradient, and threading of one end through the constriction. The lifetime is then the inverse of the sum of the rate constants for these processes. This lifetime increases with voltage if (i) the escape rate constant decreases with voltage and (ii) its decrease outweighs any change in the threading rate constant.
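The kinetic argument above can be written compactly. The symbols below are introduced here for illustration and do not appear in the specification: $k_{\mathrm{esc}}(V)$ denotes the first-order rate constant for escape back to the cis compartment, $k_{\mathrm{thr}}(V)$ the first-order rate constant for passage of one end through the constriction, and $\tau(V)$ the resulting lifetime of the partial blockade.

```latex
\tau(V) = \frac{1}{k_{\mathrm{esc}}(V) + k_{\mathrm{thr}}(V)},
\qquad
\frac{d\tau}{dV}
  = -\,\frac{k_{\mathrm{esc}}'(V) + k_{\mathrm{thr}}'(V)}
            {\bigl(k_{\mathrm{esc}}(V) + k_{\mathrm{thr}}(V)\bigr)^{2}} > 0
\;\iff\;
-\,k_{\mathrm{esc}}'(V) > k_{\mathrm{thr}}'(V)
```

Conditions (i) and (ii) in the text amount to the inequality on the right: the escape rate constant must fall with voltage faster than the other rate constant rises.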

For the deep blockades, the clear decrease in dwell times with increasing voltage is inconsistent with any process involving escape of the DNA back into the cis compartment. Both the degree of ionic current reduction and the voltage dependence of the dwell times are consistent with a process in which the single-stranded polydA segment is driven through the approximately 1-nm-diameter constriction until the approximately 2.2-nm-diameter DNA duplex reaches the constriction and arrests translocation (FIG. 5A). The hairpin construct remains in this threaded configuration until either unzipping of the DNA duplex (Vercoutere et al., Nat. Biotech. 19:248-52 (2001); Sauer-Budge et al., Phys. Rev. Lett. 90:238101 (2003); Mathe et al., Biophys. J. 87:3205-12 (2004)) or a conformational rearrangement of the M1MspA porin constriction zone allows translocation to be completed. Without being bound by theory, the unzipping mechanism of translocation completion appears most plausible, because passage of a dsDNA helix would require the constriction to approximately double in diameter, disrupting the hydrogen bonds of the β-barrel flanking the constriction (Faller et al., Science 303:1189 (2004)) and potentially exposing hydrophobic regions of the protein and the bilayer interior to water.

The hairpin deep blockades in the M1MspA porin have a very broad dwell time distribution that cannot be adequately described by a single exponential or a sum of exponentials (FIG. 8). To parameterize the distributions, the mode of the logarithm of the deep blockade dwell times, tD, was used; it corresponds to the dwell time with the highest density of blockades in FIG. 6 (FIG. 8). For all voltages, hp08 had the shortest tD. Below 160 mV, hp10 and hp12 had similar tD. Above 160 mV, however, hp10 consistently had a longer tD than hp12. These observations differ somewhat from those made with αHL, where hairpin blockade dwell time distributions were modeled with single exponentials and hairpins with larger standard free energies of formation consistently produced longer deep blockades (Vercoutere et al., Nat. Biotechnol. 19:248 (2001); Mathe et al., Biophys. J. 87:3205 (2004)). Assuming that the deep blockades result from translocation with duplex melting as the rate-limiting step, this process should be 10- to 100-fold slower in the M1MspA porin than in αHL (Mathe et al., Biophys. J. 87:3205 (2004)). Interestingly, the hp10 blockades persisted longer than the hp12 blockades. In six replicate experiments with hp10 at 180 mV, an average unblocked current level of 340 ± 7 pA and an average tD of 9 ± 1 ms (mean ± SEM) were observed.
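One way to obtain a "mode of the logarithm of the dwell times" is to bin the dwell times on a log10 axis and take the center of the most populated bin. The sketch below assumes that approach. The bin width and the sample dwell times are hypothetical, and the binning actually used for FIG. 8 is not stated in the text.

```python
import math
from collections import Counter

def log_mode_dwell_time(dwell_times_s, bins_per_decade=5):
    """Return t_D in seconds: the center of the most populated log10 bin."""
    if not dwell_times_s:
        raise ValueError("at least one dwell time is required")
    # Bin index b covers log10(t) in [b / bins_per_decade, (b + 1) / bins_per_decade).
    counts = Counter(math.floor(math.log10(t) * bins_per_decade) for t in dwell_times_s)
    best_bin = max(counts, key=counts.get)
    return 10 ** ((best_bin + 0.5) / bins_per_decade)

# Hypothetical deep-blockade dwell times (seconds) clustered near 9 ms.
dwell_times = [0.008, 0.009, 0.0085, 0.0095, 0.0001, 0.5]
t_d = log_mode_dwell_time(dwell_times)
print(f"t_D is about {t_d * 1e3:.1f} ms")
```

Unlike a mean, this statistic is insensitive to the long tails of a broad distribution, which is presumably why a mode on the log axis is convenient when no exponential model fits.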

实施例4:利用M1MspA孔蛋白进行的跨双层检测Example 4: Trans-bilayer detection using M1MspA porin

Theory.

为了获得DNA转位通过MspA的直接证据，使用图9中举例说明的和由Nakane等人创建的跨膜检测技术(Nakane等人，Biophys.J.87:615(2004))。通过电泳驱动在一端具有大块锚复合物的ssDNA探针分子进入纳米孔。游离的ssDNA末端穿过孔进入反面区室直至锚终止转位。如果反面区室含有与ssDNA探针的末端互补的短ssDNA靶分子，那么探针和靶可杂交。如果杂交发生，那么探针以带螺纹的构型被锁定直至施加足够负的电压引起探针与靶解离并且排入顺面区室。如果杂交因随机原因或因为探针末端与靶不互补而未发生，或者如果在反面区域中不存在靶分子，那么探针退回至顺面区域就不需要负电压。出现只有通过足够负的电压才能清除的阻塞的是ssDNA探针已穿过纳米孔至反面区室并且与靶DNA杂交的证据。To obtain direct evidence of DNA translocation through MspA, a transmembrane detection technique, illustrated in FIG9 and developed by Nakane et al. (Nakane et al., Biophys. J. 87:615 (2004)), was used. An ssDNA probe molecule with a bulky anchor complex at one end was electrophoretically driven into the nanopore. The free ssDNA end passed through the pore into the trans compartment until the anchor terminated translocation. If the trans compartment contained a short ssDNA target molecule complementary to the end of the ssDNA probe, the probe and target could hybridize. If hybridization occurred, the probe was locked in a threaded configuration until a sufficiently negative voltage was applied to cause the probe to dissociate from the target and expel into the cis compartment. If hybridization did not occur due to random causes or because the probe end was not complementary to the target, or if there was no target molecule in the trans compartment, then no negative voltage was required for the probe to retreat to the cis compartment. The presence of a blockage that could only be cleared by a sufficiently negative voltage was evidence that the ssDNA probe had passed through the nanopore to the trans compartment and hybridized with the target DNA.

Experiment.

A probe molecule was constructed comprising a 75-nt-long ssDNA molecule attached at its biotinylated 5′ end to a neutravidin (nA) anchor and carrying a heterogeneous 15-nt-long complementary sequence at its 3′ end. nA was obtained from Invitrogen (Carlsbad, CA). Two different 5′-biotinylated ssDNA constructs, 5′-bt-dC6dA54d(CTCTATTCTTATCTC)-3′ (SEQ ID NO:7) and 5′-bt-dC6dA54d(CACACACACACACAC)-3′ (SEQ ID NO:8), were synthesized by IDT. nA and the ssDNA constructs were mixed at a 1:1 ratio, at a concentration of 50 μM, in the experimental 1 M KCl buffer and stored at -20°C until immediately before use. The 15-nt-long target DNA, 3′-GAGATAAGAATAGAG-5′ (SEQ ID NO:9), was synthesized by IDT, suspended in the experimental buffer, and stored at -20°C until immediately before use. The trans compartment was preloaded with approximately 100 μM target DNA, and the cis compartment was filled with DNA-free buffer. After bilayer formation, the cis compartment was perfused to remove any target DNA that had diffused through the aperture. Once a stable M1MspA porin was established, the nA-ssDNA complex was added to the cis compartment to a final concentration of approximately 1 μM. Custom experimental control software written in LabWindows was used to continuously monitor the current and apply the appropriate voltages.

When probe molecules were driven from the cis compartment into the pore at 180 mV, deep current blockades of indefinite duration were observed. For the transbilayer experiments, a probe molecule was captured at 180 mV. After a brief delay to ensure that the ssDNA was threaded as far as possible through the M1MspA porin, the voltage was reduced to 40 mV and held at that level for 5 seconds to allow one of the 15-nt-long target ssDNAs to anneal to the complementary end of the probe. The voltage was then ramped down at a rate of 130 mV/s. For each event, the probe exit voltage, Vexit, was identified as the voltage at which a large and abrupt increase in conductance was observed during the voltage ramp (FIGS. 9C and 9D).

通过检测电导从小于1至大于1nS的陡变分析跨双分子层数据。记录这些变化发生时的电压，然后概括于图9E–9G中显示的直方图中。关于有关数据分析的其他信息参见实施例1中的材料和方法。Transbilayer data were analyzed by detecting abrupt changes in conductance from less than 1 to greater than 1 nS. The voltage at which these changes occurred was recorded and summarized in the histograms shown in Figures 9E-9G. For additional information on data analysis, see Materials and Methods in Example 1.
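The threshold-crossing analysis described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the analysis code used for FIG. 9. The function name, the sample trace, and the cutoff that skips samples near 0 mV (where conductance I/V is ill-defined) are all hypothetical.

```python
def find_exit_voltage(voltages_mV, currents_pA, threshold_nS=1.0, min_abs_mV=5.0):
    """Return the voltage at which conductance first jumps from below to at or above
    threshold_nS during a voltage ramp, or None if no such jump occurs."""
    previous_nS = None
    for v, i in zip(voltages_mV, currents_pA):
        if abs(v) < min_abs_mV:
            continue  # skip samples where I/V is dominated by noise (assumed cutoff)
        conductance_nS = i / v  # pA / mV = nS
        if previous_nS is not None and previous_nS < threshold_nS <= conductance_nS:
            return v
        previous_nS = conductance_nS
    return None

# Hypothetical ramp: blocked pore (0.3 nS) until the probe exits at -20 mV (1.8 nS open pore).
ramp_mV = [40, 30, 20, 10, -10, -20, -30, -40]
trace_pA = [12.0, 9.0, 6.0, 3.0, -3.0, -36.0, -54.0, -72.0]
v_exit = find_exit_voltage(ramp_mV, trace_pA)
print("V_exit =", v_exit, "mV")
```

Collecting the returned value for each capture event would give the list of Vexit values that is summarized as a histogram; a negative value indicates that a reverse voltage was needed to clear the pore.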

Analysis.

来自使用3个不同探针/靶组合的实验的Vexit的直方图示于图9中。当探针DNA与靶DNA互补时(图9E)，相当数量的Vexit是负的，表明探针/靶杂交。在6个使用互补探针/靶分子的重复实验中，观察到相似的负Vexit群体。在其中ssDNA 3′末端不与靶分子互补的5个重复实验(图9F)中和在不具有靶DNA的一个实验(图9G)中，很少观察到负的Vexit值。对两个不同的纳米孔使用互补和非互补探针/靶组合。这些孔之一的数据示于图10E和10F中。这些数据提供了ssDNA可穿过M1MspA孔蛋白的明确和直接的证据，从而证实了图5中观察到的深度阻塞确实是由ssDNA转位通过M1MspA孔蛋白引起的这一假设。Histograms of Vexit from experiments using three different probe/target combinations are shown in Figure 9. When the probe DNA was complementary to the target DNA (Figure 9E), a significant number of Vexit values were negative, indicating probe/target hybridization. Similar negative Vexit populations were observed in six replicate experiments using complementary probe/target molecules. In five replicate experiments in which the ssDNA 3' end was not complementary to the target molecule (Figure 9F) and in one experiment without target DNA (Figure 9G), few negative Vexit values were observed. Complementary and non-complementary probe/target combinations were used for two different nanopores. Data for one of these pores are shown in Figures 10E and 10F. These data provide clear and direct evidence that ssDNA can pass through the M1MspA porin, confirming the hypothesis that the deep blockage observed in Figure 5 is indeed caused by ssDNA translocation through the M1MspA porin.

实施例5:MspA突变M1MspA孔蛋白和线性均质的ssDNAExample 5: MspA mutant M1MspA porin and linear homogeneous ssDNA

The interaction between the M1MspA porin and linear, homogeneous ssDNA 50-mers was also investigated. At 180 mV, adding approximately 8 μM dT50 (SEQ ID NO:32) to the cis compartment produced approximately 5 blockades per second (FIG. 10), an approximately 20-fold increase over the blockade rate in the absence of dT50. Most of these blockades were shorter than 30 μs, which is too brief to resolve internal structure or to estimate the depth of the blockade. Experiments with dA50 (SEQ ID NO:49) and dC50 (SEQ ID NO:48) gave similar results. The short duration of the observed blockades suggests that translocation of these linear, homogeneous ssDNA 50-mers typically takes less than 30 μs. The blockades are also consistent with brief excursions of the polymer into the vestibule that end with escape back into the cis compartment. Although both translocation and escape probably occur in experiments with linear ssDNA 50-mers, the relative frequency of the two processes could not be estimated.

实施例6:具有和不具有分析物的MspA突变体M2MspA孔蛋白的阻塞特征Example 6: Blocking characteristics of the MspA mutant M2MspA porin with and without analyte

为了进一步检查MspA孔蛋白中电荷对其DNA分析能力的效应，对M1MspA孔蛋白进行3个另外的突变，并且用带正电荷的残基置换前厅中和入口周围的带负电荷的残基(图1)。所得的突变体D90N/D91N/D93N/D118R/D134R/E139K(M2MspA)孔蛋白显示与WTMspA(图2)孔蛋白相似的表达水平(图4)和通道形成活性。To further examine the effect of charge in the MspA porin on its DNA analysis ability, three additional mutations were made to the M1MspA porin, replacing negatively charged residues in the vestibule and around the entrance with positively charged residues ( FIG1 ). The resulting mutant D90N/D91N/D93N/D118R/D134R/E139K (M2MspA) porin showed similar expression levels ( FIG4 ) and channel formation activity to the WTMspA ( FIG2 ) porin.

Like the M1MspA porin, the M2MspA porin has a smaller conductance than the WTMspA porin (FIG. 2) and exhibits minimal spontaneous blockades at voltages up to and above 180 mV. At 180 mV, adding 2 μM dT50 (SEQ ID NO:32) to the cis compartment produced a blockade rate of approximately 25 blockades per second (FIG. 10B). A partial blockade of approximately 100 μs ending in a clear downward spike was a common blockade pattern (FIG. 10C). Both the duration of the partial blockades and their tendency to end in a downward spike increased with voltage (FIG. 11). These trends are consistent with a process in which the polymer enters the vestibule and is held there, producing a partial blockade, until one end enters the high-field constriction and initiates translocation. This mechanism accurately accounts for the similar partial-to-deep blockade pattern observed with αHL (Butler et al., Biophys. J. 93:3229 (2007)). The short duration of the downward spikes indicates that translocation of linear ssDNA 50-mers through the M2MspA porin takes less than approximately 30 μs. Partial blockades that do not end in a downward spike are interpreted either as escape back into the cis compartment or as translocation shorter than approximately 10 μs, which is too brief to be observed in these experiments.

实施例7:M1和M2MspA突变孔蛋白与αHL性质的比较Example 7: Comparison of properties of M1 and M2MspA mutant porins and αHL

M1MspA与M2MspA孔蛋白之间的重要相似性是线性ssDNA 50聚体的转位似乎太快以至于不能产生具有可分辨的结构的深度阻塞。不受理论束缚，这一观察结果表明对于两种突变体来说都相同的缢缩是主要决定线性ssDNA分子转位通过MspA孔蛋白的速度的区域。比较M1MspA和M2MspA孔蛋白的约2至10个碱基/μs的MspA转位速度与利用αHL观察到的约0.5–1个碱基/μs的转位速度(Meller等人，Proc.Natl Acad.Sci.USA 97:1079(2000)；Butler等人，Biophys.J.93:3229(2007))，支持了通道几何形状和组成的细节在确定转位速度中起着首要作用的概念。A key similarity between the M1MspA and M2MspA porins is that the translocation of linear ssDNA 50-mers appears too rapid to produce deep blockages with discernible structures. Without being bound by theory, this observation suggests that the constriction, which is identical for both mutants, is the region that primarily determines the rate at which linear ssDNA molecules translocate through the MspA porin. Comparison of the MspA translocation velocities of approximately 2 to 10 bases/μs for the M1MspA and M2MspA porins with the approximately 0.5–1 base/μs observed with αHL (Meller et al., Proc. Natl Acad. Sci. USA 97:1079 (2000); Butler et al., Biophys. J. 93:3229 (2007)) supports the concept that details of channel geometry and composition play a primary role in determining translocation speed.
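The translocation speeds quoted above follow from simple arithmetic on strand length and blockade duration. A minimal sketch, assuming a constant average speed over the whole 50-mer (a simplification not stated in the text); the function name is illustrative.

```python
def min_speed_bases_per_us(strand_length_nt: int, max_duration_us: float) -> float:
    """Lower bound on average translocation speed when the event lasts
    no longer than max_duration_us."""
    return strand_length_nt / max_duration_us

# A 50-mer that translocates in under about 30 us moves faster than this bound.
msp_lower_bound = min_speed_bases_per_us(50, 30.0)
print(f"MspA lower bound: {msp_lower_bound:.2f} bases/us")

# Compared with the upper end of the alpha-HL range quoted in the text (1 base/us).
print(f"ratio to alpha-HL upper value: {msp_lower_bound / 1.0:.2f}")
```

The bound of roughly 1.7 bases/μs is consistent with the lower end of the "approximately 2 to 10 bases/μs" range given for the MspA mutants.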

在MspA孔蛋白和αHL的情况下，转位速度上的大差异可能因侧面连接缢缩的通道区域的宽度所致。如果DNA与通道壁之间的相互作用减慢了DNA的通过(Slonkina andKolomeisky，J.Chem.Phys.118:7112-8(2003))，那么在αHL中预期含有更慢的转位，其中被高度限制在缢缩区和跨膜区中的10至20个碱基被迫与通道壁相互作用。在MspA孔蛋白中，缢缩区中只有2至4个碱基被迫与该蛋白质接触。缢缩区中电荷的分布是αHL与M1和M2MspA突变孔蛋白之间的另一个显著差异。αHL缢缩区由E111、K147和M113的侧链形成(Song等人，Science 274:1859(1996))，这迫使带负电荷的ssDNA主链与7个带正电荷的和7个带负电荷残基极其靠近。M1和M2MspA突变孔蛋白的缢缩区中带电荷残基的缺乏还可能是造成与αHL相比较更快的转位速度的原因。The large difference in translocation speed in the case of the MspA porin and αHL may be due to the width of the channel region flanking the constriction. If interactions between DNA and the channel walls slow down the passage of DNA (Slonkina and Kolomeisky, J. Chem. Phys. 118:7112-8 (2003)), then slower translocation would be expected in αHL, where the 10 to 20 bases that are highly confined in the constriction and transmembrane region are forced to interact with the channel walls. In the MspA porin, only 2 to 4 bases in the constriction are forced to contact the protein. The distribution of charge in the constriction is another significant difference between αHL and the M1 and M2 MspA mutant porins. The αHL constriction is formed by the side chains of E111, K147, and M113 (Song et al., Science 274:1859 (1996)), which forces the negatively charged ssDNA backbone into close proximity with seven positively and seven negatively charged residues. The lack of charged residues in the constriction of the M1 and M2MspA mutant porins may also be responsible for the faster translocation rate compared to αHL.

Further comparison of the homopolymer blockade characteristics of the two MspA mutant porins gives insight into how the arrangement of charged residues in the tunnel affects its interaction with DNA. For a given ssDNA concentration, the blockade rate of the M2MspA porin is approximately 20 times that of the M1MspA porin (FIG. 10B). The M2MspA porin also shows readily observable blockades at voltages as low as approximately 80 mV, whereas almost no blockades are seen with the M1MspA porin below approximately 140 mV. Finally, partial blockades with the M2MspA porin are at least approximately 100 times longer than those with the M1MspA porin (FIG. 9C). These trends are consistent with a simple electrostatic model in which the positively charged residues in the M2MspA porin both promote entry of ssDNA into the vestibule and inhibit escape of ssDNA molecules from the vestibule back into the cis compartment. These observations show that appropriate placement of charged residues provides a simple means of substantially tailoring the interaction between the MspA porin and DNA.
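The roughly 20-fold difference in blockade rate can be checked against the figures reported in Examples 5 and 6 (about 5 blockades per second at about 8 μM dT50 for M1MspA, and about 25 per second at 2 μM for M2MspA, both at 180 mV). The sketch below assumes the blockade rate scales linearly with DNA concentration, which the text does not state explicitly.

```python
def rate_per_uM(blockades_per_s: float, concentration_uM: float) -> float:
    """Concentration-normalized blockade rate, assuming linear scaling with concentration."""
    return blockades_per_s / concentration_uM

m1_rate = rate_per_uM(5.0, 8.0)   # M1MspA, dT50, 180 mV (Example 5)
m2_rate = rate_per_uM(25.0, 2.0)  # M2MspA, dT50, 180 mV (Example 6)
fold_increase = m2_rate / m1_rate
print(f"M1: {m1_rate:.3f} /s/uM, M2: {m2_rate:.1f} /s/uM, ratio: {fold_increase:.0f}x")
```

Under that assumption, the normalized rates differ by a factor of 20, matching the comparison made in the text.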

实施例8:M1MspA孔蛋白通过发夹(hp)区段识别保持在孔中的DNA中的单核苷酸Example 8: M1MspA porin recognizes single nucleotides in DNA held in the pore via the hairpin (hp) segment

如实施例3中所述进行使用M1MspA孔蛋白和(i)具有包埋在poly-A背景中的单个C及(ii)具有包埋在poly-A背景中的单个T的poly-A DNA链的实验。如上文中所指出的，所述发夹在MspA孔蛋白缢缩区中保持该DNA构建体的时间足够长而能获得极好地确定的电流特征(current signature)。Experiments using the M1MspA porin and poly-A DNA strands (i) with a single C embedded in a poly-A background and (ii) with a single T embedded in a poly-A background were performed as described in Example 3. As noted above, the hairpin retained the DNA construct in the MspA porin constriction long enough to obtain a well-defined current signature.

A single C embedded in a poly-A DNA hairpin construct. FIG. 12A shows current histograms produced by a single C at positions 1, 2, and 3 after the hairpin, and by a mixture of poly-A and poly-C. The current histograms for the individual positions differ markedly and indicate that the "recognition site" lies near position 2. For a more quantitative description, the peaks of the current distributions were scaled by the current difference found between poly-C and poly-A (FIG. 12B). A Gaussian fit shows that the recognition position of the MspA porin for a single C lies 1.7 nucleotides (nt) from the position of the hairpin. The length of the recognition position (the constriction zone length) is comparable to the width of the Gaussian (1.6 nt, approximately … long).

A single T embedded in a poly-A DNA hairpin construct. Experiments with a single T in poly-A DNA were performed in a similar manner, focusing only on the first three positions adjacent to the hairpin (FIG. 13, panels 2-4). The specificity is again striking, but in this case the greatest sensitivity is found near position 1. The position of a single T can be resolved to much better than one position. Without being bound by theory, the inventors speculate that the difference in position recognition relative to a C in poly-A is in fact caused by the DNA itself, which contributes to the electrostatic environment that shapes the constriction. Data for a single A in a C background are shown in the three bottom panels of FIG. 13. Although a single A produces a current blockade signature that is only weakly separated from the poly-C background, the current distributions are narrow enough to resolve a single A. The optimal position of an A in a poly-C strand appears to be near position 2, that is, similar to that of a single C in an A strand.

The composition of the DNA tail beyond position 3 does not affect the base recognition properties. Poly-A DNA forms secondary structure, and the difference between the data for a C in a poly-A background and an A in a poly-C background might be attributed to disruption of the secondary structure (stiffness) of the poly-A tail. To test this, measurements were made with a 47-base-long heterogeneous sequence following the first three positions, which were occupied by an A or a C trinucleotide. The current levels were found to be indistinguishable from those of pure A50 and C50 tails, indicating that the secondary structure or composition of the tail does not affect the current blockade (FIG. 14).

An additional series of experiments was performed (1) to assess the ability of the M1MspA porin to distinguish different nucleotides and (2) to assess the location and length of the region to which the porin is sensitive (its spatial resolution). In these experiments, different DNA constructs were used, each having a 50-nucleotide ssDNA strand attached to a 14-base-pair hairpin segment to prevent immediate translocation. The data are summarized in FIG. 31. dA₅₀ (SEQ ID NO:49) and dC₅₀ (SEQ ID NO:48) produced significantly different blockade currents. Next, a series of constructs was tested, and the recognition site was localized to the first four bases after the hairpin. These constructs had the ssDNA sequences dC₄dA₄₆ (SEQ ID NO:15), dA₃dC₄dA₄₃ (SEQ ID NO:12), and dA₆dC₄dA₄₀ (SEQ ID NO:11) after the hairpin. dC₄dA₄₆ showed a blockade current distribution almost identical to that of dC₅₀, whereas the dA₃dC₄dA₄₃ and dA₆dC₄dA₄₀ blockades resembled those of dA₅₀. This narrowed the recognition site to the first three nucleotides after the hairpin. Constructs with a single dC at different positions in a poly-dA background were then tested. Hp-dC₁dA₄₉ (dC at position 1) (SEQ ID NO:14) blocked the current at a level intermediate between the poly-dA and poly-dC values. The construct dA₂dC₃dA₄₇ (dC at position 3) (SEQ ID NO:50) blocked the current at a level intermediate between poly-dA and poly-dC, but closer to poly-dA. Poly-dT₅₀ (SEQ ID NO:32) blocked with the lowest residual current, and hp-dG₃dA₄₇ (SEQ ID NO:18) produced a current intermediate between poly-dC and poly-dA. In a different mutant (D90/91Q+D93N+D118/134R+E139K), the blockade currents of poly-dC, poly-dA, and poly-dT were measured and were distinguishable from one another. These data show that the M1MspA porin has recognition capability and that the recognition site is short. Furthermore, assuming that the hairpin is arrested just on the cis side of the constriction zone, the recognition site appears to be located in the constriction zone.

实施例9:突变的MspA M1-QQN和M2-QQN孔蛋白的构建和表征Example 9: Construction and characterization of mutant MspA M1-QQN and M2-QQN porins

In another set of experiments designed to slow DNA translocation through the MspA porin tunnel, two additional mutants were generated. One mutant, called M1-QQN, was made in a manner similar to M1-NNN (or M1MspA) described above, by replacing the amino acids at positions 90 and 91 of the wild-type MspA monomer with glutamine and the amino acid at position 93 with asparagine. For M2-QQN, the constriction zone of the pore was narrowed by introducing the bulkier glutamine at positions 90 and 91 in the background of the M2MspA mutant (see Example 6; D90Q+D91Q+D93N+D118R+E139K+D134R). The mutant was expressed in the M. smegmatis ML16 mutant described in Examples 1 and 3 above. The amount of M2-QQN porin in detergent extracts was as high as that of the WTMspA porin (FIG. 15A), indicating that the new mutations do not affect pore expression. Lipid bilayer experiments showed that the M2-QQN porin forms stable open pores, as does the WTMspA porin (FIG. 15B). The pore-forming activity was similar to that of the WTMspA porin. The single-channel conductance of the M2-QQN porin (2.4 nS) was higher than that of its parent M2 (1.4 nS).

The QQN mutants also discriminate among A, C, and T bases. As with the M1MspA mutant porin (also called the M1-NNN mutant), the QQN mutants showed well-resolved current levels with homopolymer-hp strands, but the relative spacing between the levels is different in the M1-QQN porin. For each pore, data were collected for hairpins with A50, T50, and C50 tails (SEQ ID NO:49, SEQ ID NO:32, and SEQ ID NO:48, respectively). The blockade current was plotted as a fraction of the unblocked open-pore current (FIG. 16). In each case, poly-T blocked more than poly-C, and poly-C blocked more than poly-A. Each peak was well resolved from the others. In the QQN porins, the mean poly-A and poly-C current levels are not as well separated as in the M1-NNN porin, but in the M1-NNN porin, poly-T and poly-C are better separated. Surprisingly, the relative level of the poly-T blockade differs markedly between the two QQN mutant porins. These two mutants differ only in rim-domain substitutions located far from the constriction zone. Without being bound by theory, this may result from interactions between the rim domain and the anchoring hairpin.

The QQN mutant porins appear to slow DNA translocation through MspA. The main motivation for constructing the QQN mutants was to slow the passage of DNA. Translocation of a heterogeneous 100-nt ssDNA segment (without an anchoring hairpin) was recorded together with the duration of the deep blockade state. The survival curve (FIG. 17) shows the fraction of blockade events that last longer than time t. During approximately the first 100 μs, the NNN mutant decays significantly faster than the mutants with the QQN constriction zone. These data are consistent with an increased barrier to translocation through QQN.
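A survival curve of the kind described above gives, for each time t, the fraction of blockade events lasting longer than t. The sketch below shows that calculation. The two sets of event durations are invented for illustration and do not reproduce the data of FIG. 17.

```python
def survival_fraction(durations_us, t_us):
    """Fraction of blockade events whose duration exceeds t_us."""
    if not durations_us:
        raise ValueError("at least one event is required")
    return sum(1 for d in durations_us if d > t_us) / len(durations_us)

# Hypothetical deep-blockade durations in microseconds for two constriction variants.
nnn_events = [20, 25, 30, 40, 45, 60, 80, 150]
qqn_events = [30, 50, 70, 90, 110, 130, 160, 220]

for t in (25, 50, 100):
    print(t, "us:",
          "NNN", survival_fraction(nnn_events, t),
          "QQN", survival_fraction(qqn_events, t))
```

A curve that falls more steeply at short times, as the invented NNN data do here, corresponds to faster passage; a slower decay corresponds to a larger barrier to translocation.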

实施例10:耻垢分枝杆菌四重Msp缺失突变体的构建Example 10: Construction of a quadruple Msp deletion mutant of Mycobacterium smegmatis

To prepare the MspA porin, protein was selectively extracted from the mutant strain M. smegmatis ML16, which contains only one of the four Msp genes (MspB); the other genes are MspA, MspC, and MspD. The procedure exploits the extreme thermal stability of MspA by boiling M. smegmatis cells in 0.5% n-octylpolyoxyethylene (OPOE), a nonionic detergent, and yields MspA porin with very little contamination by other proteins (Heinz and Niederweis, Anal. Biochem. 285:113-20 (2000)). However, background expression of MspB could still be detected in immunoblots using an Msp-specific antiserum (Stephan et al., Mol. Microbiol. 58:714-30 (2005)), suggesting that mixed MspA/MspB oligomers may form and contribute to the pore heterogeneity observed in pore reconstitution experiments. One goal was therefore to construct an M. smegmatis strain free of endogenous porins. Because M. smegmatis requires porin activity to survive, a loxP-flanked MspA expression cassette was integrated into the chromosomal attB site for mycobacteriophage L5 of the porin triple mutant ML16.

This restored expression of the MspA monomer, in strain ML56, to half of the wild-type level. Next, the MspB gene was replaced with an FRT-flanked hyg gene using the suicide vector pMN247 in a two-step strategy as described (Stephan et al., Gene 343:181-190 (2004)). After excision of the hyg gene by Flp recombinase, the porin quadruple mutant strain ML59 (ΔMspA ΔMspB ΔMspC ΔMspD attB::loxP-MspA-loxP) was obtained. Deletion of the MspB gene was confirmed by Southern blot hybridization. PCR demonstrated the absence of each of the four original Msp genes (FIG. 19). Excision of the loxP-MspA-loxP cassette yielded small, viable clones, one of which (ML180) was examined in more detail. Proteins were extracted from ML180 cells using the same high-temperature method. Western analysis showed that ML180 cells do not express Msp porins, and no reconstitution events were observed in lipid bilayer experiments after addition of 20 μg of protein (FIG. 20). Taken together, these results show that an M. smegmatis porin mutant lacking all four Msp porins was generated. However, expression of MspA monomers from an MspA expression vector could not be detected in this strain, most likely because of unknown secondary mutations. This M. smegmatis strain therefore cannot be used to express MspA pores engineered for DNA translocation.

实施例11:利用MspA的诱导型表达构建耻垢分枝杆菌四重Msp缺失突变体ML705Example 11: Construction of the Mycobacterium smegmatis quadruple Msp deletion mutant ML705 using inducible expression of MspA

The M. smegmatis ML16 strain (ΔMspA, ΔMspC, ΔMspD) is currently used to isolate wild-type and mutant MspA porins. However, background expression of MspB complicates the interpretation of translocation experiments. Construction of an M. smegmatis strain lacking all four Msp genes is therefore needed to improve single-pore experiments. To this end, an MspA gene under the control of an acetamide-inducible promoter was integrated into the L5 attB site of M. smegmatis ML16, so that the MspB gene could then be removed by allelic exchange. In the presence of acetamide, MspA is thus expressed and rescues the growth of the M. smegmatis quadruple mutant.

为了实现这一点，构建整合质粒pML967，其包含在乙酰胺诱导型启动子控制之下的MspA基因(图21A)。还构建MspB缺失载体pML1611(图21B)，其包含两个报告基因gfp和xylE作为整合和等位基因置换的标记。To achieve this, the integrative plasmid pML967 was constructed, which contains the MspA gene under the control of an acetamide-inducible promoter ( FIG. 21A ). The MspB deletion vector pML1611 ( FIG. 21B ) was also constructed, which contains two reporter genes, gfp and xylE, as markers for integration and allelic replacement.

在将MspA单体表达质粒pML967整合入耻垢分枝杆菌ML16后获得菌株ML341(ML16，attP::pML967)。通过由之前描述的质粒pML2005临时表达Flp重组酶(Song等人，Mycobacteria protocols(2008))从该菌株除去潮霉素抗性基因，从而产生菌株ML343(ML341，attP::p_acet-MspA)。为了检查整合的MspA基因单体的功能性，用去垢剂从未诱导的和诱导的细胞提取MspA。图22显示在加入2％的乙酰胺后MspA以野生型水平的20％从整合的构建体表达。该MspA单体水平足以使耻垢分枝杆菌能够存活。在未诱导的细胞中存在极低的Msp孔蛋白的背景表达(图22)，表明表达系统受到调控。After the MspA monomer expression plasmid pML967 was integrated into Mycobacterium smegmatis ML16, strain ML341 (ML16, attP::pML967) was obtained. The hygromycin resistance gene was removed from the strain by temporarily expressing the Flp recombinase (Song et al., Mycobacteria protocols (2008)) from the plasmid pML2005 described previously, thereby producing strain ML343 (ML341, attP::p _acet -MspA). In order to check the functionality of the integrated MspA gene monomer, MspA was extracted from uninduced and induced cells with detergent. Figure 22 shows that after adding 2% acetamide, MspA was expressed from the integrated construct at 20% of the wild-type level. This MspA monomer level is sufficient to enable Mycobacterium smegmatis to survive. There was very low background expression of Msp porin in uninduced cells (Figure 22), indicating that the expression system is regulated.

The MspB deletion vector pML1611 was then transformed into ML343. Transformants were plated on Middlebrook 7H10 agar plates containing 10% sucrose for direct selection of double-crossover candidates. Several colonies were obtained that showed the presence of GFP, by green fluorescence under blue-light illumination, and the absence of XylE. Colony PCR from one of these clones confirmed the absence of the MspB gene and the construction of a viable Msp quadruple mutant. This strain was designated ML378. ML378 was transformed with the pCreSacB1 plasmid to remove the gfp-hyg expression cassette. After subsequent negative selection, several clones were obtained and examined by colony PCR. One of eight unmarked porin quadruple mutants of M. smegmatis, designated ML705, was characterized further.

To examine whether MspA monomers compensate for the phenotype of the quadruple mutant, the MspA expression plasmid pMN016 was transformed into ML705. Figure 24 shows that the growth of ML705 on 7H10 agar plates was dramatically reduced; however, expression of MspA from pMN016 completely restored the growth of ML705 to wild-type levels (Figure 23). These results demonstrate that the growth defect was not caused by secondary mutations and that MspA monomers can be expressed to produce MspA porins in the Msp quadruple mutant ML705.

Growth of the porin quadruple mutant ML705 in Middlebrook 7H9 medium was much slower than that of wild-type M. smegmatis and significantly slower than that of the porin triple mutant ML16 (Figure 24). Adding 2% acetamide to induce expression of the MspA gene monomer at the L5 site, together with expression of MspA from plasmid pMN016, restored the growth rate to wild-type levels (Figure 24). ML705 grew more slowly than the triple mutant both on plates and in liquid medium, indicating that ML705 has fewer porins in its outer membrane than the Msp triple mutant ML16. This assumption was confirmed by Western blotting (Figure 25). The amount of MspA monomer was less than 5% of that in wild-type (wt) M. smegmatis and 50% less than that in the triple mutant. Figure 25 also shows that, when 2% acetamide was added, we could induce MspA to up to 25% of the wild-type level.

The above experiments show that an Msp quadruple mutant (ML705) has been constructed that can be grown in the presence of acetamide to transiently produce wild-type MspA monomers. The ML705 strain is then transformed with a plasmid containing an expression cassette for a wild-type or mutant MspA monomer, or for a wild-type or mutant single-chain Msp porin. Production of the wild-type MspA monomer can be shut off by washing away the acetamide and transferring the cells to acetamide-free medium. This yields wild-type or mutant MspA monomers, or wild-type or mutant single-chain Msp porins, that are less contaminated with wild-type MspA. ML705 is therefore suitable for producing wild-type and mutant MspA porins for any purpose.

Example 12: Construction of single-chain MspA porin dimers

Single-stranded DNA is not rotationally symmetric. An asymmetric pore is therefore advantageous for sequencing purposes. To combine the excellent sequencing capabilities of the MspA porin with an enhanced ability to adapt the properties of the vestibule and constriction zone to DNA sequencing, a single-chain MspA nanopore will be constructed. The MspA chain ends are linked together in the MspA porin dimer (FIG. 26A), and can be connected by a short peptide linker. To test this idea, the MspA gene monomer was fused to the MspB gene monomer, which encodes a protein with only two changes (A138P, E139A) relative to the wild-type MspA monomer (Stahl et al., Mol. Microbiol. 40:451 (2001)). The (GGGGS)₃ peptide (SEQ ID NO: 3), which is commonly used to link proteins (Huston et al., Proc. Natl. Acad. Sci. USA 85:5879 (1988)), was used to connect the C-terminus of the MspA monomer to the N-terminus of the MspB monomer, which lacks the signal peptide (FIG. 26B). The resulting MspA-MspB porin dimer was placed under the control of the constitutive p_smyc promoter in plasmid pML870 and then expressed in M. smegmatis ML16. The protein was purified using a standard heat extraction method. Although the expression level of the single-chain MspA porin dimer was lower than that of the wild-type MspA porin (FIG. 26C), the channel activity of the two porins was similar (FIG. 26D). Analysis of the current recordings showed that the single-channel conductance of the pore formed by the MspA dimer was 2.6 nS. This result shows that the linker segment does not impair the folding or function of the Msp pore.
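As a rough illustration of what a single-channel conductance of 2.6 nS implies, the sketch below applies Ohm's law (I = G·V) to estimate the open-pore current. The applied voltage of 10 mV and the function name are assumptions chosen for illustration only; the text above does not state the voltage used for the recordings in FIG. 26D.

```python
def open_pore_current_pA(conductance_nS: float, voltage_mV: float) -> float:
    """Ohm's law, I = G * V.

    With G in nS and V in mV the product is already in pA,
    because 1e-9 S * 1e-3 V = 1e-12 A.
    """
    return conductance_nS * voltage_mV


G_DIMER_NS = 2.6           # single-channel conductance reported for the MspA dimer pore
ASSUMED_VOLTAGE_MV = 10.0  # hypothetical applied voltage, not taken from the text

current_pA = open_pore_current_pA(G_DIMER_NS, ASSUMED_VOLTAGE_MV)
print(f"Estimated open-pore current: {current_pA:.1f} pA")
```

Because the relation is linear, the estimate scales directly with whatever voltage is actually applied.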

Example 13: Construction of single-chain MspA porin

To combine the excellent sequencing capabilities of MspA with an increased ability to adapt the properties of the vestibule and constriction zone to DNA sequencing, a single-chain MspA porin octamer will be constructed that allows optimal vestibule and constriction-zone properties for DNA sequencing. The MspA chain ends are linked together in the MspA porin by short peptide linkers. The (GGGGS)₃ peptide (SEQ ID NO: 3) is used to connect the carboxyl terminus of each preceding MspA monomer to the amino terminus of the following MspA monomer, which lacks the signal peptide.
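The layout described above can be sketched at the protein level as follows. The leader and monomer strings are bracketed placeholders, not the sequences of SEQ ID NO: 1 or SEQ ID NO: 28, and the function name is hypothetical. Only the (GGGGS)₃ linker and the rule that the following monomers lack the signal peptide come from the text.

```python
from typing import Sequence

LINKER = "GGGGS" * 3  # the (GGGGS)3 linker, 15 residues


def single_chain_porin(leader: str, mature_monomers: Sequence[str],
                       linker: str = LINKER) -> str:
    """Join mature monomers C-terminus to N-terminus through the linker.

    The leader (signal peptide) is placed only in front of the first monomer;
    every following monomer is used without it.
    """
    if not mature_monomers:
        raise ValueError("at least one monomer is required")
    return leader + linker.join(mature_monomers)


# Placeholder tokens make the order of the parts visible in the output.
leader = "[leader]"
monomers = [f"[M{i}]" for i in range(1, 9)]  # eight mature monomers

octamer = single_chain_porin(leader, monomers)
print(octamer)
print("linkers used:", octamer.count(LINKER))
```

Passing two monomers instead of eight gives the dimer layout of Example 12, so the same helper covers the dimer, tetramer, and octamer cases.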

To generate vectors containing the MspA porin sequence, each MspA monomer sequence is flanked by unique restriction sites, which allows any individual monomer to be mutated. The entire MspA porin sequence is flanked by PacI and HindIII restriction sites. The restriction sites between the MspA monomer sequences include BamHI, ClaI, EcoRV, HpaI, KpnI, MluI, NdeI, NheI, PstI, ScaI, SpeI, XbaI, NotI, and SphI (Figure 31). To generate the MspA porin sequence, the MspA sequences are assembled stepwise using the unique restriction sites to form dimeric, tetrameric, and octameric single-chain MspA. To avoid recombination problems when generating single-chain MspA multimers, seven MspA genes (SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27) are synthesized with different codon usage; that is, the genes encode exactly the same amino acid sequence, but their DNA sequences differ from the original MspA gene nucleotide sequence (SEQ ID NO: 20). To generate the MspA porin sequence, the first Msp monomer must contain the leader sequence shown in Figure 18 (e.g., amino acids 1 to 27 of SEQ ID NO: 28). Each of the seven Msp monomer sequences following the first Msp monomer sequence can comprise SEQ ID NO: 1 or a mutant of SEQ ID NO: 1 having a mutation selected from those listed in Table 7. The expression vector pML2604 is the parent vector containing the MspA porin sequence cloned into the PacI and HindIII restriction sites. pML2604 is transformed into the quadruple porin mutant, and the expression level and oligomeric state of the MspA porin are examined by Western blotting of native and denatured proteins. The channel activity of the MspA porin is examined in lipid bilayer experiments.
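The recoding idea above, in which the genes encode the same amino acid sequence but differ at the nucleotide level, can be checked with a short script. This is a minimal sketch: the two DNA fragments are invented 15-nt encodings of a single GGGGS repeat, not any of SEQ ID NO: 20 to 27, and the helper names are hypothetical. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (length divisible by 3) into protein."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def nucleotide_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two illustrative (invented) encodings of the same peptide, GGGGS.
copy_1 = "GGTGGCGGTGGCAGC"
copy_2 = "GGCGGAGGGGGTTCT"

print("same protein:", translate(copy_1) == translate(copy_2))
print("nucleotide identity:", round(nucleotide_identity(copy_1, copy_2), 2))
```

The lower the nucleotide identity between repeated copies, the fewer long identical stretches are available for recombination, which is the concern the different codon usage is meant to address.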

Table 7: MspA mutants

The present invention specifically relates to the following embodiments:

1. A method comprising applying an electric field to a Mycobacterium smegmatis porin (Msp) having a vestibule and a constriction zone that define a channel, wherein the Msp porin is positioned between a first conductive liquid medium and a second conductive liquid medium.

2. The method of embodiment 1, wherein the Msp porin is selected from the group consisting of a wild-type MspA porin, a mutant MspA porin, a wild-type MspA paralog or homolog porin, and a mutant MspA paralog or homolog porin.

3. The method of embodiment 1 or embodiment 2, wherein the Msp porin further comprises a molecular motor, wherein the molecular motor is capable of moving an analyte into or through the channel at an average translocation velocity that is less than the average translocation velocity at which the analyte electrophoretically translocates into or through the channel in the absence of the molecular motor.

4. The method of any one of embodiments 1 to 3, wherein at least one of the conductive liquid media comprises an analyte.

5. The method of embodiment 3 or embodiment 4, further comprising detecting the analyte by a method comprising measuring an ionic current as the analyte interacts with the channel to provide a current pattern, wherein the appearance of a blockade in the current pattern indicates the presence of the analyte.

6. The method of any one of embodiments 3 to 5, wherein applying the electric field is sufficient to cause the analyte to electrophoretically translocate through the channel.

7. The method of any one of embodiments 3 to 6, wherein the Msp porin is a mutant MspA or a mutant MspA paralog or homolog porin, and the average translocation velocity of the analyte through the channel is less than the average translocation velocity of the analyte through the channel of a wild-type MspA or a wild-type MspA paralog or homolog porin.

8. The method of any one of embodiments 3 to 6, wherein the Msp porin is a mutant MspA or a mutant MspA paralog or homolog porin, and the average translocation velocity of the analyte through the channel is greater than the average translocation velocity of the analyte through the channel of a wild-type MspA or a wild-type MspA paralog or homolog porin.

9. The method of any one of embodiments 3 to 8, wherein the average translocation velocity of the analyte through the channel is less than 0.5 nm/μs.

10. The method of any one of embodiments 3 to 9, further comprising identifying the analyte.

11. The method of embodiment 10, wherein identifying the analyte comprises comparing the current pattern to a known current pattern obtained under the same conditions using a known analyte.

12. The method of any one of embodiments 3 to 11, wherein the analyte is a nucleotide, a nucleic acid, an amino acid, a peptide, a protein, a polymer, a drug, an ion, a pollutant, a nanoscale object, or a biological warfare agent.

13. The method of any one of embodiments 3 to 12, wherein the analyte is a polymer.

14. The method of embodiment 13, wherein the polymer is a protein, a peptide, or a nucleic acid.

15. The method of embodiment 14, wherein the polymer is a nucleic acid.

16. The method of embodiment 15, wherein the average translocation velocity of the nucleic acid through the channel is less than 1 nucleotide/μs.

17. The method of embodiment 15 or embodiment 16, wherein the nucleic acid is ssDNA, dsDNA, RNA, or a combination thereof.

18. The method of any one of embodiments 13 to 17, further comprising distinguishing at least a first unit within the polymer from at least a second unit within the polymer, wherein the distinguishing comprises measuring the ionic currents produced as the first and second units separately translocate through the channel to produce a first and a second current pattern, respectively, wherein the first and second current patterns differ from each other.

19. The method of any one of embodiments 13 to 18, further comprising determining the sequence of the polymer.

20. The method of embodiment 19, wherein sequencing comprises measuring the ionic current or an optical signal as each unit of the polymer is separately translocated through the channel to provide a current pattern associated with each unit, and comparing each current pattern with the current pattern of a known unit obtained under the same conditions, so as to determine the sequence of the polymer.

21. The method of any one of embodiments 3 to 20, further comprising determining the concentration, size, molecular weight, shape, or orientation of the analyte, or any combination thereof.

22. The method of any one of embodiments 1 to 21, wherein at least one of the conductive liquid media comprises a plurality of analytes.

23. The method of any one of embodiments 1 to 22, wherein the Msp porin is further defined as a mutant MspA porin.

24. The method of embodiment 23, wherein the mutant MspA porin comprises:

a vestibule and a constriction zone that define a channel, and

at least a first mutant MspA monomer comprising a mutation at position 93 and a mutation at position 90, a mutation at position 91, or mutations at both positions 90 and 91.

25. The method of embodiment 23 or embodiment 24, wherein the diameter of the constriction zone of the mutant MspA porin is smaller than the diameter of the constriction zone of the wild-type MspA porin.

26. The method of any one of embodiments 23 to 25, wherein the mutant MspA porin comprises a mutation in the vestibule or constriction zone that allows an analyte to electrophoretically translocate through the channel of the mutant at an average translocation velocity that is less than the average translocation velocity at which the analyte electrophoretically translocates through the channel of the wild-type MspA porin.

27. The method of any one of embodiments 24 to 26, wherein the first mutant MspA monomer further comprises one or more mutations at any of the following amino acid positions: 88, 105, 108, 118, 134, or 139.

28. The method of any one of embodiments 23 to 27, wherein the mutant MspA porin comprises a neutral constriction zone.

29. The method of any one of embodiments 23 to 28, wherein the conductance through the channel of the mutant MspA porin is higher than the conductance through the channel of its corresponding wild-type MspA porin.

30. The method of embodiment 23, wherein the mutant MspA porin comprises

a vestibule having a length of about 2 to about 6 nm and a diameter of about 2 to about 6 nm, and

a constriction zone having a length of about 0.3 to about 3 nm and a diameter of about 0.3 to about 3 nm,

wherein the vestibule and constriction zone together define a channel, and

further comprises at least a first mutant MspA paralog or homolog monomer.

31. The method of embodiment 30, wherein the diameter of the constriction zone of the mutant MspA porin is smaller than the diameter of the constriction zone of its corresponding wild-type MspA porin.

32. The method of embodiment 30 or embodiment 31, wherein the mutant MspA porin comprises a mutation in the vestibule or constriction zone that allows an analyte to electrophoretically translocate through the channel of the mutant at an average translocation velocity that is less than the average translocation velocity at which the analyte electrophoretically translocates through the channel of its corresponding wild-type MspA porin.

33. The method of any one of embodiments 30 to 32, wherein the conductance through the channel of the mutant MspA porin is higher than the conductance through the channel of its corresponding wild-type MspA porin.

34. The method of any one of embodiments 1, 3 to 6, or 9 to 22, wherein the Msp porin is further defined as an Msp porin comprising

wherein the vestibule and constriction zone together define a channel.

35. The method of any one of embodiments 1, 3 to 6, or 9 to 22, wherein the Msp porin is encoded by a nucleic acid sequence encoding a single-chain Msp porin, wherein the nucleic acid sequence comprises:

(a) a first and a second nucleotide sequence, wherein the first nucleotide sequence encodes a first Msp monomer sequence and the second nucleotide sequence encodes a second Msp monomer sequence; and

(b) a third nucleotide sequence encoding an amino acid linker sequence.

36. The method of embodiment 35, wherein the first and second Msp monomer sequences are independently selected from a wild-type MspA monomer, a wild-type MspB monomer, a wild-type MspC monomer, a wild-type MspD monomer, and mutants thereof.

37. The method of embodiment 35, wherein the first Msp monomer sequence comprises a wild-type MspA monomer or a mutant thereof.

38. The method of embodiment 37, wherein the first Msp monomer sequence comprises a mutant MspA monomer.

39. The method of any one of embodiments 1, 3 to 6, or 9 to 22, wherein the Msp porin is encoded by a nucleic acid sequence encoding a single-chain Msp porin, wherein the nucleic acid sequence comprises:

(a) a first, second, third, fourth, fifth, sixth, seventh, and eighth nucleotide sequence, wherein the first, second, third, fourth, fifth, sixth, seventh, and eighth nucleotide sequences encode a first, second, third, fourth, fifth, sixth, seventh, and eighth Msp monomer sequence, respectively; and

(b) a ninth nucleotide sequence encoding an amino acid linker sequence.

40. The method of embodiment 39, wherein each Msp monomer comprises a wild-type MspA monomer or a mutant thereof.

41. The method of embodiment 39, wherein at least one Msp monomer comprises a wild-type MspA monomer or a mutant thereof.

42. The method of any one of embodiments 35 to 41, wherein the Msp monomer is a wild-type MspA paralog or homolog.

43. The method of embodiment 42, wherein the wild-type MspA paralog or homolog is selected from the group consisting of MspA/Msmeg0965, MspB/Msmeg0520, MspC/Msmeg5483, MspD/Msmeg6057, MppA, PorM1, PorM2, PorM1, Mmcs4296, Mmcs4297, Mmcs3857, Mmcs4382, Mmcs4383, Mjls3843, Mjls3857, Mjls3931, Mjls4674, Mjls4675, Mjls4677, Map3123c, Mav3943, Mvan1836, Mvan4117, Mvan4839, Mvan4840, Mvan5016, Mvan5017, Mvan5768, MUL_2391, Mflv1734, Mflv1735, Mflv2295, Mflv1891, MCH4691c, MCH4689c, MCH4690c, MAB1080, MAB1081, MAB2800, RHA1ro08561, RHA1ro04074, and RHA1ro03127.

44. A method of altering the conductance through the channel of a Mycobacterium smegmatis porin (Msp), comprising removing, adding, or substituting at least one amino acid in the vestibule or constriction zone of a wild-type Msp porin.

45. A system comprising a Mycobacterium smegmatis porin (Msp) having a vestibule and a constriction zone that define a channel, wherein the channel is positioned between a first liquid medium and a second liquid medium, wherein at least one of the liquid media comprises an analyte, and wherein the system is effective for detecting a property of the analyte.

46. The system of embodiment 45, wherein the system is effective for translocating the analyte through the channel.

47. The system of embodiment 45, wherein the system is effective for detecting a property of the analyte, comprising subjecting the Msp porin to an electric field such that the analyte interacts with the Msp porin.

48. The method of any one of embodiments 45 to 47, wherein the system is effective for detecting a property of the analyte, comprising subjecting the Msp porin to an electric field such that the analyte electrophoretically translocates through the channel of the Msp porin.

49. The system of any one of embodiments 45 to 48, wherein the property is an electrical, chemical, or physical property of the analyte.

50. The system of any one of embodiments 45 to 49, wherein the Msp porin is contained in a lipid bilayer.

51. The system of any one of embodiments 45 to 50, further defined as comprising a plurality of Msp porins.

52. The system of any one of embodiments 45 to 51, wherein the Msp porin is selected from a wild-type MspA porin, a mutant MspA porin, a wild-type MspA paralog or homolog porin, or a mutant MspA paralog or homolog porin.

53. The system of any one of embodiments 45 to 52, wherein the Msp porin is further defined as a mutant MspA porin.

54. The system of embodiment 53, wherein the mutant MspA porin comprises

55. The system of embodiment 53, wherein the mutant MspA porin comprises

56. The system of any one of embodiments 45 to 51, wherein the Msp porin is further defined as an Msp porin comprising

57. The system of embodiment 52, wherein the Msp is a wild-type MspA paralog or homolog porin.

58. The system of embodiment 57, wherein the wild-type MspA paralog or homolog porin is selected from the group consisting of MspA/Msmeg0965, MspB/Msmeg0520, MspC/Msmeg5483, MspD/Msmeg6057, MppA, PorM1, PorM2, PorM1, Mmcs4296, Mmcs4297, Mmcs3857, Mmcs4382, Mmcs4383, Mjls3843, Mjls3857, Mjls3931, Mjls4674, Mjls4675, Mjls4677, Map3123c, Mav3943, Mvan1836, Mvan4117, Mvan4839, Mvan4840, Mvan5016, Mvan5017, Mvan5768, MUL_2391, Mflv1734, Mflv1735, Mflv2295, Mflv1891, MCH4691c, MCH4689c, MCH4690c, MAB1080, MAB1081, MAB2800, RHA1ro08561, RHA1ro04074, and RHA1ro03127.

59. The system of any one of embodiments 45 to 51, wherein the Msp porin is encoded by a nucleic acid sequence encoding a single-chain Msp porin, wherein the nucleic acid sequence comprises:

60. The method of any one of embodiments 45 to 51, wherein the Msp porin is encoded by a nucleic acid sequence encoding a single-chain Msp porin, wherein the nucleic acid sequence comprises:

61. The system of any one of embodiments 45 to 60, wherein the Msp porin further comprises a molecular motor, wherein the molecular motor is capable of moving the analyte into or through the channel at an average translocation velocity that is less than the average translocation velocity at which the analyte translocates into or through the channel in the absence of the molecular motor.

62. The system of any one of embodiments 45 to 61, further comprising a patch-clamp amplifier.

63. The system of any one of embodiments 45 to 62, further comprising a data acquisition device.

64. The system of any one of embodiments 45 to 63, further comprising one or more temperature regulating devices in communication with the first liquid medium, the second liquid medium, or both.

65. A system comprising a Mycobacterium smegmatis porin (Msp) having a vestibule and a constriction zone that define a channel, wherein the channel is positioned in a lipid bilayer between a first liquid medium and a second liquid medium, and wherein the only point of liquid communication between the first and second liquid media is in the channel.

66. A mutant Mycobacterium smegmatis porin A (MspA) porin comprising

67. The mutant MspA porin of embodiment 66, comprising mutations at positions 93 and 90.

68. The mutant MspA porin of embodiment 66, comprising mutations at positions 93 and 91.

69. The mutant MspA porin of embodiment 66, comprising mutations at positions 93, 91, and 90.

70. The mutant MspA porin of embodiment 66, wherein the diameter of the constriction zone of the mutant MspA porin is smaller than the diameter of the constriction zone of the wild-type MspA porin.

71. The mutant MspA porin of any one of embodiments 66 to 70, wherein the MspA porin has a mutation in the vestibule or constriction zone that allows an analyte to translocate through the channel of the mutant at an average translocation velocity that is less than the average translocation velocity at which the analyte translocates through the channel of the wild-type Msp porin.

72. The mutant MspA porin of any one of embodiments 66 to 70, wherein the MspA porin has a mutation in the vestibule or constriction zone that allows an analyte to electrophoretically translocate through the channel of the mutant at an average translocation velocity that is less than the average translocation velocity at which the analyte electrophoretically translocates through the channel of the wild-type Msp porin.

73. The mutant MspA porin of any one of embodiments 66 to 72, wherein the MspA porin has a mutation in the vestibule or constriction zone that allows an analyte to translocate through the channel at an average translocation velocity of less than 0.5 nm/μs.

74. The mutant MspA porin of any one of embodiments 71 to 73, wherein the analyte is selected from the group consisting of nucleotides, nucleic acids, amino acids, peptides, proteins, polymers, drugs, ions, biological warfare agents, pollutants, nanoscale objects, and combinations or clusters thereof.

75. The mutant MspA porin of embodiment 74, wherein the analyte is further defined as a nucleic acid.

76. The mutant MspA porin of embodiment 75, wherein the nucleic acid translocates through the channel at an average translocation velocity of less than 1 nucleotide/μs.

77. The mutant MspA porin of embodiment 75 or embodiment 76, wherein the nucleic acid is further defined as ssDNA, dsDNA, RNA, or a combination thereof.

78. The mutant MspA porin of any one of embodiments 66 to 77, wherein the mutant MspA porin comprises 2 to 15 Msp monomers that are the same or different.

79. The mutant MspA porin of embodiment 78, wherein the mutant MspA porin comprises 7 to 9 Msp monomers that are the same or different.

80. The mutant MspA porin of any one of embodiments 66 to 79, further comprising at least a second monomer selected from a wild-type MspA monomer, a second mutant MspA monomer, a wild-type MspA paralog or homolog monomer, and a mutant MspA paralog or homolog monomer, wherein the second mutant MspA monomer may be the same as or different from the first mutant MspA monomer.

81. The mutant MspA porin of embodiment 80, wherein the second monomer is a wild-type MspA paralog or homolog monomer.

82. The mutant MspA porin of embodiment 81, wherein the wild-type MspA paralog or homolog monomer is a wild-type MspB monomer.

83. The mutant MspA porin of any one of embodiments 66 to 82, wherein the first mutant MspA monomer further comprises one or more mutations at any of the following amino acid positions: 88, 105, 108, 118, 134, or 139.

84. The mutant MspA porin of any one of embodiments 66 to 82, wherein the first mutant MspA monomer comprises one or more of the following mutations: L88W, D90K/N/Q/R, D91N/Q, D93N, I105W, N108W, D118R, D134R, or E139K.

85.实施方式66至82之一中的突变MspA孔蛋白，其中所述第一突变MspA单体包含下列突变：D90N/D91N/D93N。85. The mutant MspA porin of any one of embodiments 66 to 82, wherein the first mutant MspA monomer comprises the following mutations: D90N/D91N/D93N.

86.实施方式66至82之一中的突变MspA孔蛋白，其中所述第一突变MspA单体包含下列突变：D90N/D91N/D93N/D118R/D134R/E139K。86. The mutant MspA porin of any one of embodiments 66 to 82, wherein the first mutant MspA monomer comprises the following mutations: D90N/D91N/D93N/D118R/D134R/E139K.

87.实施方式66至82之一中的突变MspA孔蛋白，其中所述第一突变MspA单体包含下列突变：D90Q/D91Q/D93N。87. The mutant MspA porin of any one of embodiments 66 to 82, wherein the first mutant MspA monomer comprises the following mutations: D90Q/D91Q/D93N.

88.实施方式66至82之一中的突变MspA孔蛋白，其中所述第一突变MspA单体包含下列突变：D90Q/D91Q/D93N/D118R/D134R/E139K。88. The mutant MspA porin of any one of embodiments 66 to 82, wherein the first mutant MspA monomer comprises the following mutations: D90Q/D91Q/D93N/D118R/D134R/E139K.

89.实施方式66至82之一中的突变MspA孔蛋白，其中所述第一突变MspA单体包含下列突变：D90(K,R)/D91N/D93N。89. The mutant MspA porin of any one of embodiments 66 to 82, wherein the first mutant MspA monomer comprises the following mutations: D90(K,R)/D91N/D93N.

90.实施方式66至82之一中的突变MspA孔蛋白，其中所述第一突变MspA单体包含下列突变：(L88，I105)W/D91Q/D93N。90. The mutant MspA porin of any one of embodiments 66 to 82, wherein the first mutant MspA monomer comprises the following mutations: (L88, I105)W/D91Q/D93N.

91.实施方式66至82之一中的突变MspA孔蛋白，其中所述第一突变MspA单体包含下列突变：I105W/N108W。91. The mutant MspA porin of any one of embodiments 66 to 82, wherein the first mutant MspA monomer comprises the following mutations: I105W/N108W.

92.实施方式66至91之一中的突变MspA孔蛋白，其还包含一个或多个周质环缺失。92. The mutant MspA porin of any one of embodiments 66 to 91, further comprising a deletion of one or more periplasmic loops.

93.实施方式66至92之一中的突变MspA孔蛋白，其通过通道的电导高于通过其相应野生型MspA孔蛋白的通道的电导。93. The mutant MspA porin of any one of embodiments 66 to 92, wherein the conductance through the channel is higher than the conductance through the channel of its corresponding wild-type MspA porin.

94.实施方式66至93之一中的突变MspA孔蛋白，其还包括分子发动机，其中所述分子发动机能够以这样的平均转位速度将分析物移入或穿过通道，所述平均转位速度小于分析物在分子发动机不存在的情况下转位入或穿过通道的平均转位速度。94. The mutant MspA porin of any one of embodiments 66 to 93, further comprising a molecular motor, wherein the molecular motor is capable of moving an analyte into or through the channel at an average translocation velocity that is less than the average translocation velocity of the analyte into or through the channel in the absence of the molecular motor.

95.实施方式66至94之一中的突变MspA孔蛋白，其还包括分子发动机，其中所述分子发动机能够以这样的平均转位速度将分析物移入或穿过通道，所述平均转位速度小于分析物在分子发动机不存在的情况下通过电泳转位入或穿过通道的平均转位速度。95. The mutant MspA porin of any one of embodiments 66 to 94, further comprising a molecular motor, wherein the molecular motor is capable of moving an analyte into or through the channel at an average translocation velocity that is less than the average translocation velocity of the analyte into or through the channel by electrophoresis in the absence of the molecular motor.

96.实施方式94或实施方式95的突变MspA孔蛋白，其中所述分子发动机是酶。96. The mutant MspA porin of embodiment 94 or embodiment 95, wherein the molecular motor is an enzyme.

97.实施方式96的突变MspA孔蛋白，其中所述酶是聚合酶、外切核酸酶或Klenow片段。97. The mutant MspA porin of embodiment 96, wherein the enzyme is a polymerase, an exonuclease, or a Klenow fragment.

98.一种突变的耻垢分枝杆菌孔蛋白A(MspA)孔蛋白，其包含98. A mutant Mycobacterium smegmatis porin A (MspA) porin comprising

99.实施方式98的突变MspA孔蛋白，其中所述突变MspA孔蛋白的缢缩区的直径小于其相应野生型MspA孔蛋白的缢缩区的直径。99. The mutant MspA porin of embodiment 98, wherein the diameter of the constriction zone of the mutant MspA porin is smaller than the diameter of the constriction zone of its corresponding wild-type MspA porin.

100.实施方式98或实施方式99的突变MspA孔蛋白，其在前厅或缢缩区具有突变，所述突变允许分析物以这样的平均转位速度穿过突变体的通道，所述平均转位速度小于分析物转位通过其相应野生型MspA孔蛋白的通道的平均转位速度。100. The mutant MspA porin of embodiment 98 or embodiment 99, having a mutation in the vestibule or constriction zone, wherein the mutation allows an analyte to translocate through the channel of the mutant at an average translocation velocity that is less than the average translocation velocity of the analyte through the channel of its corresponding wild-type MspA porin.

101.实施方式98至100之一中的突变MspA孔蛋白，其在前厅或缢缩区中具有突变，所述突变允许分析物以这样的平均转位速度电泳转位通过突变体的通道，所述平均转位速度小于分析物通过电泳转位通过其相应野生型MspA孔蛋白的通道的平均转位速度。101. The mutant MspA porin of any one of embodiments 98 to 100, having a mutation in the vestibule or constriction zone that allows an analyte to translocate electrophoretically through the channel of the mutant at an average translocation velocity that is less than the average translocation velocity at which the analyte translocates electrophoretically through the channel of its corresponding wild-type MspA porin.

102.实施方式98至101之一中的突变MspA孔蛋白，其在前厅或缢缩区中具有允许分析物以小于0.5nm/μs的平均转位速度穿过通道的突变。102. The mutant MspA porin of any one of embodiments 98 to 101, having a mutation in the vestibule or constriction zone that allows an analyte to translocate through the channel at an average translocation velocity of less than 0.5 nm/μs.

103.实施方式101至102之一中的突变MspA孔蛋白，其中所述分析物被进一步确定为核苷酸、核酸、氨基酸、肽、蛋白质、聚合物、药物、离子、生物战剂、污染物、纳米级物体或其组合或聚簇。103. The mutant MspA porin of any one of embodiments 101 to 102, wherein the analyte is further defined as a nucleotide, a nucleic acid, an amino acid, a peptide, a protein, a polymer, a drug, an ion, a biowarfare agent, a pollutant, a nanoscale object, or a combination or cluster thereof.

104.实施方式103的突变MspA孔蛋白，其中所述分析物被进一步确定为核酸。104. The mutant MspA porin of embodiment 103, wherein the analyte is further defined as a nucleic acid.

105.实施方式104的突变MspA孔蛋白，其中所述核酸以小于1个核苷酸/μs的平均转位速度转位通过通道。105. The mutant MspA porin of embodiment 104, wherein the nucleic acid translocates through the channel at an average translocation velocity of less than 1 nucleotide/μs.

106.实施方式104或实施方式105的突变MspA孔蛋白，其中所述核酸被进一步确定为ssDNA、dsDNA、RNA或其组合。106. The mutant MspA porin of embodiment 104 or embodiment 105, wherein the nucleic acid is further defined as ssDNA, dsDNA, RNA, or a combination thereof.

107.实施方式98至106之一中的突变MspA孔蛋白，其中所述突变MspA孔蛋白还包含至少一个Msp单体。107. The mutant MspA porin of any one of embodiments 98 to 106, wherein the mutant MspA porin further comprises at least one Msp monomer.

108.实施方式107的突变MspA孔蛋白，其中所述Msp单体选自野生型MspA单体、突变MspA单体、野生型MspA旁系同源物或同系物或者第二突变MspA旁系同源物或同系物单体。108. The mutant MspA porin of embodiment 107, wherein the Msp monomer is selected from a wild-type MspA monomer, a mutant MspA monomer, a wild-type MspA paralog or homolog, or a second mutant MspA paralog or homolog monomer.

109.实施方式98至108之一中的突变MspA孔蛋白，其还包含一个或多个周质环缺失。109. The mutant MspA porin of any one of embodiments 98 to 108, further comprising a deletion of one or more periplasmic loops.

110.实施方式98至109之一中的突变MspA孔蛋白，其在前厅或缢缩区中包含突变。110. The mutant MspA porin of any one of embodiments 98 to 109, comprising a mutation in the vestibule or constriction zone.

111.实施方式98至110之一中的突变MspA孔蛋白，其通过通道的电导高于通过其相应野生型MspA孔蛋白的通道的电导。111. The mutant MspA porin of any one of embodiments 98 to 110, wherein the conductance through the channel is higher than the conductance through the channel of its corresponding wild-type MspA porin.

112.实施方式98至111之一中的突变MspA孔蛋白，其还包含分子发动机，其中所述分子发动机能够以这样的平均转位速度将分析物移入或穿过通道，所述平均转位速度小于分析物在分子发动机不存在的情况下转位通过或穿过通道时的平均转位速度。112. The mutant MspA porin of any one of embodiments 98 to 111, further comprising a molecular motor, wherein the molecular motor is capable of moving an analyte into or through the channel at an average translocation velocity that is less than the average translocation velocity at which the analyte translocates into or through the channel in the absence of the molecular motor.

113.实施方式98至112之一中的突变MspA孔蛋白，其还包含分子发动机，其中所述分子发动机能够以这样的平均转位速度将分析物移入或穿过通道，所述平均转位速度小于分析物在分子发动机不存在的情况下通过电泳转位入或穿过通道的平均转位速度。113. The mutant MspA porin of any one of embodiments 98 to 112, further comprising a molecular motor, wherein the molecular motor is capable of moving an analyte into or through the channel at an average translocation velocity that is less than the average translocation velocity of the analyte into or through the channel by electrophoresis in the absence of the molecular motor.

114.实施方式112或实施方式113的突变MspA孔蛋白，其中所述分子发动机是酶。114. The mutant MspA porin of embodiment 112 or embodiment 113, wherein the molecular motor is an enzyme.

115.实施方式114的突变MspA孔蛋白，其中所述酶是聚合酶、外切核酸酶或Klenow片段。115. The mutant MspA porin of embodiment 114, wherein the enzyme is a polymerase, an exonuclease, or a Klenow fragment.

116.一种突变的耻垢分枝杆菌孔蛋白A(MspA)旁系同源物或同系物，其包含116. A mutant Mycobacterium smegmatis porin A (MspA) paralog or homolog comprising

117.实施方式116的突变MspA旁系同源物或同系物，其还包含至少第一突变MspA旁系同源物或同系物单体。117. The mutant MspA paralog or homolog of embodiment 116, further comprising at least a first mutant MspA paralog or homolog monomer.

118.制备包含至少一个突变MspA单体的突变耻垢分枝杆菌孔蛋白A(MspA)孔蛋白的方法，所述方法包括在位置93和位置90、位置91、或位置90及91上修饰野生型MspA单体。118. A method of making a mutant Mycobacterium smegmatis porin A (MspA) porin comprising at least one mutant MspA monomer, the method comprising modifying a wild-type MspA monomer at position 93 and at position 90, position 91, or both positions 90 and 91.

119.制备具有界定通道的前厅和缢缩区的突变耻垢分枝杆菌孔蛋白A(MspA)孔蛋白的方法，包括在野生型MspA旁系同源物或同系物单体的前厅或缢缩区中缺失、添加或置换任何氨基酸，以便所得的突变MspA孔蛋白能够在施加电场后使分析物转位通过通道。119. A method of making a mutant Mycobacterium smegmatis porin A (MspA) porin having a vestibule and a constriction zone defining a channel, comprising deleting, adding, or substituting any amino acid in the vestibule or constriction zone of a wild-type MspA paralog or homolog monomer such that the resulting mutant MspA porin is capable of translocating an analyte through the channel upon application of an electric field.

120.一种方法，包括在不应用电场的情况下使分析物转位通过耻垢分枝杆菌孔蛋白(Msp)孔蛋白的通道。120. A method comprising translocating an analyte through a channel of a Mycobacterium smegmatis porin (Msp) porin in the absence of an applied electric field.

121.实施方式120的方法，其中所述Msp孔蛋白还包括分子发动机。121. The method of embodiment 120, wherein the Msp porin further comprises a molecular motor.

122.实施方式120或实施方式121的方法，其中所述Msp孔蛋白选自野生型MspA孔蛋白、突变MspA孔蛋白、野生型MspA旁系同源物或同系物孔蛋白以及突变MspA旁系同源物或同系物孔蛋白。122. The method of embodiment 120 or embodiment 121, wherein the Msp porin is selected from the group consisting of a wild-type MspA porin, a mutant MspA porin, a wild-type MspA paralog or homolog porin, and a mutant MspA paralog or homolog porin.

123.实施方式120或实施方式121的方法，其中所述Msp孔蛋白由编码单链Msp孔蛋白的核酸序列编码，其中所述核酸序列包含:123. The method of embodiment 120 or embodiment 121, wherein the Msp porin is encoded by a nucleic acid sequence encoding a single-chain Msp porin, wherein the nucleic acid sequence comprises:

124.实施方式120或实施方式121的方法，其中所述Msp孔蛋白由编码单链Msp孔蛋白的核酸序列编码，其中所述核酸序列包含：124. The method of embodiment 120 or embodiment 121, wherein the Msp porin is encoded by a nucleic acid sequence encoding a single-chain Msp porin, wherein the nucleic acid sequence comprises:

125.实施方式120至124之一中的方法，其中所述分析物包含光学珠粒或磁性珠粒。125. The method of any one of embodiments 120 to 124, wherein the analyte comprises optical beads or magnetic beads.

126.编码单链Msp孔蛋白的核酸序列，其中所述核酸序列包含：126. A nucleic acid sequence encoding a single-chain Msp porin, wherein the nucleic acid sequence comprises:

127.实施方式126的核酸序列，其中所述第一和第二Msp单体序列独立地选自野生型MspA单体、突变MspA单体、野生型MspA旁系同源物或同系物单体以及突变MspA旁系同源物或同系物单体。127. The nucleic acid sequence of embodiment 126, wherein the first and second Msp monomer sequences are independently selected from a wild-type MspA monomer, a mutant MspA monomer, a wild-type MspA paralog or homolog monomer, and a mutant MspA paralog or homolog monomer.

128.实施方式126或实施方式127的核酸序列，其中所述第一Msp单体序列包含野生型MspA单体或其突变体。128. The nucleic acid sequence of embodiment 126 or embodiment 127, wherein the first Msp monomer sequence comprises a wild-type MspA monomer or a mutant thereof.

129.实施方式128的核酸序列，其中所述Msp单体序列包含突变MspA单体。129. The nucleic acid sequence of embodiment 128, wherein the Msp monomer sequence comprises a mutant MspA monomer.

130.实施方式128的核酸序列，其中所述第一Msp单体序列包含一个或多个选自下述的突变：氨基酸138上的A至P置换、氨基酸139上的E至A或K置换、氨基酸90上的D至K或R或Q置换、氨基酸91上的D至N或Q置换、氨基酸93上的D至N置换、氨基酸88上的L至W置换、氨基酸105上的I至W置换、氨基酸108上的N至W置换、氨基酸118上的D至R置换和氨基酸134上的D至R置换。130. The nucleic acid sequence of embodiment 128, wherein the first Msp monomer sequence comprises one or more mutations selected from the following: an A to P substitution at amino acid 138, an E to A or K substitution at amino acid 139, a D to K, R, or Q substitution at amino acid 90, a D to N or Q substitution at amino acid 91, a D to N substitution at amino acid 93, an L to W substitution at amino acid 88, an I to W substitution at amino acid 105, an N to W substitution at amino acid 108, a D to R substitution at amino acid 118, and a D to R substitution at amino acid 134.

131.实施方式130的核酸序列，其中所述突变MspA单体包含氨基酸138上的A至P置换、氨基酸139上的E至A置换或其组合。131. The nucleic acid sequence of embodiment 130, wherein the mutant MspA monomer comprises an A to P substitution at amino acid 138, an E to A substitution at amino acid 139, or a combination thereof.

132.实施方式130的核酸序列，其中所述突变MspA单体包含氨基酸90上的D至K或R置换、氨基酸91上的D至N置换、氨基酸93上的D至N置换或其任何组合。132. The nucleic acid sequence of embodiment 130, wherein the mutant MspA monomer comprises a D to K or R substitution at amino acid 90, a D to N substitution at amino acid 91, a D to N substitution at amino acid 93, or any combination thereof.

133.实施方式130的核酸序列，其中所述突变MspA单体包含氨基酸90上的D至Q置换、氨基酸91上的D至Q置换、氨基酸93上的D至N置换或其任何组合。133. The nucleic acid sequence of embodiment 130, wherein the mutant MspA monomer comprises a D to Q substitution at amino acid 90, a D to Q substitution at amino acid 91, a D to N substitution at amino acid 93, or any combination thereof.

134.实施方式130的核酸序列，其中所述突变MspA单体包含氨基酸88上的L至W置换、氨基酸105上的I至W置换、氨基酸91上的D至Q置换、氨基酸93上的D至N置换或其任何组合。134. The nucleic acid sequence of embodiment 130, wherein the mutant MspA monomer comprises an L to W substitution at amino acid 88, an I to W substitution at amino acid 105, a D to Q substitution at amino acid 91, a D to N substitution at amino acid 93, or any combination thereof.

135.实施方式130的核酸序列，其中所述突变MspA单体包含氨基酸105上的I至W置换、氨基酸108上的N至W置换或其组合。135. The nucleic acid sequence of embodiment 130, wherein the mutant MspA monomer comprises an I to W substitution at amino acid 105, an N to W substitution at amino acid 108, or a combination thereof.

136.实施方式130的核酸序列，其中所述突变MspA单体包含氨基酸118上的D至R置换、氨基酸139上的E至K置换、氨基酸134上的D至R置换或其任何组合。136. The nucleic acid sequence of embodiment 130, wherein the mutant MspA monomer comprises a D to R substitution at amino acid 118, an E to K substitution at amino acid 139, a D to R substitution at amino acid 134, or any combination thereof.

137.实施方式126的核酸序列，其中所述第一MspA单体序列包含SEQ ID NO:1。137. The nucleic acid sequence of embodiment 126, wherein the first MspA monomer sequence comprises SEQ ID NO: 1.

138.实施方式126的核酸序列，其中所述第二MspA单体序列包含野生型MspA旁系同源物或其突变体，其中所述旁系同源物或其突变体是野生型MspB单体或其突变体。138. The nucleic acid sequence of embodiment 126, wherein the second MspA monomer sequence comprises a wild-type MspA paralog or a mutant thereof, wherein the paralog or mutant thereof is a wild-type MspB monomer or a mutant thereof.

139.实施方式138的核酸序列，其中所述第二MspA单体序列包含SEQ ID NO:2。139. The nucleic acid sequence of embodiment 138, wherein the second MspA monomer sequence comprises SEQ ID NO:2.

140.实施方式138的核酸序列，其中所述第二MspA单体序列包含突变MspB单体。140. The nucleic acid sequence of embodiment 138, wherein the second MspA monomer sequence comprises a mutant MspB monomer.

141.实施方式126的核酸序列，其中所述第一MspA单体序列包含野生型MspA单体或其突变体，并且所述第二Msp单体包含野生型MspB单体或其突变体。141. The nucleic acid sequence of embodiment 126, wherein the first MspA monomer sequence comprises a wild-type MspA monomer or a mutant thereof, and the second Msp monomer comprises a wild-type MspB monomer or a mutant thereof.

142.实施方式126的核酸序列，其中所述第一MspA单体序列包含SEQ ID NO:1并且所述第二Msp单体包含SEQ ID NO:2。142. The nucleic acid sequence of embodiment 126, wherein the first MspA monomer sequence comprises SEQ ID NO: 1 and the second Msp monomer comprises SEQ ID NO: 2.

143.实施方式126至142中任一项的核酸序列，其中所述氨基酸连接体序列包含10至20个氨基酸。143. The nucleic acid sequence of any one of embodiments 126 to 142, wherein the amino acid linker sequence comprises 10 to 20 amino acids.

144.实施方式143的核酸序列，其中所述氨基酸连接体序列包含15个氨基酸。144. The nucleic acid sequence of embodiment 143, wherein the amino acid linker sequence comprises 15 amino acids.

145.实施方式144的核酸序列，其中所述氨基酸连接体序列包含(GGGGS)₃(SEQ ID NO:3)肽序列。145. The nucleic acid sequence of embodiment 144, wherein the amino acid linker sequence comprises a (GGGGS)₃ (SEQ ID NO: 3) peptide sequence.

146.由实施方式126至145之一中的核酸序列编码的多肽。146. A polypeptide encoded by the nucleic acid sequence of one of embodiments 126 to 145.

147.包含实施方式126至145之一中的核酸序列的载体。147. A vector comprising the nucleic acid sequence of one of embodiments 126 to 145.

148.实施方式147的载体，其中所述载体还包含启动子序列。148. The vector of embodiment 147, wherein the vector further comprises a promoter sequence.

149.实施方式148的载体，其中所述启动子包括组成型启动子。149. The vector of embodiment 148, wherein the promoter comprises a constitutive promoter.

150.实施方式149的载体，其中所述组成型启动子包括p_smyc启动子。150. The vector of embodiment 149, wherein the constitutive promoter comprises a p_smyc promoter.

151.实施方式148的载体，其中所述启动子包括诱导型启动子。151. The vector of embodiment 148, wherein the promoter comprises an inducible promoter.

152.实施方式151的载体，其中所述诱导型启动子包括乙酰胺诱导型启动子。152. The vector of embodiment 151, wherein the inducible promoter comprises an acetamide-inducible promoter.

153.编码单链Msp孔蛋白的核酸序列，其中所述核酸序列包含：153. A nucleic acid sequence encoding a single-chain Msp porin, wherein the nucleic acid sequence comprises:

154.实施方式153的核酸序列，其中所述第一和第二Msp单体序列独立地选自野生型Msp单体、突变Msp单体、野生型MspA旁系同源物或同系物单体以及突变MspA旁系同源物或同系物单体。154. The nucleic acid sequence of embodiment 153, wherein the first and second Msp monomer sequences are independently selected from a wild-type Msp monomer, a mutant Msp monomer, a wild-type MspA paralog or homolog monomer, and a mutant MspA paralog or homolog monomer.

155.实施方式153的核酸序列，其中每一个Msp单体包含野生型MspA单体或其突变体。155. The nucleic acid sequence of embodiment 153, wherein each Msp monomer comprises a wild-type MspA monomer or a mutant thereof.

156.实施方式153的核酸序列，其中至少一个Msp单体包括野生型MspA单体或其突变体。156. The nucleic acid sequence of embodiment 153, wherein at least one Msp monomer comprises a wild-type MspA monomer or a mutant thereof.

157.实施方式155或实施方式156的核酸序列，其中至少一个Msp单体包含突变MspA单体。157. The nucleic acid sequence of embodiment 155 or embodiment 156, wherein at least one Msp monomer comprises a mutant MspA monomer.

158.实施方式157的核酸序列，其中所述突变Msp单体序列包含一个或多个选自如下的突变：氨基酸138上的A至P置换、氨基酸139上的E至A或K置换、氨基酸90上的D至K或R或Q置换、氨基酸91上的D至N或Q置换、氨基酸93上的D至N置换、氨基酸88上的L至W置换、氨基酸105上的I至W置换、氨基酸108上的N至W置换、氨基酸118上的D至R置换和氨基酸134上的D至R置换。158. The nucleic acid sequence of embodiment 157, wherein the mutant Msp monomer sequence comprises one or more mutations selected from the following: an A to P substitution at amino acid 138, an E to A or K substitution at amino acid 139, a D to K, R, or Q substitution at amino acid 90, a D to N or Q substitution at amino acid 91, a D to N substitution at amino acid 93, an L to W substitution at amino acid 88, an I to W substitution at amino acid 105, an N to W substitution at amino acid 108, a D to R substitution at amino acid 118, and a D to R substitution at amino acid 134.

159.实施方式158的核酸序列，其中所述突变MspA单体包含氨基酸138上的A至P置换、氨基酸139上的E至A置换或其组合。159. The nucleic acid sequence of embodiment 158, wherein the mutant MspA monomer comprises an A to P substitution at amino acid 138, an E to A substitution at amino acid 139, or a combination thereof.

160.实施方式158的核酸序列，其中所述突变MspA单体包含氨基酸90上的D至K或R置换、氨基酸91上的D至N置换、氨基酸93上的D至N置换或其任何组合。160. The nucleic acid sequence of embodiment 158, wherein the mutant MspA monomer comprises a D to K or R substitution at amino acid 90, a D to N substitution at amino acid 91, a D to N substitution at amino acid 93, or any combination thereof.

161.实施方式158的核酸序列，其中所述突变MspA单体包含氨基酸90上的D至Q置换、氨基酸91上的D至Q置换、氨基酸93上的D至N置换或其任何组合。161. The nucleic acid sequence of embodiment 158, wherein the mutant MspA monomer comprises a D to Q substitution at amino acid 90, a D to Q substitution at amino acid 91, a D to N substitution at amino acid 93, or any combination thereof.

162.实施方式158的核酸序列，其中所述突变MspA单体包含氨基酸88上的L至W置换、氨基酸105上的I至W置换、氨基酸91上的D至Q置换、氨基酸93上的D至N置换或其任何组合。162. The nucleic acid sequence of embodiment 158, wherein the mutant MspA monomer comprises an L to W substitution at amino acid 88, an I to W substitution at amino acid 105, a D to Q substitution at amino acid 91, a D to N substitution at amino acid 93, or any combination thereof.

163.实施方式158的核酸序列，其中所述突变MspA单体包含氨基酸105上的I至W置换、氨基酸108上的N至W置换或其组合。163. The nucleic acid sequence of embodiment 158, wherein the mutant MspA monomer comprises an I to W substitution at amino acid 105, an N to W substitution at amino acid 108, or a combination thereof.

164.实施方式158的核酸序列，其中所述突变MspA单体包含氨基酸118上的D至R置换、氨基酸139上的E至K置换、氨基酸134上的D至R置换或其任何组合。164. The nucleic acid sequence of embodiment 158, wherein the mutant MspA monomer comprises a D to R substitution at amino acid 118, an E to K substitution at amino acid 139, a D to R substitution at amino acid 134, or any combination thereof.

165.实施方式153的核酸序列，其中每一个Msp单体序列包含SEQ ID NO:1。165. The nucleic acid sequence of embodiment 153, wherein each Msp monomer sequence comprises SEQ ID NO:1.

166.实施方式153的核酸序列，其中至少一个Msp单体序列包含SEQ ID NO:1。166. The nucleic acid sequence of embodiment 153, wherein at least one Msp monomer sequence comprises SEQ ID NO:1.

167.实施方式153的核酸序列，其中至少一个Msp单体序列包含野生型MspA旁系同源物或其突变体，其中所述MspA旁系同源物或其突变体是野生型MspB单体或其突变体。167. The nucleic acid sequence of embodiment 153, wherein at least one Msp monomer sequence comprises a wild-type MspA paralog or a mutant thereof, wherein the MspA paralog or a mutant thereof is a wild-type MspB monomer or a mutant thereof.

168.实施方式166或实施方式167的核酸序列，其中至少一个Msp单体序列包含SEQ ID NO:2。168. The nucleic acid sequence of embodiment 166 or embodiment 167, wherein at least one Msp monomer sequence comprises SEQ ID NO:2.

169.实施方式166至168之一中的核酸序列，其中至少一个Msp单体序列包含突变MspB单体。169. The nucleic acid sequence of any one of embodiments 166 to 168, wherein at least one Msp monomer sequence comprises a mutant MspB monomer.

170.实施方式153的核酸序列，其中至少一个Msp单体序列包含野生型MspA单体或其突变体，以及至少一个Msp单体序列包含野生型MspB单体或其突变体。170. The nucleic acid sequence of embodiment 153, wherein at least one Msp monomer sequence comprises a wild-type MspA monomer or a mutant thereof, and at least one Msp monomer sequence comprises a wild-type MspB monomer or a mutant thereof.

171.实施方式153的核酸序列，其中至少一个Msp单体序列包含SEQ ID NO:1，并且至少一个Msp单体序列包含SEQ ID NO:2。171. The nucleic acid sequence of embodiment 153, wherein at least one Msp monomer sequence comprises SEQ ID NO:1, and at least one Msp monomer sequence comprises SEQ ID NO:2.

172.实施方式153至171之一中的核酸序列，其中所述氨基酸连接体序列包含10至20个氨基酸。172. The nucleic acid sequence of any one of embodiments 153 to 171, wherein the amino acid linker sequence comprises 10 to 20 amino acids.

173.实施方式172的核酸序列，其中所述氨基酸连接体序列包含15个氨基酸。173. The nucleic acid sequence of embodiment 172, wherein the amino acid linker sequence comprises 15 amino acids.

174.实施方式173的核酸序列，其中所述氨基酸连接体序列包含(GGGGS)₃(SEQ ID NO:3)肽序列。174. The nucleic acid sequence of embodiment 173, wherein the amino acid linker sequence comprises a (GGGGS)₃ (SEQ ID NO: 3) peptide sequence.

175.由实施方式66至115或153至174的一个中的核酸序列编码的多肽。175. A polypeptide encoded by the nucleic acid sequence of one of embodiments 66 to 115 or 153 to 174.

176.包含实施方式66至115或153至174的一个中的核酸序列的载体。176. A vector comprising the nucleic acid sequence of one of embodiments 66 to 115 or 153 to 174.

177.实施方式176的载体，其中所述载体还包含启动子序列。177. The vector of embodiment 176, wherein the vector further comprises a promoter sequence.

178.实施方式177的载体，其中所述启动子包括组成型启动子。178. The vector of embodiment 177, wherein the promoter comprises a constitutive promoter.

179.实施方式178的载体，其中所述组成型启动子包括p_smyc启动子。179. The vector of embodiment 178, wherein the constitutive promoter comprises a p_smyc promoter.

180.实施方式179的载体，其中所述启动子包括诱导型启动子。180. The vector of embodiment 179, wherein the promoter comprises an inducible promoter.

181.实施方式180的启动子，其中所述诱导型启动子包括乙酰胺诱导型启动子。181. The promoter of embodiment 180, wherein the inducible promoter comprises an acetamide-inducible promoter.

182.被实施方式176至181之一中的载体转染的培养细胞或其后代，其中所述细胞能够表达Msp孔蛋白或Msp孔蛋白单体.182. A cultured cell or progeny thereof transfected with a vector according to any one of embodiments 176 to 181, wherein the cell is capable of expressing the Msp porin or Msp porin monomer.

183.包含实施方式176至181之一的载体的耻垢分枝杆菌菌株。183. A Mycobacterium smegmatis strain comprising the vector of one of embodiments 176 to 181.

184.能够诱导型表达Msp单体的突变细菌菌株，所述细菌菌株包含：184. A mutant bacterial strain capable of inducibly expressing an Msp monomer, the bacterial strain comprising:

(a)野生型MspA的缺失；(a) a deletion of wild-type MspA;

(b)野生型MspC的缺失；(b) a deletion of wild-type MspC;

(c)野生型MspD的缺失；和(c) a deletion of wild-type MspD; and

(d)包含有效地连接Msp单体核酸序列的诱导型启动子的载体。(d) a vector comprising an inducible promoter operably linked to an Msp monomer nucleic acid sequence.

185.实施方式184的细菌菌株，其中所述细菌菌株包括耻垢分枝杆菌菌株ML16。185. The bacterial strain of embodiment 184, wherein the bacterial strain comprises Mycobacterium smegmatis strain ML16.

186.实施方式184的细菌菌株，其中所述Msp核酸编码野生型MspA单体或者野生型MspA旁系同源物或同系物单体。186. The bacterial strain of embodiment 184, wherein the Msp nucleic acid encodes a wild-type MspA monomer or a wild-type MspA paralog or homolog monomer.

187.实施方式184的细菌菌株，其中所述Msp核酸编码选自下组的Msp单体：野生型MspA单体、野生型MspC单体和野生型MspD单体。187. The bacterial strain of embodiment 184, wherein the Msp nucleic acid encodes an Msp monomer selected from the group consisting of a wild-type MspA monomer, a wild-type MspC monomer, and a wild-type MspD monomer.

188.实施方式187的细菌菌株，其中所述Msp核酸编码野生型MspA单体。188. The bacterial strain of embodiment 187, wherein the Msp nucleic acid encodes a wild-type MspA monomer.

189.实施方式184的细菌菌株，其中所述诱导型启动子包括乙酰胺诱导型启动子。189. The bacterial strain of embodiment 184, wherein the inducible promoter comprises an acetamide-inducible promoter.

190.实施方式184的细菌菌株，其还包含野生型MspB的缺失。190. The bacterial strain of embodiment 184, further comprising a deletion of wild-type MspB.

191.实施方式190的细菌菌株，其还包含载体，所述载体包含有效地连接于编码Msp孔蛋白或单体的核酸序列的组成型启动子。191. The bacterial strain of embodiment 190, further comprising a vector comprising a constitutive promoter operably linked to a nucleic acid sequence encoding an Msp porin or monomer.

192.实施方式191的细菌菌株，其中所述Msp是野生型MspA孔蛋白或单体或者野生型MspA旁系同源物或同系物孔蛋白或单体。192. The bacterial strain of embodiment 191, wherein the Msp is a wild-type MspA porin or monomer or a wild-type MspA paralog or homolog porin or monomer.

193.实施方式191的细菌菌株，其中所述Msp孔蛋白或单体选自野生型MspA孔蛋白或单体、野生型MspB孔蛋白或单体、野生型MspC孔蛋白或单体以及野生型MspD孔蛋白或单体。193. The bacterial strain of embodiment 191, wherein the Msp porin or monomer is selected from the group consisting of a wild-type MspA porin or monomer, a wild-type MspB porin or monomer, a wild-type MspC porin or monomer, and a wild-type MspD porin or monomer.

194.实施方式193的细菌菌株，其中所述Msp孔蛋白或单体是野生型MspA孔蛋白或单体。194. The bacterial strain of embodiment 193, wherein the Msp porin or monomer is a wild-type MspA porin or monomer.

195.实施方式190的细菌菌株，其还包含含有编码单链Msp孔蛋白的核酸的载体，其中所述核酸包含:195. The bacterial strain of embodiment 190, further comprising a vector comprising a nucleic acid encoding a single-chain Msp porin, wherein the nucleic acid comprises:

196.实施方式190的细菌菌株，其还包含含有编码单链Msp孔蛋白的核酸的载体，其中所述核酸包含:196. The bacterial strain of embodiment 190, further comprising a vector comprising a nucleic acid encoding a single-chain Msp porin, wherein the nucleic acid comprises:

197.产生单链Msp孔蛋白的方法，所述方法包括:197. A method for producing a single-chain Msp porin, the method comprising:

(a)用包含能够编码单链Msp孔蛋白的核酸序列的载体转化实施方式190的细菌菌株；和(a) transforming the bacterial strain of embodiment 190 with a vector comprising a nucleic acid sequence capable of encoding a single-chain Msp porin; and

(b)从细菌纯化单链Msp孔蛋白。(b) purifying the single-chain Msp porin from the bacteria.

198.实施方式197的方法，其中所述载体包含编码单链Msp孔蛋白的核酸序列，其中所述核酸序列包含：198. The method of embodiment 197, wherein the vector comprises a nucleic acid sequence encoding a single-chain Msp porin, wherein the nucleic acid sequence comprises:

199.实施方式197的方法，其中所述载体包含编码单链Msp孔蛋白的核酸序列，其中所述核酸序列包含：199. The method of embodiment 197, wherein the vector comprises a nucleic acid sequence encoding a single-chain Msp porin, wherein the nucleic acid sequence comprises:

(b)编码氨基酸连接体的第9核苷酸序列。(b) a ninth nucleotide sequence encoding an amino acid linker.

200.实施方式198或实施方式199的方法，其中所述Msp单体序列独立地选自野生型MspA单体、突变MspA单体、野生型MspA旁系同源物或同系物单体和突变MspA旁系同源物或同系物单体。200. The method of embodiment 198 or embodiment 199, wherein the Msp monomer sequences are independently selected from a wild-type MspA monomer, a mutant MspA monomer, a wild-type MspA paralog or homolog monomer, and a mutant MspA paralog or homolog monomer.

201.如实施方式198至200之一中的方法，其中所述Msp单体序列是野生型MspA单体。201. The method of any one of embodiments 198 to 200, wherein the Msp monomer sequence is a wild-type MspA monomer.
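
The substitution shorthand used in the embodiments above (for example D90N/D91N/D93N, or D90K/N/Q/R, where a bare letter lists an alternative replacement residue at the preceding position) can be expanded mechanically. The following Python sketch is illustrative only and is not part of the disclosed embodiments. The function names `parse_mutations` and `apply_mutations`, the 1-based position numbering, and the placeholder sequence `TOY_SEQUENCE` are assumptions introduced here, and the placeholder is not SEQ ID NO:1. Parenthesized forms such as D90(K,R) are deliberately not handled. The sketch also checks that a (GGGGS)₃ linker has the 15 amino acids recited in embodiments 144 and 173.

```python
import re


def parse_mutations(spec):
    """Parse 'D90N/D91N/D93N' or 'D90K/N/Q/R' into
    (wild_type, position, [replacement residues]) tuples."""
    parsed = []
    for token in spec.split("/"):
        full = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if full:
            wild_type, position, new = full.groups()
            parsed.append((wild_type, int(position), [new]))
        elif re.fullmatch(r"[A-Z]", token) and parsed:
            # A bare letter is another alternative for the preceding position.
            parsed[-1][2].append(token)
        else:
            raise ValueError(f"unrecognized mutation token: {token!r}")
    return parsed


def apply_mutations(sequence, spec):
    """Return a copy of sequence with each substitution applied
    (positions are assumed to be 1-based)."""
    residues = list(sequence)
    for wild_type, position, alternatives in parse_mutations(spec):
        if len(alternatives) != 1:
            raise ValueError(f"position {position} lists alternatives; choose one")
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position {position} is outside the sequence")
        if sequence[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = alternatives[0]
    return "".join(residues)


# Hypothetical placeholder with D at positions 90, 91 and 93 (NOT SEQ ID NO:1).
TOY_SEQUENCE = "G" * 89 + "DDGD" + "G" * 7

# Linker written out from the (GGGGS)3 repeat.
LINKER = "GGGGS" * 3

if __name__ == "__main__":
    mutant = apply_mutations(TOY_SEQUENCE, "D90N/D91N/D93N")
    print(mutant[89:93])   # residues 90 to 93 of the toy mutant
    print(len(LINKER))     # number of amino acids in the linker
```

Checking the wild-type residue before substituting is a design choice of this sketch: it catches numbering offsets, for example between a precursor and a mature sequence, before they silently produce the wrong mutant.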

**********************************

虽然已在本文中举例说明和描述了示例性实施方案，但应理解可在本文中进行各种改变而不背离本文中描述的内容的精神和范围。Although exemplary embodiments have been illustrated and described herein, it will be understood that various changes can be made therein without departing from the spirit and scope of what is described herein.

Claims

1. A Mycobacterium smegmatis porin (Msp) comprising a vestibule and a constriction zone defining a channel, wherein the Msp comprises a mutant MspA monomer containing a mutation at position 93 and a mutation at position 90, position 91, or both, wherein the mutation at position 90 is D90N or D90Q, and/or the mutation at position 91 is D91N or D91Q, and wherein, compared to the wild-type MspA monomer, the negatively charged amino acids at positions 118, 134, and 139 are replaced by positively charged amino acids.

2. The Msp of claim 1, wherein the mutant MspA monomer comprises mutations at positions 90, 91, and 93.

3. The Msp of claim 1, wherein the mutant MspA monomer comprises mutations at positions 90, 91, and 93, wherein the mutation at position 90 is D90N, the mutation at position 91 is D91N, and the mutation at position 93 is D93N.

4. The Msp of claim 3, wherein the mutation at position 118 is D118R.

5. The Msp of claim 4, wherein the mutation at position 139 is E139K or E139R.

6. The Msp of claim 5, wherein the mutation at position 118 is D118R, the mutation at position 134 is D134R, and the mutation at position 139 is E139K.

7. The Msp of claim 3, wherein the mutant MspA comprises a mutation at position 126, wherein the mutation at position 126 is Q126R.

8. The Msp of claim 1, wherein the mutant MspA comprises a mutation at position 88.

9. The Msp of claim 1, wherein the mutant MspA comprises mutations at positions 90, 91, and 93, wherein the mutation at position 90 is D90Q, the mutation at position 91 is D91Q, and the mutation at position 93 is D93N.

10. The Msp of claim 9, wherein the mutation at position 118 is D118R, the mutation at position 134 is D134R, and the mutation at position 139 is E139K.

11. A Mycobacterium smegmatis porin (Msp) comprising a vestibule and a constriction zone that define a channel, wherein the Msp comprises a mutant MspA comprising mutations at positions 90, 91, and 93, wherein the mutation at position 90 is D90N, the mutation at position 91 is D91N, and the mutation at position 93 is D93N.

12. A Mycobacterium smegmatis porin (Msp) comprising a vestibule and a constriction zone that define a channel, wherein the Msp comprises a mutant MspA comprising mutations at positions 90, 91, and 93, wherein the mutation at position 90 is D90Q, the mutation at position 91 is D91Q, and the mutation at position 93 is D93N, and wherein positively charged amino acids replace the negatively charged amino acids at positions 118, 134, and 139 of the wild-type MspA.

13. The Msp of claim 12, further comprising mutations at positions 118, 134, and 139, wherein the mutation at position 118 is D118R, the mutation at position 134 is D134R, and the mutation at position 139 is E139K, and wherein positively charged amino acids replace the negatively charged amino acids at positions 118, 134, and 139 of the wild-type MspA.

14. The Msp of any one of claims 1-13, for use in detecting the presence of an analyte.

15. The Msp of claim 14, wherein the analyte is a nucleic acid.

16. The Msp of claim 15, wherein the nucleic acid is DNA.

17. The Msp of claim 16, wherein the DNA is single-stranded DNA.

18. The Msp of claim 17, wherein the single-stranded DNA comprises a hairpin.

19. The Msp of any one of claims 1-13, for use in analyzing a nucleic acid.

20. The Msp of claim 19, wherein the nucleic acid is DNA.

21. The Msp of claim 20, wherein the DNA is single-stranded DNA.

22. The Msp of claim 21, wherein the single-stranded DNA comprises a hairpin.

23. A system comprising the Msp of any one of claims 1-13, wherein the channel of the Msp is located between a first conductive liquid medium and a second conductive liquid medium, wherein at least one of the conductive liquid media contains an analyte, and wherein the system is effective for detecting the analyte.

24. The system of claim 23, comprising means for applying an electric field sufficient to translocate the analyte from the first conductive liquid medium to the second conductive liquid medium, the two media being in liquid communication through the Msp.

25. The system of claim 24, comprising a device for measuring the current across the Msp.

26. The system of claim 23, comprising means for grounding the first conductive liquid medium and applying a positive voltage to the second conductive liquid medium.

27. The system of claim 23, wherein the analyte is a polymer.

28. The system of claim 27, wherein the polymer is a nucleic acid.

29. The system of claim 28, wherein the nucleic acid is DNA.

30. The system of claim 29, wherein the DNA is single-stranded DNA.

31. The system of claim 28, wherein the nucleic acid is RNA.

32. The system of claim 28, wherein the system is used for nucleic acid sequencing.

33. The system of claim 28, wherein the system further comprises a molecular motor, wherein the molecular motor is capable of moving the analyte into or through the channel at an average translocation velocity that is less than the average translocation velocity of the analyte into or through the channel in the absence of the molecular motor.

34. The system of claim 33, wherein the molecular motor is a helicase.

35. The system of claim 33, wherein the molecular motor is a polymerase.

36. A nucleic acid encoding the Msp of any one of claims 1-13.

37. A vector comprising a promoter operably linked to the nucleic acid of claim 36.

38. A mutant Mycobacterium smegmatis bacterium capable of expressing Msp, said bacterium comprising:

(a) a deletion of wild-type MspA;

(b) a deletion of wild-type MspC;

(c) a deletion of wild-type MspD; and

(d) the vector of claim 37.

39. The mutant Mycobacterium smegmatis bacterium of claim 38, wherein the bacterium is of Mycobacterium smegmatis strain ML16.

40. A Mycobacterium smegmatis porin obtainable from the mutant Mycobacterium smegmatis bacterium of claim 38 or 39.
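
The point-mutation labels recited in the claims (for example D90N or D118R) follow the usual convention of wild-type residue, position, mutant residue. As an illustration only, the following minimal Python sketch parses such labels and tallies the nominal change in side-chain charge per MspA monomer. The helper names (`parse_mutation`, `charge_change`, `net_charge_change`) are hypothetical and not part of this document, and the charge model is a simplifying assumption: D and E count as -1, R and K as +1, and every other residue, including histidine, as neutral.

```python
import re
from typing import Iterable, NamedTuple

# Simplifying assumption: nominal side-chain charges near neutral pH.
NEGATIVE = {"D", "E"}   # aspartate, glutamate
POSITIVE = {"R", "K"}   # arginine, lysine (histidine is treated as neutral here)


class Mutation(NamedTuple):
    wild_type: str   # one-letter code of the wild-type residue
    position: int    # residue position in the monomer
    mutant: str      # one-letter code of the replacement residue


def parse_mutation(label: str) -> Mutation:
    """Parse a label such as 'D90N' into its three parts."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label.strip())
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return Mutation(wild_type, int(position), mutant)


def side_chain_charge(residue: str) -> int:
    """Return the nominal charge (-1, 0, or +1) under the assumption above."""
    if residue in NEGATIVE:
        return -1
    if residue in POSITIVE:
        return 1
    return 0


def charge_change(mutation: Mutation) -> int:
    """Charge of the mutant residue minus charge of the wild-type residue."""
    return side_chain_charge(mutation.mutant) - side_chain_charge(mutation.wild_type)


def net_charge_change(labels: Iterable[str]) -> int:
    """Sum the nominal charge change over a set of mutation labels."""
    return sum(charge_change(parse_mutation(label)) for label in labels)


# Combination recited in claim 6 (which depends on claims 5, 4, 3, and 1).
CLAIM_6_LABELS = ["D90N", "D91N", "D93N", "D118R", "D134R", "E139K"]

if __name__ == "__main__":
    for label in CLAIM_6_LABELS:
        print(label, charge_change(parse_mutation(label)))
    print("net change per monomer:", net_charge_change(CLAIM_6_LABELS))
```

Under these assumptions, the three D-to-N substitutions each remove one negative charge, and the three substitutions at positions 118, 134, and 139 each turn a negative charge into a positive one, so the sketch reports a nominal net change of +9 per monomer for the claim 6 combination.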