CN115803815A - Tile location and/or cycle based weight set selection for base calling
- Publication number
- CN115803815A (CN202280005111.4A)
- Authority
- CN
- China
- Prior art keywords
- weights
- sensing
- series
- sensor data
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06N3/08 - Learning methods
- G06N3/04 - Architecture, e.g. interconnection topology
- G06N3/045 - Combinations of networks
- G06N3/0464 - Convolutional networks [CNN, ConvNet]
- G06N3/048 - Activation functions
- G06N3/0495 - Quantised networks; Sparse networks; Compressed networks
- G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G06N3/084 - Backpropagation, e.g. using gradient descent
- G06N3/086 - Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
- G16B30/00 - ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20 - Supervised data analysis
Abstract
A system for base calling is disclosed. The system includes a memory that stores a topology of a neural network, a plurality of weight sets, and sensor data for a series of sensing cycles. Sequencing events span the temporal progression of the base calling operation through subseries of sensing cycles and the spatial progression of the base calling operation through locations on a biosensor. A configurable processor is configured to: load the topology onto the configurable processor; select a weight set according to a subject subseries of sensing cycles and/or a subject location on the biosensor; load, onto processing elements, the subject sensor data for the subject subseries of sensing cycles and the subject location; configure the topology with the selected weight set; and cause the neural network to process the subject sensor data to produce base call classification data for the subject subseries and the subject location.
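By way of illustration only, the selection step summarized in the abstract might be sketched as a lookup keyed by a tile category and a subseries of sensing cycles. The subseries boundaries, the tile category names, and the function names below are hypothetical and are not taken from the disclosure; the weight sets are represented by placeholder strings.

```python
from bisect import bisect_right
from typing import Dict, Sequence, Tuple, TypeVar

W = TypeVar("W")  # stands in for whatever object holds one weight set

# Hypothetical first cycle of each subseries: cycles 1-50, 51-100, 101 onward.
SUBSERIES_STARTS: Sequence[int] = (1, 51, 101)


def subseries_index(cycle: int, starts: Sequence[int] = SUBSERIES_STARTS) -> int:
    """Return the index of the subseries of sensing cycles that contains `cycle`."""
    idx = bisect_right(starts, cycle) - 1
    if idx < 0:
        raise ValueError(f"cycle {cycle} precedes the first subseries")
    return idx


def select_weight_set(
    weight_sets: Dict[Tuple[str, int], W], tile_class: str, cycle: int
) -> W:
    """Pick the weight set for a (hypothetical) tile category and sensing cycle."""
    return weight_sets[(tile_class, subseries_index(cycle))]


if __name__ == "__main__":
    # Placeholder weight sets; a real system would store trained parameters.
    demo_sets = {
        (tile, s): f"weights[{tile}][{s}]"
        for tile in ("edge", "center")
        for s in range(len(SUBSERIES_STARTS))
    }
    print(select_weight_set(demo_sets, "edge", 75))
```

In this sketch the neural network topology is assumed to stay fixed, and only the looked-up weight set changes as the base calling operation moves across tiles and cycles.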
Description
Priority Application
This application claims the benefit of U.S. Provisional Patent Application No. 63/161,880, entitled "Tile Location and/or Cycle Based Weight Set Selection for Base Calling," filed March 16, 2021 (Attorney Docket No. ILLM 1019-1/IP-1861-PRV); U.S. Provisional Patent Application No. 63/161,896, entitled "Neural Network Parameter Quantization for Base Calling," filed March 16, 2021 (Attorney Docket No. ILLM 1019-2/IP-2049-PRV); U.S. Nonprovisional Patent Application No. 17/687,551, entitled "Tile Location and/or Cycle Based Weight Set Selection for Base Calling," filed March 4, 2022 (Attorney Docket No. ILLM 1019-3/IP-1861-US); and U.S. Nonprovisional Patent Application No. 17/687,583, entitled "Neural Network Parameter Quantization for Base Calling," filed March 4, 2022 (Attorney Docket No. ILLM 1019-4/IP-2049-US). The priority applications are hereby incorporated by reference herein for all purposes.
Technical Field
The technology disclosed relates to artificial intelligence type computers and digital data processing systems and corresponding data processing methods and products for emulation of intelligence (i.e., knowledge-based systems, reasoning systems, and knowledge acquisition systems), including systems for reasoning with uncertainty (e.g., fuzzy logic systems), adaptive systems, machine learning systems, and artificial neural networks. In particular, the technology disclosed relates to using deep neural networks, such as deep convolutional neural networks, to analyze data, and to the selective use of weight sets.
Incorporations
The following documents are incorporated by reference as if fully set forth herein:
U.S. Provisional Patent Application No. 62/979,384, entitled "ARTIFICIAL INTELLIGENCE-BASED BASE CALLING OF INDEX SEQUENCES," filed February 20, 2020 (Attorney Docket No. ILLM 1015-1/IP-1857-PRV);
U.S. Provisional Patent Application No. 62/979,414, entitled "ARTIFICIAL INTELLIGENCE-BASED MANY-TO-MANY BASE CALLING," filed February 20, 2020 (Attorney Docket No. ILLM 1016-1/IP-1858-PRV);
U.S. Provisional Patent Application No. 62/979,385, entitled "KNOWLEDGE DISTILLATION-BASED COMPRESSION OF ARTIFICIAL INTELLIGENCE-BASED BASE CALLER," filed February 20, 2020 (Attorney Docket No. ILLM 1017-1/IP-1859-PRV);
U.S. Provisional Patent Application No. 63/072,032, entitled "DETECTING AND FILTERING CLUSTERS BASED ON ARTIFICIAL INTELLIGENCE-PREDICTED BASE CALLS," filed August 28, 2020 (Attorney Docket No. ILLM 1018-1/IP-1860-PRV);
U.S. Provisional Patent Application No. 62/979,411, entitled "DATA COMPRESSION FOR ARTIFICIAL INTELLIGENCE-BASED BASE CALLING," filed February 20, 2020 (Attorney Docket No. ILLM 1029-1/IP-1964-PRV);
U.S. Provisional Patent Application No. 62/979,399, entitled "SQUEEZING LAYER FOR ARTIFICIAL INTELLIGENCE-BASED BASE CALLING," filed February 20, 2020 (Attorney Docket No. ILLM 1030-1/IP-1982-PRV);
U.S. Nonprovisional Patent Application No. 16/825,987, entitled "TRAINING DATA GENERATION FOR ARTIFICIAL INTELLIGENCE-BASED SEQUENCING," filed March 20, 2020 (Attorney Docket No. ILLM 1008-16/IP-1693-US);
U.S. Nonprovisional Patent Application No. 16/825,991, entitled "ARTIFICIAL INTELLIGENCE-BASED GENERATION OF SEQUENCING METADATA," filed March 20, 2020 (Attorney Docket No. ILLM 1008-17/IP-1741-US);
U.S. Nonprovisional Patent Application No. 16/826,126, entitled "ARTIFICIAL INTELLIGENCE-BASED BASE CALLING," filed March 20, 2020 (Attorney Docket No. ILLM 1008-18/IP-1744-US);
U.S. Nonprovisional Patent Application No. 16/826,134, entitled "ARTIFICIAL INTELLIGENCE-BASED QUALITY SCORING," filed March 20, 2020 (Attorney Docket No. ILLM 1008-19/IP-1747-US);
U.S. Nonprovisional Patent Application No. 16/826,168, entitled "ARTIFICIAL INTELLIGENCE-BASED SEQUENCING," filed March 21, 2020 (Attorney Docket No. ILLM 1008-20/IP-1752-US);
U.S. Nonprovisional Patent Application No. 16/874,599, entitled "Systems and Devices for Characterization and Performance Analysis of Pixel-Based Sequencing," filed May 14, 2020 (Attorney Docket No. ILLM 1011-4/IP-1750-US); and
U.S. Nonprovisional Patent Application No. 17/176,147, entitled "HARDWARE EXECUTION AND ACCELERATION OF ARTIFICIAL INTELLIGENCE-BASED BASE CALLER," filed February 15, 2021 (Attorney Docket No. ILLM 1020-2/IP-1866-US).
Background
The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves can also correspond to implementations of the claimed technology.
In recent years, rapid growth in computing power has allowed deep convolutional neural networks (CNNs) to achieve great success on many computer vision tasks, with significantly improved accuracy. During the inference phase, many applications demand low-latency processing of a single image under strict power consumption requirements. This reduces the efficiency of graphics processing units (GPUs) and other general-purpose platforms, and creates an opportunity for dedicated acceleration hardware, such as field programmable gate arrays (FPGAs), by customizing digital circuits specific to deep learning inference. However, deploying CNNs on portable and embedded systems remains challenging because of large data volumes, intensive computation, varying algorithm structures, and frequent memory accesses.
Because convolution contributes most of the operations in a CNN, the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution involves multiply-and-accumulate (MAC) operations with four levels of loops that slide along the kernels and feature maps. The first loop level computes the MACs of the pixels within a kernel window. The second loop level accumulates the sums of products of the MACs across different input feature maps. After the first and second loop levels finish, a final output pixel is obtained by adding the bias. The third loop level slides the kernel window within an input feature map. The fourth loop level generates different output feature maps.
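The four loop levels described above can be sketched, purely for illustration, as a direct convolution in plain Python. The function name, the nested-list data layout, the unit stride, and the absence of padding are assumptions made for this sketch; a hardware accelerator would reorder, tile, and unroll these loops rather than execute them in this form.

```python
from typing import List

Map2D = List[List[float]]


def conv2d_four_loops(
    inputs: List[Map2D],         # [C_in][H][W] input feature maps
    kernels: List[List[Map2D]],  # [C_out][C_in][K][K] kernel weights
    biases: List[float],         # [C_out] one bias per output feature map
) -> List[Map2D]:
    """Direct convolution with stride 1 and no padding (assumed for this sketch)."""
    c_in, h, w = len(inputs), len(inputs[0]), len(inputs[0][0])
    c_out, k = len(kernels), len(kernels[0][0])
    h_out, w_out = h - k + 1, w - k + 1
    outputs = [[[0.0] * w_out for _ in range(h_out)] for _ in range(c_out)]

    for o in range(c_out):                  # loop level 4: each output feature map
        for y in range(h_out):              # loop level 3: slide the kernel window
            for x in range(w_out):          #   over the input feature map
                acc = 0.0
                for c in range(c_in):       # loop level 2: accumulate across input maps
                    for ky in range(k):     # loop level 1: MACs inside the kernel window
                        for kx in range(k):
                            acc += inputs[c][y + ky][x + kx] * kernels[o][c][ky][kx]
                outputs[o][y][x] = acc + biases[o]  # add the bias after levels 1 and 2
    return outputs
```

Counting iterations of the innermost statement gives the MAC count for the layer, C_out × H_out × W_out × C_in × K × K, which is why the choice of loop ordering and parallelization dominates accelerator efficiency.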
FPGAs have gained increasing interest and popularity, particularly for accelerating inference tasks, because of their (1) high degree of reconfigurability, (2) faster development time compared to application-specific integrated circuits (ASICs), which helps them keep up with the rapid evolution of CNNs, (3) good performance, and (4) superior energy efficiency compared to GPUs. The high performance and efficiency of an FPGA can be realized by synthesizing a circuit customized for a specific computation, so that billions of operations are processed directly with a customized memory system. For instance, hundreds to thousands of digital signal processing (DSP) blocks on modern FPGAs support the core convolution operations, e.g., multiplication and addition, with high parallelism. Dedicated data buffers between external off-chip memory and the on-chip processing engines (PEs) can be designed to realize the preferred dataflow by configuring tens of megabytes of on-chip block random access memory (BRAM) on the FPGA chip.
Efficient dataflow and hardware architectures for CNN acceleration are needed to minimize data communication while maximizing resource utilization to achieve high performance. An opportunity therefore arises to design methods and frameworks that accelerate the inference process of various CNN algorithms on acceleration hardware with high performance, efficiency, and flexibility.
Brief Description of the Drawings
In the drawings, like reference characters generally refer to like parts throughout the different views. Also, the drawings are not necessarily to scale, with emphasis instead placed on illustrating the principles of the technology disclosed. In the following description, various implementations of the technology disclosed are described with reference to the following drawings, in which:
FIG. 1 shows a cross-section of a biosensor that can be used in various embodiments.
FIG. 2 shows one implementation of a flow cell that contains clusters in its tiles.
FIG. 3 shows an example flow cell with eight lanes, and also shows an enlarged view of one tile, its clusters, and their surrounding background.
FIG. 4 is a simplified block diagram of a system for analyzing sensor data from a sequencing system, such as base call sensor outputs.
FIG. 5 is a simplified diagram showing aspects of the base calling operation, including functions of a runtime program executed by a host processor.
FIG. 6 is a simplified diagram of a configuration of a configurable processor, such as that of FIG. 4.
FIG. 7 is a diagram of a neural network architecture that can be executed using a configurable or reconfigurable array configured as described herein.
FIG. 8A is a simplified illustration of the organization of tiles of sensor data used by a neural network architecture like that of FIG. 7.
FIG. 8B is a simplified illustration of patches of tiles of sensor data used by a neural network architecture like that of FIG. 7.
FIG. 9 shows part of a configuration of a neural network like that of FIG. 7 on a configurable or reconfigurable array, such as a field programmable gate array (FPGA).
FIG. 10 is a diagram of another alternative neural network architecture that can be executed using a configurable or reconfigurable array configured as described herein.
FIG. 11 shows one implementation of a specialized architecture of the neural network-based base caller that is used to segregate processing of data for different sequencing cycles.
FIG. 12 shows one implementation of segregated layers, each of which can include convolutions.
FIG. 13A shows one implementation of combinatory layers, each of which can include convolutions.
FIG. 13B shows another implementation of combinatory layers, each of which can include convolutions.
FIGS. 14, 15, and 16 show various example tile location based weight selection schemes for base calling.
FIG. 17A shows an example of fading, in which signal intensity decreases as a function of the cycle number of a sequencing run of a base calling operation.
FIG. 17B conceptually shows a signal-to-noise ratio that decreases as the sequencing cycles progress.
FIG. 18 shows an example weight selection scheme for base calling that is based on the base calling cycle number.
FIGS. 19, 20, 21A, and 21B show various example weight selection schemes based on (i) the temporal progression of base calling cycle numbers and (ii) the spatial locations of tiles.
FIG. 22 shows one implementation of a base calling operation in which a weight set for base calling is selected based on spatial tile information and temporal sensing cycle subseries information.
FIG. 23A shows various weight sets for various categories of tiles and for various sensing cycles, each weight set including corresponding spatial weights and corresponding temporal weights.
FIG. 23B shows various weight sets for various categories of tiles and for various cycles, where the different weight sets for a particular category of tiles include common spatial weights and different temporal weights.
FIG. 23C shows a system that selects a weight set based on one or more sequencing run parameters.
FIG. 24 is a block diagram of a base calling system in accordance with one implementation.
FIG. 25 is a block diagram of a system controller that can be used in the system of FIG. 24.
FIG. 26 is a simplified block diagram of a computer system that can be used to implement the technology disclosed.
Detailed Description
Embodiments described herein may be used in various biological or chemical processes and systems for academic or commercial analysis. More specifically, embodiments described herein may be used in various processes and systems where it is desired to detect an event, property, quality, or characteristic that is indicative of a desired reaction. For example, embodiments described herein include cartridges, biosensors, and their components, as well as bioassay systems that operate with cartridges and biosensors. In particular embodiments, the cartridges and biosensors include a flow cell and one or more sensors, pixels, light detectors, or photodiodes that are coupled together in a substantially unitary structure.
The following detailed description of certain embodiments will be better understood when read in conjunction with the appended drawings. To the extent that the figures illustrate diagrams of the functional blocks of various embodiments, the functional blocks are not necessarily indicative of the division between hardware circuitry. Thus, for example, one or more of the functional blocks (e.g., processors or memories) may be implemented in a single piece of hardware (e.g., a general-purpose signal processor or random access memory, hard disk, or the like). Similarly, the programs may be stand-alone programs, may be incorporated as subroutines in an operating system, may be functions in an installed software package, and the like. It should be understood that the various embodiments are not limited to the arrangements and instrumentality shown in the drawings.
As used herein, an element or step recited in the singular and preceded by the word "a" or "an" should be understood as not excluding plural of said elements or steps, unless such exclusion is explicitly stated. Furthermore, references to "one embodiment" are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. Moreover, unless explicitly stated to the contrary, embodiments "comprising," "having," or "including" one or more elements having a particular property may include additional elements, whether or not they have that property.
As used herein, a "desired reaction" includes a change in at least one of a chemical, electrical, physical, or optical property (or quality) of an analyte of interest. In particular embodiments, the desired reaction is a positive binding event (e.g., incorporation of a fluorescently labeled biomolecule with the analyte of interest). More generally, the desired reaction may be a chemical transformation, chemical change, or chemical interaction. The desired reaction may also be a change in electrical properties. For example, the desired reaction may be a change in ion concentration within a solution. Exemplary reactions include, but are not limited to, chemical reactions such as reduction, oxidation, addition, elimination, rearrangement, esterification, amidation, etherification, cyclization, or substitution; binding interactions in which a first chemical binds to a second chemical; dissociation reactions in which two or more chemicals detach from each other; fluorescence; luminescence; bioluminescence; chemiluminescence; and biological reactions, such as nucleic acid replication, nucleic acid amplification, nucleic acid hybridization, nucleic acid ligation, phosphorylation, enzymatic catalysis, receptor binding, or ligand binding. The desired reaction can also be the addition or elimination of a proton, for example, detectable as a change in the pH of a surrounding solution or environment. An additional desired reaction can be detecting the flow of ions across a membrane (e.g., a natural or synthetic bilayer membrane); for example, as ions flow through the membrane, the current is disrupted and the disruption can be detected.
In particular embodiments, the desired reaction includes the incorporation of a fluorescently labeled molecule with an analyte. The analyte may be an oligonucleotide, and the fluorescently labeled molecule may be a nucleotide. The desired reaction may be detected when excitation light is directed toward the oligonucleotide having the labeled nucleotide and the fluorophore emits a detectable fluorescent signal. In alternative embodiments, the detected fluorescence is a result of chemiluminescence or bioluminescence. A desired reaction may also increase fluorescence (or Förster) resonance energy transfer (FRET), for example, by bringing a donor fluorophore into proximity with an acceptor fluorophore; decrease FRET by separating donor and acceptor fluorophores; increase fluorescence by separating a quencher from a fluorophore; or decrease fluorescence by co-locating a quencher and a fluorophore.
如本文所用,“反应组分”或“反应物”包括可用于获得期望反应的任何物质。例如,反应组分包括试剂、酶、样品、其他生物分子和缓冲液。通常将反应组分递送至溶液中的反应位点和/或固定在反应位点处。反应组分可直接或间接地与另一种物质相互作用,诸如感兴趣的分析物。As used herein, "reaction component" or "reactant" includes any substance that can be used to obtain a desired reaction. For example, reaction components include reagents, enzymes, samples, other biomolecules, and buffers. Typically the reaction components are delivered to and/or immobilized at the reaction site in solution. A reaction component may directly or indirectly interact with another species, such as an analyte of interest.
如本文所用,术语“反应位点”是可发生期望反应的局部区域。反应位点可包括其上可固定物质的基板的支撑表面。例如,反应位点可包括在其上具有核酸群体的流通池的通道中的基本上平坦的表面。通常但并不总是,群体中的核酸具有相同的序列,例如为单链或双链模板的克隆拷贝。然而,在一些实施方案中,反应位点可包含仅单个核酸分子,例如单链或双链形式。此外,多个反应位点可沿着支撑表面不均匀地分布或以预先确定的方式布置(例如,在矩阵中并排布置,诸如在微阵列中)。反应位点还可包括反应室(或孔),其至少部分地限定被配置为分隔期望反应的空间区域或体积。As used herein, the term "reaction site" is a localized area where a desired reaction can occur. A reaction site may comprise a support surface of a substrate on which a substance may be immobilized. For example, a reaction site may comprise a substantially planar surface in a channel of a flow cell having a population of nucleic acids thereon. Usually, but not always, the nucleic acids in a population have the same sequence, eg, are clonal copies of a single- or double-stranded template. However, in some embodiments, a reactive site may comprise only a single nucleic acid molecule, eg, in single- or double-stranded form. In addition, the plurality of reaction sites may be unevenly distributed along the support surface or arranged in a predetermined pattern (eg, arranged side-by-side in a matrix, such as in a microarray). A reaction site may also include a reaction chamber (or well) that at least partially defines a spatial region or volume configured to separate desired reactions.
本申请可互换地使用术语“反应室”和“孔”。如本文所用,术语“反应室”或“孔”包括与流动通道流体连通的空间区域。反应室可至少部分地与周围环境或其他空间区域隔开。例如,多个反应室可通过共用壁彼此隔开。作为更具体的示例,反应室可包括由孔的内部表面限定的腔并且具有开口或孔口,使得腔可与流动通道流体连通。包括此类反应室的生物传感器在2011年10月20日提交的国际申请号PCT/US2011/057111中更详细地描述,该国际申请全文以引用方式并入本文。This application uses the terms "reaction chamber" and "well" interchangeably. As used herein, the term "reaction chamber" or "well" includes a region of space that is in fluid communication with a flow channel. The reaction chamber can be at least partially isolated from the surrounding environment or other spatial regions. For example, multiple reaction chambers may be separated from each other by a common wall. As a more specific example, the reaction chamber can include a cavity defined by the interior surface of the bore and have an opening or orifice such that the cavity can be in fluid communication with the flow channel. Biosensors comprising such reaction chambers are described in more detail in International Application No. PCT/US2011/057111, filed October 20, 2011, which is incorporated herein by reference in its entirety.
In some embodiments, the reaction chambers are sized and shaped relative to solids (including semi-solids) so that the solids may be inserted, fully or partially, therein. For example, the reaction chamber may be sized and shaped to accommodate only one capture bead. The capture bead may have clonally amplified DNA or other substances thereon. Alternatively, the reaction chamber may be sized and shaped to receive an approximate number of beads or solid substrates. As another example, the reaction chambers may also be filled with a porous gel or substance that is configured to control diffusion or filter fluids that may flow into the reaction chamber.
In some embodiments, sensors (e.g., light detectors, photodiodes) are associated with corresponding pixel areas of a sample surface of a biosensor. A pixel area is thus a geometrical construct that represents the area on the biosensor's sample surface for one sensor (or pixel). A sensor associated with a pixel area detects light emissions gathered from that pixel area when a desired reaction has occurred at a reaction site or reaction chamber overlying the pixel area. In flat-surface embodiments, the pixel areas can overlap. In some cases, a plurality of sensors may be associated with a single reaction site or a single reaction chamber. In other cases, a single sensor may be associated with a group of reaction sites or a group of reaction chambers.
As used herein, a "biosensor" includes a structure having a plurality of reaction sites and/or reaction chambers (or wells). A biosensor may include a solid-state imaging device (e.g., a CCD or CMOS imager) and, optionally, a flow cell mounted thereto. The flow cell may include at least one flow channel that is in fluid communication with the reaction sites and/or the reaction chambers. As one specific example, the biosensor is configured to be fluidically and electrically coupled to a bioassay system. The bioassay system may deliver reactants to the reaction sites and/or the reaction chambers according to a predetermined protocol (e.g., sequencing-by-synthesis) and perform a plurality of imaging events. For example, the bioassay system may direct solutions to flow along the reaction sites and/or the reaction chambers. At least one of the solutions may include four types of nucleotides having the same or different fluorescent labels. The nucleotides may bind to corresponding oligonucleotides located at the reaction sites and/or the reaction chambers. The bioassay system may then illuminate the reaction sites and/or the reaction chambers using an excitation light source (e.g., a solid-state light source, such as a light-emitting diode (LED)). The excitation light may have one or more predetermined wavelengths, including a range of wavelengths. The excited fluorescent labels provide emission signals that may be captured by the sensors.
In alternative embodiments, the biosensor may include electrodes or other types of sensors configured to detect other identifiable properties. For example, the sensors may be configured to detect a change in ion concentration. In another example, the sensors may be configured to detect the flow of ion current across a membrane.
As used herein, a "cluster" is a population of similar or identical molecules, nucleotide sequences, or DNA strands. For example, a cluster can be an amplified oligonucleotide or any other group of polynucleotides or polypeptides with the same or similar sequence. In other embodiments, a cluster can be any element or group of elements that occupies a physical area on a sample surface. In embodiments, clusters are immobilized to a reaction site and/or a reaction chamber during a base calling cycle.
As used herein, the term "immobilized," when used with respect to a biomolecule or a biological or chemical substance, includes substantially attaching the biomolecule or biological or chemical substance to a surface at the molecular level. For example, a biomolecule or biological or chemical substance may be immobilized to a surface of the substrate material using adsorption techniques, including non-covalent interactions (e.g., electrostatic forces, van der Waals forces, and dehydration of hydrophobic interfaces), and covalent binding techniques in which functional groups or linkers facilitate attaching the biomolecules to the surface. Immobilizing biomolecules or biological or chemical substances to a surface of a substrate material may depend on the properties of the substrate surface, the liquid medium carrying the biomolecule or biological or chemical substance, and the properties of the biomolecules or biological or chemical substances themselves. In some cases, a substrate surface may be functionalized (e.g., chemically or physically modified) to facilitate immobilizing the biomolecules (or biological or chemical substances) to the substrate surface. The substrate surface may first be modified to have functional groups bound to the surface. The functional groups may then bind to biomolecules or biological or chemical substances to immobilize them thereon. A substance can be immobilized to a surface via a gel, for example, as described in US Patent Publication No. US 2011/0059865 A1, which is incorporated herein by reference.
In some embodiments, nucleic acids can be attached to a surface and amplified using bridge amplification. Useful bridge amplification methods are described, for example, in U.S. Patent No. 5,641,658; WO 2007/010251; U.S. Patent No. 6,090,592; U.S. Patent Publication No. 2002/0055100 A1; U.S. Patent No. 7,115,400; U.S. Patent Publication No. 2004/0096853 A1; U.S. Patent Publication No. 2004/0002090 A1; U.S. Patent Publication No. 2007/0128624 A1; and U.S. Patent Publication No. 2008/0009420 A1, each of which is incorporated herein in its entirety. Another useful method for amplifying nucleic acids on a surface is rolling circle amplification (RCA), for example, using methods set forth in further detail below. In some embodiments, the nucleic acids can be attached to a surface and amplified using one or more primer pairs. For example, one of the primers can be in solution and the other primer can be immobilized on the surface (e.g., 5'-attached). By way of example, a nucleic acid molecule can hybridize to one of the primers on the surface, followed by extension of the immobilized primer to produce a first copy of the nucleic acid. The primer in solution then hybridizes to the first copy of the nucleic acid, and that primer can be extended using the first copy as a template. Optionally, after the first copy of the nucleic acid is produced, the original nucleic acid molecule can hybridize to a second immobilized primer on the surface and can be extended at the same time as, or after, the primer in solution is extended. In any embodiment, repeated rounds of extension (e.g., amplification) using the immobilized primer and the primer in solution provide multiple copies of the nucleic acid.
In particular embodiments, the assay protocols executed by the systems and methods described herein include the use of natural nucleotides and of enzymes that are configured to interact with the natural nucleotides. Natural nucleotides include, for example, ribonucleotides (RNA) or deoxyribonucleotides (DNA). Natural nucleotides can be in the mono-, di-, or tri-phosphate form and can have a base selected from adenine (A), thymine (T), uracil (U), guanine (G), or cytosine (C). It will be understood, however, that non-natural nucleotides, modified nucleotides, or analogs of the aforementioned nucleotides can be used. Some examples of useful non-natural nucleotides are set forth below in regard to reversible terminator-based sequencing-by-synthesis methods.
In embodiments that include reaction chambers, items or solid substances (including semi-solid substances) may be disposed within the reaction chambers. When disposed, the item or solid may be physically held or immobilized within the reaction chamber through an interference fit, adhesion, or entrapment. Examples of items or solids that may be disposed within the reaction chambers include polymer beads, pellets, agarose gel, powders, quantum dots, or other solids that may be compressed and/or held within the reaction chamber. In particular embodiments, a nucleic acid superstructure, such as a DNA ball, can be disposed in or at a reaction chamber, for example, by attachment to an interior surface of the reaction chamber or by residence in a liquid within the reaction chamber. A DNA ball or other nucleic acid superstructure can be preformed and then disposed in or at the reaction chamber. Alternatively, a DNA ball can be synthesized in situ at the reaction chamber. A DNA ball can be synthesized by rolling circle amplification to produce a concatemer of a particular nucleic acid sequence, and the concatemer can be treated with conditions that form a relatively compact ball. DNA balls and methods for their synthesis are described, for example, in U.S. Patent Publication Nos. 2008/0242560 A1 or 2008/0234136 A1, each of which is incorporated herein in its entirety. A substance that is held or disposed in a reaction chamber can be in a solid, liquid, or gaseous state.
As used herein, "base calling" identifies a nucleotide base in a nucleic acid sequence. Base calling refers to the process of determining a base call (A, C, G, T) for every cluster at a specific cycle. As an example, base calling can be performed using the four-channel, two-channel, or one-channel methods and systems described in the incorporated materials of U.S. Patent Application Publication No. 2013/0079232. In particular embodiments, a base calling cycle is referred to as a "sampling event." In a one-dye, two-channel sequencing protocol, a sampling event comprises two illumination stages in time sequence, such that a pixel signal is generated at each stage. The first illumination stage induces illumination from a given cluster indicating nucleotide bases A and T in an AT pixel signal, and the second illumination stage induces illumination from a given cluster indicating nucleotide bases C and T in a CT pixel signal.
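The two-stage encoding described above can be illustrated with a minimal sketch. It assumes each stage's pixel signal has already been reduced to an on/off decision for a single cluster, and it assumes that a cluster dark in both stages corresponds to G (the text lists A, C, G, T as the possible calls but does not spell out the G case). The function names and the `threshold` value are hypothetical.

```python
def decode_two_stage(at_on: bool, ct_on: bool) -> str:
    """Map the on/off state of the AT and CT pixel signals to a base."""
    if at_on and ct_on:
        return "T"  # T contributes to both the AT and the CT signal
    if at_on:
        return "A"  # A contributes only to the AT signal
    if ct_on:
        return "C"  # C contributes only to the CT signal
    return "G"      # assumption: G is dark in both illumination stages


def decode_from_intensity(at_signal: float, ct_signal: float,
                          threshold: float = 0.5) -> str:
    """Threshold two normalized intensities (hypothetical scale 0..1)."""
    return decode_two_stage(at_signal > threshold, ct_signal > threshold)
```

This only covers the single-cluster case; when a pixel area holds more than one cluster, the pixel signal mixes their contributions and a simple threshold is no longer sufficient.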
Biosensor
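FIG. 1 shows a cross-section of a biosensor 100 that can be used in various embodiments. The biosensor 100 has pixel areas 106', 108', 110', 112', and 114' that can each hold more than one cluster during a base calling cycle (e.g., two clusters per pixel area). As shown, the biosensor 100 may include a flow cell 102 that is mounted onto a sampling device 104. In the illustrated embodiment, the flow cell 102 is affixed directly to the sampling device 104. However, in alternative embodiments, the flow cell 102 may be removably coupled to the sampling device 104. The sampling device 104 has a sample surface 134 that may be functionalized (e.g., chemically or physically modified in a manner suitable for conducting the desired reactions). For example, the sample surface 134 may be functionalized and may include a plurality of pixel areas 106', 108', 110', 112', and 114' that can each hold more than one cluster during a base calling cycle (e.g., each pixel area has a corresponding cluster pair 106A, 106B; 108A, 108B; 110A, 110B; 112A, 112B; and 114A, 114B immobilized thereto). Each pixel area is associated with a corresponding sensor (or pixel or photodiode) 106, 108, 110, 112, and 114, such that light received by the pixel area is captured by the corresponding sensor. A pixel area 106' can also be associated with a corresponding reaction site 106'' on the sample surface 134 that holds a cluster pair, such that light emitted from the reaction site 106'' is received by the pixel area 106' and captured by the corresponding sensor 106. As a result of this sensing structure, when two or more clusters are present in the pixel area of a particular sensor during a base calling cycle (e.g., each pixel area has a corresponding cluster pair), the pixel signal in that base calling cycle carries information based on all of the two or more clusters. Signal processing as described herein is therefore used to distinguish each cluster where there are more clusters than pixel signals in a given sampling event of a particular base calling cycle.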
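In the illustrated embodiment, the flow cell 102 includes sidewalls 138, 125 and a flow cover 136 that is supported by the sidewalls 138, 125. The sidewalls 138, 125 are coupled to the sample surface 134 and extend between the flow cover 136 and the sidewalls 138, 125. In some embodiments, the sidewalls 138, 125 are formed from a curable adhesive layer that bonds the flow cover 136 to the sampling device 104.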
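The sidewalls 138, 125 are sized and shaped so that a flow channel 144 exists between the flow cover 136 and the sampling device 104. The flow cover 136 may include a material that is transparent to excitation light 101 propagating from an exterior of the biosensor 100 into the flow channel 144. In an example, the excitation light 101 approaches the flow cover 136 at a non-orthogonal angle.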
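Also as shown, the flow cover 136 may include inlet and outlet ports 142, 146 that are configured to fluidically engage other ports (not shown). For example, the other ports may be from a cartridge or a workstation. The flow channel 144 is sized and shaped to direct a fluid along the sample surface 134. A height H1 and other dimensions of the flow channel 144 may be configured to maintain a substantially even flow of fluid along the sample surface 134. The dimensions of the flow channel 144 may also be configured to control bubble formation.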
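By way of example, the flow cover 136 (or the flow cell 102) may comprise a transparent material, such as glass or plastic. The flow cover 136 may constitute a substantially rectangular block having a planar exterior surface and a planar interior surface that defines the flow channel 144. The block may be mounted onto the sidewalls 138, 125. Alternatively, the flow cell 102 may be etched to define the flow cover 136 and the sidewalls 138, 125. For example, a recess may be etched into the transparent material. When the etched material is mounted to the sampling device 104, the recess may become the flow channel 144.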
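The sampling device 104 may be similar to, for example, an integrated circuit comprising a plurality of stacked substrate layers 120 to 126. The substrate layers 120 to 126 may include a base substrate 120, a solid-state imager 122 (e.g., a CMOS image sensor), a filter or light-management layer 124, and a passivation layer 126. It should be noted that the above is only illustrative and that other embodiments may include fewer or additional layers. Moreover, each of the substrate layers 120 to 126 may include a plurality of sub-layers. The sampling device 104 may be manufactured using processes that are similar to those used in manufacturing integrated circuits, such as CMOS image sensors and CCDs. For example, the substrate layers 120 to 126 or portions thereof may be grown, deposited, etched, and the like to form the sampling device 104.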
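The passivation layer 126 is configured to shield the filter layer 124 from the fluidic environment of the flow channel 144. In some cases, the passivation layer 126 is also configured to provide a solid surface (i.e., the sample surface 134) that permits biomolecules or other analytes of interest to be immobilized thereon. For example, each of the reaction sites may include a cluster of biomolecules that are immobilized to the sample surface 134. Thus, the passivation layer 126 may be formed from a material that permits the reaction sites to be immobilized thereto. The passivation layer 126 may also comprise a material that is at least transparent to a desired fluorescent light. By way of example, the passivation layer 126 may include silicon nitride (Si3N4) and/or silicon dioxide (SiO2). However, other suitable materials may be used. In the illustrated embodiment, the passivation layer 126 may be substantially planar. However, in alternative embodiments, the passivation layer 126 may include recesses, such as pits, wells, grooves, and the like. In the illustrated embodiment, the passivation layer 126 has a thickness of about 150 nm to 200 nm, and more particularly about 170 nm.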
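The filter layer 124 may include various features that affect the transmission of light. In some embodiments, the filter layer 124 can perform multiple functions. For instance, the filter layer 124 may be configured to (a) filter unwanted light signals, such as light signals from an excitation light source; (b) direct emission signals from the reaction sites toward corresponding sensors 106, 108, 110, 112, and 114 that are configured to detect the emission signals from the reaction sites; or (c) block or prevent detection of unwanted emission signals from adjacent reaction sites. As such, the filter layer 124 may also be referred to as a light-management layer. In the illustrated embodiment, the filter layer 124 has a thickness of about 1 μm to 5 μm, and more particularly about 2 μm to 4 μm. In alternative embodiments, the filter layer 124 may include an array of microlenses or other optical components. Each of the microlenses may be configured to direct emission signals from an associated reaction site to a sensor.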
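In some embodiments, the solid-state imager 122 and the base substrate 120 may be provided together as a previously constructed solid-state imaging device (e.g., a CMOS chip). For example, the base substrate 120 may be a wafer of silicon, and the solid-state imager 122 may be mounted thereon. The solid-state imager 122 includes a layer of semiconductor material (e.g., silicon) and the sensors 106, 108, 110, 112, and 114. In the illustrated embodiment, the sensors are photodiodes configured to detect light. In other embodiments, the sensors comprise light detectors. The solid-state imager 122 may be manufactured as a single chip through a CMOS-based fabrication process.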
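The solid-state imager 122 may include a dense array of sensors 106, 108, 110, 112, and 114 that are configured to detect activity indicative of a desired reaction from within or along the flow channel 144. In some embodiments, each sensor has a pixel area (or detection area) of about 1 to 2 square micrometers (μm²). The array can include 500,000 sensors, 5 million sensors, 10 million sensors, or even 120 million sensors. The sensors 106, 108, 110, 112, and 114 can be configured to detect light of a predetermined wavelength that is indicative of the desired reactions.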
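In some embodiments, the sampling device 104 includes a microcircuit arrangement, such as the microcircuit arrangement described in U.S. Patent No. 7,595,882, which is incorporated herein by reference in its entirety. More specifically, the sampling device 104 may comprise an integrated circuit having a planar array of the sensors 106, 108, 110, 112, and 114. Circuitry formed within the sampling device 104 may be configured for at least one of signal amplification, digitization, storage, and processing. The circuitry may collect and analyze the detected fluorescent light and generate pixel signals (or detection signals) for communicating detection data to a signal processor. The circuitry may also perform additional analog and/or digital signal processing in the sampling device 104. The sampling device 104 may include conductive vias 130 that perform signal routing (e.g., transmit the pixel signals to the signal processor). The pixel signals may also be transmitted through electrical contacts 132 of the sampling device 104.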
The sampling device 104 is discussed in further detail in U.S. Nonprovisional Patent Application No. 16/874,599, titled "Systems and Devices for Characterization and Performance Analysis of Pixel-Based Sequencing," filed May 14, 2020 (Attorney Docket No. ILLM 1011-4/IP-1750-US), which is incorporated by reference as if fully set forth herein. The sampling device 104 is not limited to the constructions or uses described above. In alternative embodiments, the sampling device 104 may take other forms. For example, the sampling device 104 may comprise a CCD device, such as a CCD camera, that is coupled to a flow cell or is moved to interface with a flow cell having reaction sites therein.
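FIG. 2 shows one implementation of a flow cell 200 that contains clusters in its tiles. The flow cell 200 corresponds to the flow cell 102 of FIG. 1, e.g., without the flow cover 136. Furthermore, the depiction of the flow cell 200 is symbolic in nature: it shows the various lanes and tiles within the flow cell without illustrating the other components therein. FIG. 2 illustrates a top view of the flow cell 200.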
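In one embodiment, the flow cell 200 is divided or partitioned into a plurality of lanes, such as lanes 202a, 202b, ..., 202P, i.e., P lanes. In the example of FIG. 2, the flow cell 200 is illustrated as including 8 lanes, i.e., P = 8 in this example, although the number of lanes within a flow cell is implementation specific.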
In one embodiment, individual lanes 202 are further partitioned into non-overlapping regions called "tiles" 212. For example, FIG. 2 illustrates a magnified view of a section 208 of an example lane. The section 208 is illustrated as comprising a plurality of tiles 212.
In an example, each lane 202 comprises one or more columns of tiles. For example, in FIG. 2, each lane 202 comprises two corresponding columns of tiles 212, as illustrated within the magnified section 208. The number of tiles in each column of tiles within each lane is implementation specific; in one example, there can be 50 tiles, 60 tiles, 100 tiles, or another appropriate number of tiles in each column of tiles within each lane.
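As a small worked example of this layout, the sketch below enumerates tile coordinates for a flow cell with 8 lanes, two columns of tiles per lane, and 50 tiles per column, which are the example values mentioned in the text. The `(lane, column, tile)` addressing scheme and the function name are assumptions made for illustration only.

```python
from itertools import product


def enumerate_tiles(lanes: int = 8, columns_per_lane: int = 2,
                    tiles_per_column: int = 50):
    """Return a list of hypothetical (lane, column, tile) coordinates, 1-based."""
    return list(product(range(1, lanes + 1),
                        range(1, columns_per_lane + 1),
                        range(1, tiles_per_column + 1)))


tiles = enumerate_tiles()
print(len(tiles))  # 8 lanes * 2 columns * 50 tiles = 800 tiles
```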
Each tile comprises a corresponding plurality of clusters. During the sequencing procedure, the clusters and their surrounding background on the tiles are imaged. For example, FIG. 2 illustrates example clusters 216 within an example tile.
FIG. 3 illustrates an example Illumina GA-IIx™ flow cell with eight lanes, and also illustrates a zoom-in on one tile and its clusters and their surrounding background. For example, there are a hundred tiles per lane in the Illumina Genome Analyzer II and sixty-eight tiles per lane in the Illumina HiSeq2000. A tile 212 holds hundreds of thousands to millions of clusters. In FIG. 3, an image generated from a tile, with clusters shown as bright spots, is shown at 308 (e.g., 308 is a magnified image view of a tile), with an example cluster 304 labeled. A cluster 304 comprises approximately one thousand identical copies of a template molecule, though clusters vary in size and shape. The clusters are grown from the template molecule, prior to the sequencing run, by bridge amplification of the input library. The purpose of the amplification and cluster growth is to increase the intensity of the emitted signal, since the imaging device cannot reliably sense a single fluorophore. However, the physical distance between the DNA fragments within a cluster 304 is small, so the imaging device perceives the cluster of fragments as a single spot 304.
Clusters and tiles are discussed in further detail in U.S. Nonprovisional Patent Application No. 16/825,987, titled "TRAINING DATA GENERATION FOR ARTIFICIAL INTELLIGENCE-BASED SEQUENCING," filed March 20, 2020 (Attorney Docket No. ILLM 1008-16/IP-1693-US).
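FIG. 4 is a simplified block diagram of a system for analysis of sensor data from a sequencing system, such as base call sensor outputs (e.g., see FIG. 1). In the example of FIG. 4, the system includes a sequencing machine 400 and a configurable processor 450. The configurable processor 450 can execute a neural network-based base caller in coordination with a runtime program executed by a host processor, such as a central processing unit (CPU) 402. The sequencing machine 400 comprises base call sensors and a flow cell 401 (e.g., as discussed with respect to FIGS. 1 to 3). The flow cell can comprise one or more tiles in which clusters of genetic material are exposed to a sequence of analyte flows used to cause reactions in the clusters to identify the bases in the genetic material, as discussed with respect to FIGS. 1 to 3. The sensors sense the reactions for each cycle of the sequence in each tile of the flow cell to provide tile data. Examples of this technology are described in more detail below. Genetic sequencing is a data-intensive operation, which translates base call sensor data into sequences of base calls for each cluster of genetic material sensed during a base calling operation.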
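The system in this example includes the CPU 402, which executes a runtime program to coordinate the base calling operations, and memory 403 to store sequences of arrays of tile data, base call reads produced by the base calling operation, and other information used in the base calling operation. Also, in this illustration, the system includes memory 404 to store a configuration file (or files), such as FPGA bit files, and model parameters for the neural network, used to configure and reconfigure the configurable processor 450 and execute the neural network. The sequencing machine 400 can include a program for configuring a configurable processor, and in some embodiments a reconfigurable processor, to execute the neural network.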
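The sequencing machine 400 is coupled by a bus 405 to the configurable processor 450. The bus 405 can be implemented using a high-throughput technology, such as, in one example, bus technology compatible with the PCIe (Peripheral Component Interconnect Express) standards currently maintained and developed by the PCI-SIG (PCI Special Interest Group). Also, in this example, a memory 460 is coupled to the configurable processor 450 by a bus 461. The memory 460 can be on-board memory, disposed on a circuit board with the configurable processor 450. The memory 460 is used for high-speed access by the configurable processor 450 to working data used in the base calling operation. The bus 461 can also be implemented using a high-throughput technology, such as bus technology compatible with the PCIe standards.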
Configurable processors, including field-programmable gate arrays (FPGAs), coarse-grained reconfigurable arrays (CGRAs), and other configurable and reconfigurable devices, can be configured to implement a variety of functions more efficiently or faster than might be achieved using a general-purpose processor executing a computer program. Configuration of a configurable processor involves compiling a functional description to produce a configuration file, sometimes referred to as a bitstream or bit file, and distributing the configuration file to the configurable elements on the processor.
The configuration file defines the logic functions to be executed by the configurable processor by configuring the circuit to set data flow patterns, the use of distributed memory and other on-chip memory resources, lookup table contents, and the operations of configurable logic blocks and configurable execution units (such as multiply-and-accumulate units, configurable interconnects, and other elements of the configurable array). A configurable processor is reconfigurable if the configuration file can be changed in the field by changing the loaded configuration file. For example, the configuration file may be stored in volatile SRAM elements, in non-volatile read-write memory elements, and in combinations thereof, distributed among the array of configurable elements on the configurable or reconfigurable processor. A variety of commercially available configurable processors are suitable for use in a base calling operation as described herein. Examples include commercially available products such as the Xilinx Alveo™ U200, Xilinx Alveo™ U250, Xilinx Alveo™ U280, Intel/Altera Stratix™ GX2800, and Intel Stratix™ GX10M. In some examples, the host CPU can be implemented on the same integrated circuit as the configurable processor.
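Embodiments described herein implement the multi-cycle neural network using the configurable processor 450. The configuration file for a configurable processor can be implemented by specifying the logic functions to be executed using a high-level description language (HDL) or a register transfer level (RTL) language specification. The specification can be compiled using the resources designed for the selected configurable processor to generate the configuration file. The same or a similar specification can be compiled to generate a design for an application-specific integrated circuit, which may not be a configurable processor.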
Alternatives to the configurable processor, in all embodiments described herein, therefore include a configured processor comprising an application-specific integrated circuit (ASIC) or special-purpose integrated circuit, or a set of integrated circuits, or a system-on-a-chip (SOC) device, configured to execute a neural network-based base calling operation as described herein.
In general, configurable processors and configured processors described herein, as configured to execute runs of a neural network, are referred to herein as neural network processors.
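The configurable processor 450 is configured, in this example, by a configuration file loaded using a program executed by the CPU 402, or by other sources, which configures the array of configurable elements on the configurable processor 450 to execute the base call function. In this example, the configuration includes data flow logic 451, which is coupled to the buses 405 and 461 and executes functions for distributing data and control parameters among the elements used in the base calling operation.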
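Also, the configurable processor 450 is configured with base call execution logic 452 to execute a multi-cycle neural network. The logic 452 comprises a plurality of multi-cycle execution clusters (e.g., 453), which in this example include multi-cycle cluster 1 through multi-cycle cluster X. The number of multi-cycle clusters can be selected according to a trade-off involving the desired throughput of the operation and the available resources on the configurable processor.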
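The multi-cycle clusters are coupled to the data flow logic 451 by data flow paths 454 implemented using configurable interconnect and memory resources on the configurable processor. Also, the multi-cycle clusters are coupled to the data flow logic 451 by control paths 455 implemented using, for example, configurable interconnect and memory resources on the configurable processor. The control paths provide control signals indicating available clusters, readiness to provide input units for execution of a run of the neural network to the available clusters, readiness to provide trained parameters for the neural network, and readiness to provide output patches of base call classification data, as well as other control data used for execution of the neural network.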
The configurable processor is configured to execute runs of a multi-cycle neural network using trained parameters to produce classification data for sensing cycles of the base calling operation. A run of the neural network is executed to produce classification data for a subject sensing cycle of the base calling operation. A run of the neural network operates on a sequence including a number N of arrays of tile data from respective sensing cycles of N sensing cycles, where, in the examples described herein, the N sensing cycles provide sensor data for different base call operations for one base position per operation in time sequence. Optionally, some of the N sensing cycles can be out of sequence, if needed, according to the particular neural network model being executed. The number N can be any number greater than one. In some examples described herein, the N sensing cycles represent a set of sensing cycles comprising at least one sensing cycle preceding the subject sensing cycle and at least one sensing cycle following the subject cycle in time sequence. Examples are described herein in which the number N is an integer equal to or greater than five.
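To make the N-cycle input concrete, the following is a minimal sketch of choosing which sensing cycles feed one run of the network for a given subject cycle. It assumes a contiguous window centered on the subject cycle and shifted inward at the start and end of the run; the edge handling and the function name are assumptions, since the text only requires at least one preceding and one following cycle in some examples.

```python
def cycle_window(subject: int, n: int, total_cycles: int) -> list:
    """Return n contiguous cycle indices (0-based) that include `subject`.

    The window is centered on `subject` where possible and shifted
    inward near the first and last cycles (an assumed edge policy).
    """
    if not 1 < n <= total_cycles:
        raise ValueError("n must be greater than 1 and at most total_cycles")
    if not 0 <= subject < total_cycles:
        raise ValueError("subject cycle out of range")
    start = subject - n // 2
    start = max(0, min(start, total_cycles - n))
    return list(range(start, start + n))


print(cycle_window(subject=10, n=5, total_cycles=100))  # [8, 9, 10, 11, 12]
```

With N = 5, a subject cycle in the middle of the run is flanked by two preceding and two following cycles.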
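The data flow logic 451 is configured to move tile data and at least some trained parameters of the model from the memory 460 to the configurable processor for runs of the neural network, using input units for a given run that include tile data for spatially aligned patches of the N arrays. The input units can be moved by direct memory access (DMA) in a single DMA operation, or in smaller units moved during available time slots in coordination with the execution of the deployed neural network.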
Tile data for a sensing cycle as described herein can comprise an array of sensor data having one or more features. For example, the sensor data can comprise two images which are analyzed to identify one of four bases at a base position in a genetic sequence of DNA, RNA, or other genetic material. The tile data can also include metadata about the images and the sensors. For example, in embodiments of the base calling operation, the tile data can comprise information about the alignment of the images with the clusters, such as distance-from-center information indicating the distance of each pixel in the array of sensor data from the center of a cluster of genetic material on the tile.
During execution of the multi-cycle neural network as described below, tile data can also include data produced during that execution, referred to as intermediate data, which can be reused rather than recomputed during a run of the multi-cycle neural network. For example, during execution of the multi-cycle neural network, the data flow logic can write intermediate data to the memory 460 in place of the sensor data for a given patch of an array of tile data. Embodiments like this are described in more detail below.
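As illustrated, a system is described for analysis of base call sensor output, comprising memory (e.g., 460) accessible by the runtime program and storing tile data including sensor data for a tile from sensing cycles of a base calling operation. Also, the system includes a neural network processor, such as the configurable processor 450, having access to the memory. The neural network processor is configured to execute runs of a neural network using trained parameters to produce classification data for sensing cycles. As described herein, a run of the neural network operates on a sequence of N arrays of tile data from respective sensing cycles of N sensing cycles, including a subject cycle, to produce the classification data for the subject cycle. Data flow logic 451 is provided to move tile data and the trained parameters from the memory to the neural network processor for runs of the neural network, using input units that include data for spatially aligned patches of the N arrays from respective sensing cycles of the N sensing cycles.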
Also, a system is described in which the neural network processor has access to the memory and includes a plurality of execution clusters, the execution clusters in the plurality being configured to execute a neural network. The data flow logic has access to the memory and to the execution clusters, and provides input units of tile data to available execution clusters in the plurality. The input units include a number N of spatially aligned patches of arrays of tile data from respective sensing cycles, including a subject sensing cycle. The data flow logic causes the execution clusters to apply the N spatially aligned patches to the neural network to produce output patches of classification data for the spatially aligned patch of the subject sensing cycle, where N is greater than 1.
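The notion of an input unit made of N spatially aligned patches can be sketched as follows: the same row and column window is cut from each of the N per-cycle arrays, so that the patches cover the same region of the tile. This is a software illustration using nested lists, not the hardware data path; the function name and the patch coordinates are hypothetical.

```python
def aligned_patches(arrays, row: int, col: int, size: int):
    """Cut the same size x size window out of each per-cycle 2-D array.

    `arrays` is a list of N two-dimensional lists, one per sensing cycle.
    The result is a list of N patches that are spatially aligned.
    """
    return [[r[col:col + size] for r in array[row:row + size]]
            for array in arrays]


# Three toy 4x4 "tile" arrays; the value encodes cycle, row, and column.
cycles = [[[c * 100 + r * 10 + k for k in range(4)] for r in range(4)]
          for c in range(3)]
unit = aligned_patches(cycles, row=1, col=2, size=2)
print(unit[0])  # [[12, 13], [22, 23]]
```

Each element of `unit` comes from a different sensing cycle but covers the same pixels, which is what allows a single run to combine information across cycles for one location.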
FIG. 5 is a simplified diagram showing aspects of the base calling operation, including functions of a runtime program executed by a host processor. In this diagram, the output of image sensors from a flow cell (such as the flow cell illustrated in FIGS. 1 and 2) is provided on lines 500 to image processing threads 501, which can perform processes on images such as resampling, alignment, and arrangement in an array of sensor data for the individual tiles. The output can also be used by processes that calculate a tile cluster mask for each tile in the flow cell, which identifies the pixels in the array of sensor data that correspond to clusters of genetic material on the corresponding tile of the flow cell. To compute a cluster mask, one example algorithm is based on a process to detect clusters that are unreliable in the early sequencing cycles, using a metric derived from the softmax output; the data from those wells/clusters is then discarded, and no output data is produced for those clusters. For example, a process can identify clusters with high reliability during the first N1 (e.g., 25) base calls and reject the others. Rejected clusters might be polyclonal, very weak in intensity, or obscured by fiducials. This procedure can be performed on the host CPU. In alternative implementations, this information would potentially be used to identify the necessary clusters of interest to be passed back to the CPU, thereby limiting the storage required for intermediate data.
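The outputs of the image processing threads 501 are provided on lines 502 to dispatch logic 510 in the CPU, which, according to the state of the base calling operation, routes the arrays of tile data on a high-speed bus 503 to a data cache 504, or on a high-speed bus 505 to multi-cluster neural network processor hardware 520, such as the configurable processor of FIG. 4. The hardware 520 returns classification data output by the neural network to the dispatch logic 510, which passes the information to the data cache 504, or on lines 511 to threads 502 that perform base call and quality score computations using the classification data and can arrange the data in standard formats for base call reads. The outputs of the threads 502 that perform base calling and quality score computations are provided on lines 512 to threads 503 that aggregate the base call reads, perform other operations such as data compression, and write the resulting base call outputs to specified destinations for use by customers.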
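In some embodiments, the host can include threads (not shown) that perform final processing of the output of the hardware 520 in support of the neural network. For example, the hardware 520 can provide outputs of classification data from a final layer of the multi-cluster neural network. The host processor can execute an output activation function, such as a softmax function, over the classification data to configure the data for use by the base call and quality score threads 502. Also, the host processor can execute input operations (not shown), such as resampling, batch normalization, or other adjustments of the tile data prior to input to the hardware 520.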
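As an illustration of this host-side post-processing, the sketch below applies a softmax to the four per-cluster scores from a final layer, picks the most probable base, and derives a quality score. The class order `"ACGT"`, the Phred-style formula Q = -10·log10(1 - p), and the cap of 40 are assumptions made for this sketch; the text only states that a softmax and a quality score computation may be performed.

```python
import math

BASES = "ACGT"  # assumed ordering of the four output classes


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_base(scores, max_quality: float = 40.0):
    """Return (base, quality) for one cluster from four raw scores."""
    probs = softmax(scores)
    best = max(range(len(BASES)), key=lambda i: probs[i])
    error = 1.0 - probs[best]
    # Phred-style quality, capped to avoid log10(0) for saturated outputs.
    if error <= 0.0:
        quality = max_quality
    else:
        quality = min(max_quality, -10.0 * math.log10(error))
    return BASES[best], quality


base, quality = call_base([0.1, 0.2, 4.0, 0.3])
print(base, round(quality, 1))
```

A production pipeline would typically calibrate quality scores against empirical error rates rather than use the raw softmax probability directly.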
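FIG. 6 is a simplified diagram of a configuration of a configurable processor, such as that of FIG. 4. In FIG. 6, the configurable processor comprises an FPGA with a plurality of high-speed PCIe interfaces. The FPGA is configured with a wrapper 600, which comprises the data flow logic described with reference to FIG. 4. The wrapper 600 manages the interface and coordination with a runtime program in the CPU via a CPU communication link 609, and manages communication with the on-board DRAM 602 (e.g., the memory 460) via a DRAM communication link 610. The data flow logic in the wrapper 600 provides patch data, retrieved by traversing the arrays of tile data for the number N of cycles on the on-board DRAM 602, to a cluster 601, and retrieves process data 615 from the cluster 601 for delivery back to the on-board DRAM 602. The wrapper 600 also manages the transfer of data between the on-board DRAM 602 and host memory, both for the input arrays of tile data and for the output patches of classification data. The wrapper transfers patch data on line 613 to the allocated cluster 601. The wrapper provides trained parameters, such as weights and biases, retrieved from the on-board DRAM 602, on line 612 to the cluster 601. The wrapper provides configuration and control data on line 611 to the cluster 601; this data is provided from, or generated in response to, the runtime program on the host via the CPU communication link 609. The cluster can also provide status signals on line 616 to the wrapper 600, which are used in cooperation with control signals from the host to manage traversal of the arrays of tile data to provide spatially aligned patch data, and to execute the multi-cycle neural network over the patch data using the resources of the cluster 601.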
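As noted above, there can be multiple clusters on a single configurable processor managed by the wrapper 600, each configured to execute on a corresponding one of multiple patches of the tile data. Each cluster can be configured to provide classification data for base calls in a subject sensing cycle using the tile data of multiple sensing cycles, as described herein.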
In examples of the system, model data, including kernel data such as filter weights and biases, can be sent from the host CPU to the configurable processor, so that the model can be updated as a function of cycle number. As one representative example, a base calling operation can comprise on the order of hundreds of sensing cycles. In some embodiments, a base calling operation can include paired-end reads. For example, the trained model parameters may be updated once every 20 cycles (or other number of cycles), or according to update patterns implemented for particular systems and neural network models. In some embodiments including paired-end reads, in which the sequence for a given string in a genetic cluster on a tile includes a first part extending from a first end down (or up) the string and a second part extending from a second end up (or down) the string, the trained parameters can be updated on the transition from the first part to the second part.
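The update pattern described above (a refresh every fixed number of cycles, plus a refresh at the paired-end transition) can be sketched as a small lookup. The function name `parameter_epoch` and its arguments are hypothetical; the 20-cycle interval is the example value from the text, and the actual pattern is system specific.

```python
from typing import Optional, Tuple


def parameter_epoch(
    cycle: int,
    update_every: int = 20,
    read1_cycles: Optional[int] = None,
) -> Tuple[int, int]:
    """Map a zero-based sensing cycle to (read index, parameter block index).

    A new parameter block starts every `update_every` cycles. If
    `read1_cycles` is given, cycles at or beyond it belong to the second
    read of a paired-end run, and block counting restarts at the transition.
    """
    if read1_cycles is not None and cycle >= read1_cycles:
        return 1, (cycle - read1_cycles) // update_every
    return 0, cycle // update_every
```

The host could compare the value for the current cycle with the value for the previous cycle and send new weights and biases to the configurable processor only when it changes.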
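In some examples, image data for multiple cycles of the sensing data for a tile can be sent from the CPU to the wrapper 600. The wrapper 600 can optionally do some pre-processing and transformation of the sensing data and write the information to the on-board DRAM 602. The input tile data for each sensing cycle can include arrays of sensor data comprising on the order of 4000×3000 pixels or more per sensing cycle per tile, with two features representing colors of two images of the tile, and one or two bytes per feature per pixel. For an embodiment in which the number N is three sensing cycles to be used in each run of the multi-cycle neural network, the array of tile data for each run of the multi-cycle neural network can consume on the order of hundreds of megabytes per tile. In some embodiments of the system, the tile data also includes an array of DFC data, stored once per tile, or other types of metadata about the sensor data and the tiles.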
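The storage estimate above follows from simple arithmetic, shown here as a sketch. The helper name `tile_array_bytes` is hypothetical, and the figures are the example values from the text; DFC metadata and any padding are not counted.

```python
def tile_array_bytes(
    width: int = 4000,
    height: int = 3000,
    features: int = 2,
    bytes_per_feature: int = 2,
    cycles: int = 3,
) -> int:
    """Bytes of sensor data held for one tile across `cycles` sensing cycles."""
    return width * height * features * bytes_per_feature * cycles


# Two bytes per feature: 4000 * 3000 * 2 * 2 * 3 = 144,000,000 bytes (about 144 MB).
# One byte per feature halves this to 72,000,000 bytes.
```

Larger sensor arrays or more input cycles scale the figure linearly.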
In operation, when a multi-cycle cluster is available, the wrapper allocates a patch to the cluster. The wrapper fetches the next patch of tile data in the traversal of the tile and sends it to the allocated cluster along with appropriate control and configuration information. The cluster can be configured with enough memory on the configurable processor to hold a patch of data that is being worked on in place, including patches from multiple cycles in some systems, as well as a patch of data that is to be worked on when processing of the current patch is finished, using a ping-pong buffer technique or a raster scanning technique in various embodiments.
When an allocated cluster completes its run of the neural network for the current patch and produces an output patch, it signals the wrapper. The wrapper reads the output patch from the allocated cluster, or alternatively the allocated cluster pushes the data out to the wrapper. The wrapper then assembles the output patches for the processed tile in the DRAM 602. When the processing of the entire tile has been completed and the output patches of data have been transferred to the DRAM, the wrapper sends the processed output array for the tile back to the host/CPU in a specified format. In some embodiments, the on-board DRAM 602 is managed by memory management logic in the wrapper 600. The runtime program can control the sequencing operations to complete analysis of all the arrays of tile data for all the cycles in the run in a continuous flow, thereby providing real-time analysis.
FIG. 7 is a diagram of a multi-cycle neural network model that can be executed using the system described herein. The example shown in FIG. 7 can be referred to as a five-cycle input, one-cycle output neural network. The inputs to the multi-cycle neural network model include five spatially aligned patches (e.g., 700) from the tile data arrays of five sensing cycles of a given tile. Spatially aligned patches have the same aligned row and column dimensions (x, y) as the other patches in the set, so that the information relates to the same clusters of genetic material on the tile across the sequence of cycles. In this example, the subject patch is a patch from the array of tile data for cycle K. The set of five spatially aligned patches includes, in addition to the subject patch, a patch from cycle K-2 two cycles before the subject patch, a patch from cycle K-1 one cycle before the subject patch, a patch from cycle K+1 one cycle after the subject patch, and a patch from cycle K+2 two cycles after the subject patch.
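The model includes an isolated stack 701 of layers of the neural network for each of the input patches. Thus, stack 701 receives as input the tile data from the patch of cycle K+2 and is isolated from stacks 702, 703, 704, and 705, so that they do not share input data or intermediate data. In some embodiments, all of the stacks 701 to 705 can have the same model and the same trained parameters. In other embodiments, the models and trained parameters may differ between stacks. Stack 702 receives as input the tile data from the patch of cycle K+1. Stack 703 receives as input the tile data from the patch of cycle K. Stack 704 receives as input the tile data from the patch of cycle K-1. Stack 705 receives as input the tile data from the patch of cycle K-2. The layers of the isolated stacks each execute a convolution operation of a kernel comprising a plurality of filters over the input data for the layer. As in the example above, the patch 700 may include three features. The output of layer 710 may include many more features, such as 10 to 20 features. Likewise, the output of each of layers 711 to 716 can include any number of features suitable for a particular implementation. The parameters of the filters are trained parameters of the neural network, such as weights and biases. The output feature set (intermediate data) from each of the stacks 701 to 705 is provided as input to an inverse hierarchy 720 of temporal combining layers, in which the intermediate data from the multiple cycles is combined. In the illustrated example, the inverse hierarchy 720 includes a first layer comprising three combining layers 721, 722, 723, each receiving intermediate data from three of the isolated stacks, and a final layer comprising one combining layer 730 that receives intermediate data from the three temporal layers 721, 722, 723.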
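The output of the final combining layer 730 is an output patch of classification data for clusters located in the corresponding patch of the tile from cycle K. The output patches can be assembled into an output array of classification data for the tile for cycle K. In some embodiments, the output patch may have a size and dimensions different from those of the input patches. In some embodiments, the output patch may include pixel-by-pixel data that can be filtered by the host to select cluster data.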
Depending on the particular implementation, the output classification data can then be applied to a softmax function 740 (or other output activation function), optionally executed by the host or on the configurable processor. An output function other than softmax could be used (e.g., producing a base call output parameter according to the largest output, and then using a learned nonlinear mapping over context/network outputs to give base quality).
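Finally, the output of the softmax function 740 can be provided as base call probabilities for cycle K (750) and stored in host memory for use in subsequent processing. Other systems may use another function for output probability calculation, e.g., another nonlinear model.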
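As an illustration of the output stage, the following sketch applies a numerically stable softmax to the four per-pixel classification scores and picks the most probable base. The names `softmax`, `call_base`, and `BASES`, and the base ordering, are assumptions for illustration; the text does not specify how the four outputs are ordered.

```python
import math
from typing import List, Tuple

BASES = "ACGT"  # hypothetical ordering of the four network outputs


def softmax(scores: List[float]) -> List[float]:
    """Convert raw classification scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_base(scores: List[float]) -> Tuple[str, float]:
    """Return the most probable base and its softmax probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return BASES[best], probs[best]
```

The returned probability could feed a downstream quality score computation, which the text assigns to the base call and quality score threads on the host.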
The neural network can be implemented using a configurable processor with a plurality of execution clusters so as to complete evaluation of one tile cycle within a duration equal to, or close to, the time interval of one sensing cycle, effectively providing the output data in real time. Data flow logic can be configured to distribute input units of tile data and trained parameters to the execution clusters, and to distribute output patches for aggregation in memory.
Input units of data for a five-cycle input, one-cycle output neural network like that of FIG. 7 are described with reference to FIGS. 8A and 8B for a base calling operation using two-channel sensor data. For example, for a given base in a genetic sequence, the base calling operation can execute two flows of analyte and two reactions that generate two channels of signals, such as images, which can be processed to identify which one of four bases is located at the current position in the genetic sequence for each cluster of genetic material. In other systems, a different number of channels of sensing data may be utilized. For example, base calling can be performed utilizing one-channel methods and systems. Incorporated materials of U.S. Patent Application Publication No. 2013/0079232 discuss base calling using various numbers of channels, such as one-channel, two-channel, or four-channel.
FIG. 8A shows arrays of tile data for five cycles for a given tile, tile M, used for the purposes of executing a five-cycle input, one-cycle output neural network. The five-cycle input tile data in this example can be written to the on-board DRAM, or other memory in the system that can be accessed by the data flow logic. It includes array 801 for channel 1 and array 811 for channel 2 for cycle K-2, array 802 for channel 1 and array 812 for channel 2 for cycle K-1, array 803 for channel 1 and array 813 for channel 2 for cycle K, array 804 for channel 1 and array 814 for channel 2 for cycle K+1, and array 805 for channel 1 and array 815 for channel 2 for cycle K+2. Also, an array 820 of metadata for the tile, in this case a DFC file, can be written once in the memory, to be included for use as input to the neural network along with each cycle.
Although FIG. 8A discusses a two-channel base calling operation, the use of two channels is merely an example, and base calling can be performed using any other appropriate number of channels. For example, incorporated materials of U.S. Patent Application Publication No. 2013/0079232 discuss base calling using various numbers of channels, such as one-channel, two-channel, four-channel, or another appropriate number of channels.
The data flow logic composes input units of tile data, which can be understood with reference to FIG. 8B. These input units include spatially aligned patches of the arrays of tile data for each execution cluster that is configured to execute a run of the neural network over an input patch. An input unit for an allocated execution cluster is composed by the data flow logic by reading spatially aligned patches (e.g., 851, 852, 861, 862, 870) from each of the arrays 801 to 805, 811 to 815, and 820 of tile data for the five input cycles, and delivering them via data paths (schematically, 850) to memory on the configurable processor configured for use by the allocated execution cluster. The allocated execution cluster executes a run of the five-cycle input, one-cycle output neural network and delivers an output patch of classification data for subject cycle K for the same patch of the tile in subject cycle K.
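The composition of an input unit can be sketched as reading the same rectangular window from every per-cycle, per-channel array. The names `extract_patch` and `build_input_unit`, the nested-list data layout, and the `half_window` parameter are hypothetical; the per-tile metadata array (820) is left out for brevity.

```python
from typing import List

Array2D = List[List[int]]


def extract_patch(array: Array2D, row: int, col: int,
                  height: int, width: int) -> Array2D:
    """Cut a height x width window whose top-left corner is (row, col)."""
    return [r[col:col + width] for r in array[row:row + height]]


def build_input_unit(tile_arrays: List[List[Array2D]], k: int, row: int,
                     col: int, height: int, width: int,
                     half_window: int = 2) -> List[List[Array2D]]:
    """Gather spatially aligned patches for cycles k-half_window..k+half_window.

    tile_arrays[cycle][channel] is the 2D sensor array of one channel of one
    sensing cycle. Every patch uses the same (row, col) window, so all
    patches cover the same clusters on the tile.
    """
    first, last = k - half_window, k + half_window
    if first < 0 or last >= len(tile_arrays):
        raise ValueError("subject cycle too close to the start or end of the run")
    return [
        [extract_patch(channel, row, col, height, width)
         for channel in tile_arrays[cycle]]
        for cycle in range(first, last + 1)
    ]
```

With `half_window=2` and two channels per cycle, the unit holds ten patches, matching the five-cycle, two-channel example of FIGS. 8A and 8B.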
FIG. 9 is a simplified representation of a stack of a neural network usable in a system like that of FIG. 7 (e.g., 701 and 720). In this example, some functions of the neural network (e.g., 900, 902) are executed on the host, and other portions of the neural network (e.g., 901) are executed on the configurable processor.
For example, a first function can be batch normalization (layer 910) performed on the CPU. However, in another example, batch normalization as a function may be fused into one or more layers, with no separate batch normalization layer present.
As discussed above with respect to the configurable processor, a number of spatial, isolated convolution layers are executed as a first set of convolution layers of the neural network. In this example, the first set of convolution layers applies 2D convolutions spatially.
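As shown in FIG. 9, a first spatial convolution 921 is executed, followed by a second spatial convolution 922, followed by a third spatial convolution 923, and so on, for a number L/2 of spatially isolated neural network layers in each stack (L is described with reference to FIG. 7). As indicated at 923A, the number of spatial layers can be any practical number, which for context may range from a few to more than 20 in different embodiments.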
For SP_CONV_0, kernel weights are stored, for example, in a (1,6,6,3,L) structure, since there are 3 input channels to this layer. In this example, the "6" in this structure is due to storing coefficients in the transformed Winograd domain (the kernel size is 3×3 in the spatial domain but expands in the transform domain).
For the other SP_CONV layers, kernel weights are stored for this example in a (1,6,6L) structure, since there are K (=L) inputs and outputs for each of these layers.
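The outputs of the stack of spatial layers are provided to temporal layers, including convolution layers 924, 925 executed on the FPGA. Layers 924 and 925 can be convolution layers applying 1D convolutions across cycles. As indicated at 924A, the number of temporal layers can be any practical number, which for context may range from a few to more than 20 in different embodiments.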
The first temporal layer, TEMP_CONV_0 layer 924, reduces the number of cycle channels from 5 to 3, as illustrated in FIG. 7. The second temporal layer, layer 925, reduces the number of cycle channels from 3 to 1, as illustrated in FIG. 7, and reduces the number of feature maps to four outputs for each pixel, representing the confidence in each base call.
The outputs of the temporal layers are accumulated in output patches and delivered to the host CPU to apply, for example, a softmax function 930, or other function, to normalize the base call probabilities.
FIG. 10 illustrates an alternative implementation showing a 10-input, six-output neural network that can be executed for a base calling operation. In this example, tile data for spatially aligned input patches from cycles 0 to 9 are applied to isolated stacks of spatial layers, such as stack 1001 for cycle 9. The outputs of the isolated stacks are applied to an inverse hierarchical arrangement of temporal stacks 1020, having outputs 1035(2) to 1035(7), providing base call classification data for subject cycles 2 to 7.
FIG. 11 shows one implementation of a specialized architecture of the neural network-based base caller (e.g., FIG. 7) that is used to isolate the processing of data for different sequencing cycles. The motivation for using the specialized architecture is described first.
The neural network-based base caller processes data for a current sequencing cycle, one or more preceding sequencing cycles, and one or more subsequent sequencing cycles. Data for the additional sequencing cycles provides sequence-specific context. The neural network-based base caller learns the sequence-specific context during training and takes it into account when base calling. Furthermore, data for the preceding and subsequent sequencing cycles provides a second-order contribution of pre-phasing and phasing signals to the current sequencing cycle.
Images captured at different sequencing cycles and in different image channels are misaligned with respect to each other and have residual registration error. To account for this misalignment, the specialized architecture comprises spatial convolution layers that do not mix information between sequencing cycles and only mix information within a sequencing cycle.
Spatial convolution layers use so-called "isolated convolutions" that achieve the isolation by independently processing data for each of a plurality of sequencing cycles through a "dedicated, non-shared" sequence of convolutions. The isolated convolutions convolve over the data and resulting feature maps of only a given sequencing cycle (i.e., intra-cycle), without convolving over the data and resulting feature maps of any other sequencing cycle.
For example, consider that the input data comprises (i) current data for a current (time t) sequencing cycle to be base called, (ii) previous data for a previous (time t-1) sequencing cycle, and (iii) subsequent data for a subsequent (time t+1) sequencing cycle. The specialized architecture then initiates three separate data processing pipelines (or convolution pipelines), namely, a current data processing pipeline, a previous data processing pipeline, and a subsequent data processing pipeline. The current data processing pipeline receives as input the current data for the current (time t) sequencing cycle and independently processes it through a plurality of spatial convolution layers to produce a so-called "current spatially convolved representation" as the output of a final spatial convolution layer. The previous data processing pipeline receives as input the previous data for the previous (time t-1) sequencing cycle and independently processes it through the plurality of spatial convolution layers to produce a so-called "previous spatially convolved representation" as the output of the final spatial convolution layer. The subsequent data processing pipeline receives as input the subsequent data for the subsequent (time t+1) sequencing cycle and independently processes it through the plurality of spatial convolution layers to produce a so-called "subsequent spatially convolved representation" as the output of the final spatial convolution layer.
In some implementations, the current pipeline, the one or more previous pipelines, and the one or more subsequent pipelines are executed in parallel.
In some implementations, the spatial convolution layers are part of a spatial convolution network (or subnetwork) within the specialized architecture.
The neural network-based base caller further comprises temporal convolution layers that mix information between sequencing cycles (i.e., inter-cycle). The temporal convolution layers receive their inputs from the spatial convolution network and operate on the spatially convolved representations produced by the final spatial convolution layer of the respective data processing pipelines.
The freedom of the temporal convolution layers to operate across cycles stems from the fact that the misalignment property, which exists in the image data fed as input to the spatial convolution network, is purged from the spatially convolved representations by the stack, or cascade, of isolated convolutions performed by the sequence of spatial convolution layers.
Temporal convolution layers use so-called "combined convolutions" that convolve, group by group, over input channels in successive inputs on a sliding window basis. In one implementation, the successive inputs are successive outputs produced by a previous spatial convolution layer or a previous temporal convolution layer.
In some implementations, the temporal convolution layers are part of a temporal convolution network (or subnetwork) within the specialized architecture. The temporal convolution network receives its inputs from the spatial convolution network. In one implementation, a first temporal convolution layer of the temporal convolution network combines, group by group, the spatially convolved representations between the sequencing cycles. In another implementation, subsequent temporal convolution layers of the temporal convolution network combine successive outputs of previous temporal convolution layers.
The output of the final temporal convolution layer is fed to an output layer that produces an output. The output is used to base call one or more clusters at one or more sequencing cycles.
During forward propagation, the specialized architecture processes information from a plurality of inputs in two stages. In the first stage, isolated convolutions are used to prevent mixing of information between the inputs. In the second stage, combined convolutions are used to mix information between the inputs. The results from the second stage are used to make a single inference for the plurality of inputs.
This differs from the batch mode technique, in which a convolution layer processes multiple inputs in a batch at the same time and makes a corresponding inference for each input in the batch. In contrast, the specialized architecture maps the plurality of inputs to the single inference. The single inference can comprise more than one prediction, such as a classification score for each of the four bases (A, C, T, and G).
In one implementation, the inputs have a temporal ordering such that each input is generated at a different time step and has a plurality of input channels. For example, the plurality of inputs can include the following three inputs: a current input generated by a current sequencing cycle at time step (t), a previous input generated by a previous sequencing cycle at time step (t-1), and a subsequent input generated by a subsequent sequencing cycle at time step (t+1). In another implementation, each input is respectively derived from the current, previous, and subsequent outputs produced by one or more previous convolution layers and includes k feature maps.
In one implementation, each input can include the following five input channels: a red image channel, a red distance channel, a green image channel, a green distance channel, and a scaling channel. In another implementation, each input can include k feature maps produced by a previous convolution layer, with each feature map treated as an input channel. In yet another example, each input can have merely one channel, two channels, or another different number of channels. Incorporated materials of U.S. Patent Application Publication No. 2013/0079232 discuss base calling using various numbers of channels, such as one-channel, two-channel, or four-channel.
FIG. 12 shows one implementation of isolated layers, each of which can include convolutions. Isolated convolutions process the plurality of inputs at once by applying a convolution filter to each input in parallel. With the isolated convolutions, the convolution filter combines input channels within the same input and does not combine input channels in different inputs. In one implementation, the same convolution filter is applied to each input in parallel. In another implementation, a different convolution filter is applied to each input in parallel. In some implementations, each spatial convolution layer comprises a bank of k convolution filters, each of which is applied to each input in parallel.
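A minimal sketch of an isolated convolution, assuming the variant in which the same filter is shared by all inputs: the filter mixes the channels of one input, and the per-cycle inputs never interact. The function names `conv2d_valid` and `isolated_conv` and the nested-list layout are hypothetical, and padding, bias, and activation are omitted.

```python
from typing import List

Plane = List[List[float]]


def conv2d_valid(channels: List[Plane], kernel: List[Plane]) -> Plane:
    """Convolve one input (C planes of H x W) with one filter (C x kh x kw).

    All channels of the input are summed into a single output plane, so
    information is mixed within this input only.
    """
    kh, kw = len(kernel[0]), len(kernel[0][0])
    h, w = len(channels[0]), len(channels[0][0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            acc = 0.0
            for c, plane in enumerate(channels):
                for di in range(kh):
                    for dj in range(kw):
                        acc += plane[i + di][j + dj] * kernel[c][di][dj]
            row.append(acc)
        out.append(row)
    return out


def isolated_conv(inputs: List[List[Plane]], kernel: List[Plane]) -> List[Plane]:
    """Apply the same filter to each per-cycle input independently."""
    return [conv2d_valid(single_input, kernel) for single_input in inputs]
```

Because each output plane depends on exactly one input, changing the data of one sequencing cycle leaves the outputs for the other cycles untouched, which is the isolation property the paragraph describes.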
FIG. 13A shows one implementation of combining layers, each of which can include convolutions. FIG. 13B shows another implementation of combining layers, each of which can include convolutions. Combined convolutions mix information between different inputs by grouping corresponding input channels of the different inputs and applying a convolution filter to each group. The grouping of the corresponding input channels and the application of the convolution filter occur on a sliding window basis. In this context, a window spans two or more successive input channels representing, for instance, outputs for two successive sequencing cycles. Since the window is a sliding window, most input channels are used in two or more windows.
In some implementations, the different inputs originate from an output sequence produced by a previous spatial or temporal convolution layer. In the output sequence, the different inputs are arranged as successive outputs and are therefore viewed by the next temporal convolution layer as successive inputs. Then, in the next temporal convolution layer, the combined convolutions apply the convolution filter to groups of corresponding input channels in the successive inputs.
In one implementation, the successive inputs have a temporal ordering such that a current input is generated by a current sequencing cycle at time step (t), a previous input is generated by a previous sequencing cycle at time step (t-1), and a subsequent input is generated by a subsequent sequencing cycle at time step (t+1). In another implementation, each successive input is respectively derived from the current, previous, and subsequent outputs produced by one or more previous convolution layers and includes k feature maps.
In one implementation, each input can include the following five input channels: a red image channel, a red distance channel, a green image channel, a green distance channel, and a scaling channel. In another implementation, each input can include k feature maps produced by a previous convolution layer, with each feature map treated as an input channel.
The depth B of a convolution filter depends on the number of successive inputs whose corresponding input channels are convolved, group by group, by the convolution filter on a sliding window basis. In other words, the depth B is equal to the number of successive inputs in each sliding window, that is, the group size.
In FIG. 13A, corresponding input channels from two successive inputs are combined in each sliding window, and therefore B=2. In FIG. 13B, corresponding input channels from three successive inputs are combined in each sliding window, and therefore B=3.
In one implementation, the sliding windows share the same convolution filter. In another implementation, a different convolution filter is used for each sliding window. In some implementations, each temporal convolution layer comprises a bank of k convolution filters, each of which is applied to the successive inputs on a sliding window basis.
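The sliding-window grouping can be sketched as below, assuming the variant in which all windows share one filter of depth B. For brevity each input channel is reduced to a single number, so only the mixing across successive inputs is shown; the function name `combined_conv` and this simplification are assumptions made for illustration.

```python
from typing import List


def combined_conv(inputs: List[List[float]],
                  filter_weights: List[float]) -> List[List[float]]:
    """Combine corresponding channels of B successive inputs per window.

    inputs[t][c] is channel c of the input at time step t.
    filter_weights has length B (the filter depth) and is shared by all
    sliding windows. n inputs yield n - B + 1 outputs.
    """
    depth = len(filter_weights)
    outputs = []
    for start in range(len(inputs) - depth + 1):
        window = inputs[start:start + depth]
        n_channels = len(window[0])
        outputs.append([
            sum(w * step[c] for w, step in zip(filter_weights, window))
            for c in range(n_channels)
        ])
    return outputs
```

With five inputs, B=2 produces four outputs and B=3 produces three. Applying a B=3 layer twice reduces five inputs to three and then to one, which matches the 5 to 3 to 1 reduction described for the temporal layers of FIG. 9.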
Further details of FIGS. 4 to 10 and variations thereof can be found in co-pending U.S. Nonprovisional Patent Application No. 17/176,147, titled "HARDWARE EXECUTION AND ACCELERATION OF ARTIFICIAL INTELLIGENCE-BASED BASE CALLER," filed February 15, 2021 (Attorney Docket No. ILLM 1020-2/IP-1866-US), which is incorporated by reference as if fully set forth herein.
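FIG. 14 illustrates an example tile location-based weight selection scheme for base calling. For example, illustrated in FIG. 14 is an example flow cell 1400 comprising a plurality of lanes 1450, each lane comprising a corresponding plurality of tiles (e.g., as discussed with respect to FIGS. 1 and 2). The depiction of the flow cell 1400 is symbolic in nature: it symbolically shows the various lanes and tiles within the flow cell, without illustrating various other components of the flow cell 1400. FIG. 14 illustrates a top view of the flow cell 1400 (e.g., without illustrating the flow cover 136 of FIG. 1).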
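In an embodiment, and as also discussed with respect to FIG. 2, the flow cell 1400 is divided or partitioned into a plurality of lanes, such as lanes 1450a, 1450b, 1450c, ..., 1450(P-2), 1450(P-1), and 1450P, i.e., P lanes, where P is a positive integer. As also discussed with respect to FIG. 2, in an embodiment, individual lanes 1450 are further partitioned into non-overlapping regions called tiles. In an example, each lane 1450 comprises one or more columns of tiles. For example, in FIG. 14, each lane 1450 comprises two corresponding columns of tiles, with each individual tile shown by a corresponding rectangular box. The number of tiles within each column of tiles within each lane is implementation specific. Each tile comprises a corresponding plurality of clusters. During the sequencing procedure, the clusters and their surrounding background on the tiles are imaged. For example, FIGS. 2 and 3 show examples of clusters within a tile.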
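In an embodiment, the tiles of the flow cell 1400 are categorized into various types, e.g., based on the location of the tiles. In the example implementation of FIG. 14, individual tiles of the flow cell 1400 are categorized as edge tiles 1408, near-edge tiles 1410, or non-edge (or central) tiles 1412.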
For example, tiles on the vertical edges (e.g., along the Y axis) and/or the horizontal edges (e.g., along the X axis) of the flow cell 1400 are categorized as edge tiles 1408, as illustrated in FIG. 14. Thus, an edge tile 1408 is immediately adjacent to a corresponding edge of the flow cell 1400.
Tiles that are near (e.g., immediately adjacent to) the edge tiles are categorized as near-edge tiles 1410. For example, a near-edge tile 1410 is a tile that is separated from an edge of the flow cell 1400. Thus, an edge tile 1408 separates a corresponding near-edge tile 1410 from the corresponding edge of the flow cell 1400.
Tiles that are neither edge tiles nor near-edge tiles are non-edge tiles 1412, also referred to as central tiles 1412. Thus, the central tiles 1412 are relatively closer to the center of the flow cell 1400 than, for example, the edge tiles 1408 or the near-edge tiles 1410. For example, a central tile 1412 is separated from the edges of the flow cell 1400 by an edge tile 1408 and a near-edge tile 1410.
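Although in FIG. 14 the tiles of the flow cell 1400 are categorized into three classes (such as edge, near-edge, and central or non-edge), such categorization is merely an example, and a different tile location-based categorization can also be used. For example, in another implementation, the tiles can be categorized as (i) edge or near-edge tiles, and (ii) central tiles (e.g., the edge and near-edge classes can be merged into a single class), resulting in two tile classes.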
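One way to express the three-class categorization is by the distance, in tiles, from a tile to the nearest edge of the flow cell. The sketch below assumes the flow cell is indexed as a plain grid of tile rows and columns (ignoring lane boundaries); the function name `classify_tile` and this indexing are hypothetical.

```python
def classify_tile(row: int, col: int, n_rows: int, n_cols: int) -> str:
    """Classify a tile of an n_rows x n_cols grid by its location.

    A tile touching any edge of the grid is "edge", a tile one position
    further in is "near_edge", and every other tile is "central".
    """
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("tile index outside the flow cell grid")
    distance = min(row, col, n_rows - 1 - row, n_cols - 1 - col)
    if distance == 0:
        return "edge"
    if distance == 1:
        return "near_edge"
    return "central"
```

The two-class variant mentioned above would simply return the same label for distances 0 and 1.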
As previously discussed, FIGS. 7 and 10 are example multi-cycle neural network models usable for base calling, and FIG. 9 is a simplified representation of a stack of a neural network usable in a system like that of FIGS. 7 and 9. Various functions within a neural network model for base calling use biases and weights. For example, during a convolution operation, a filter comprising one or more kernels (e.g., as illustrated in FIG. 12) has a corresponding plurality of weights, which are trained during the training phase of the neural network model. For example, the weights are tuned using training data generated from one or more tiles, and the weights are then used for base calling, for example in the flow cell of FIG. 14.
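A base calling cycle is performed for clusters in individual tiles of the flow cell 1400. In an example, parameters associated with the base calling operation for a tile can depend on the relative location of the tile. For example, the excitation light 101 discussed with respect to FIG. 1 is directed toward the tiles of the flow cell, and different tiles can receive different amounts of the excitation light 101, e.g., based on the location of the individual tiles and/or the location of the one or more light sources emitting the excitation light 101. For example, if the light source emitting the excitation light 101 is vertically above the flow cell 1400, the central tiles 1412 may receive a different amount of light than the edge tiles 1408 and/or the near-edge tiles 1410.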
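In another example, peripheral or external light around the flow cell 1400 (e.g., ambient light from outside the biosensor 100) can affect the amount and/or characteristics of the excitation light 101 received by individual tiles of the flow cell 1400. Merely as an example, the edge tiles 1408 may receive the excitation light 101 along with some amount of peripheral light from outside the flow cell 1400, whereas the central tiles 1412 may receive mainly the excitation light 101.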
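In yet another example, individual sensors (or pixels or photodiodes) included in the flow cell 1400 (e.g., sensors 106, 108, 110, 112, and 114 shown in FIG. 1) may sense light based on the locations of the corresponding sensors, which in turn depend on the locations of the corresponding tiles. For example, sensing operations performed by one or more sensors associated with the edge tiles 1408 may be affected relatively more by peripheral light (in addition to the excitation light 101) than sensing operations of one or more other sensors associated with the central tiles 1412.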
In another example, the flow of reactants to the various tiles can also be affected by tile location. Here, reactants include any substance that can be used to obtain a desired reaction during base calling, such as reagents, enzymes, samples, other biomolecules, and buffer solutions. For example, tiles close to a reactant source can receive a larger amount of reactant than tiles farther from the source.
In other words, the parameters associated with base calling can differ slightly between categories of tiles. Therefore, in one embodiment, different weight sets are used for different categories of tiles, to compensate for the example tile-location dependencies of the base calling process discussed above.
For example, the implementation of FIG. 14 uses three candidate weight sets: (i) an edge weight set WeT 1418 for the edge tiles, (ii) a near-edge weight set WnT 1420 for the near-edge tiles, and (iii) a central weight set WcT 1422 for the central (or non-edge) tiles.
In an example, when training a neural network model for base calling (such as those discussed with respect to FIGS. 7, 9, and 10), the model is first trained on image data generated only from the edge tiles 1408 (that is, without using training data generated from the near-edge or central tiles). The resulting weights are included in the edge weight set WeT 1418.
Subsequently, the neural network model is trained on image data generated only from the near-edge tiles 1410 (that is, without using training data generated from the edge or central tiles), and the resulting weights are included in the near-edge weight set WnT 1420. Finally, the neural network model is trained on image data generated only from the central tiles 1412 (that is, without using training data generated from the edge or near-edge tiles), and the resulting weights are included in the central weight set WcT 1422.
Thus, each weight set includes a corresponding plurality of weights used to configure the neural network model, and the configured neural network processes sensor data from tiles of the corresponding category. For example, as discussed with respect to FIGS. 7, 9, 10, and 11, the topology of the neural network model includes (i) one or more spatial layers that do not combine sensor data and resulting feature maps between successive sensing cycles, and (ii) temporal layers that combine the resulting feature maps between successive sensing cycles. Accordingly, each weight set includes corresponding spatial weights for the spatial layers and corresponding temporal weights for the temporal layers. For example, the edge weight set WeT 1418 for the edge tiles includes a first set of one or more spatial weights for the spatial layers and a first set of one or more temporal weights for the temporal layers. Similarly, the central weight set WcT 1422 for the central tiles includes a second set of one or more spatial weights for the spatial layers and a second set of one or more temporal weights for the temporal layers.
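During the inference phase, when base calling cycles are performed and bases within clusters of an edge tile are to be called, the neural network model is configured with the edge weight set WeT 1418, and sensor data from the edge tile are used for the base calling operation. Similarly, when bases within clusters of a near-edge tile are to be called, the neural network model is configured with the near-edge weight set WnT 1420, and sensor data from the near-edge tile are used for the base calling operation. Finally, when bases within clusters of a central tile are to be called, the neural network model is configured with the central weight set WcT 1422, and sensor data from the central tile are used for the base calling operation.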
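The selection step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the tiles are taken to lie on a rectangular grid, and a tile is classified by its distance from the nearest grid border. The function names and the distance thresholds are hypothetical and do not come from the specification; only the labels WeT, WnT, and WcT do.

```python
from typing import Dict

# Labels follow the three candidate weight sets named for FIG. 14.
WEIGHT_SET_BY_CLASS: Dict[str, str] = {
    "edge": "WeT",
    "near_edge": "WnT",
    "center": "WcT",
}


def classify_tile(row: int, col: int, n_rows: int, n_cols: int) -> str:
    """Classify a tile by its distance, in tiles, from the nearest grid border.

    Assumption: distance 0 is "edge", distance 1 is "near_edge", and anything
    farther in is "center". The specification does not fix these thresholds.
    """
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("tile index lies outside the flow cell grid")
    border_distance = min(row, col, n_rows - 1 - row, n_cols - 1 - col)
    if border_distance == 0:
        return "edge"
    if border_distance == 1:
        return "near_edge"
    return "center"


def select_weight_set(row: int, col: int, n_rows: int, n_cols: int) -> str:
    """Return the label of the weight set to load before base calling a tile."""
    return WEIGHT_SET_BY_CLASS[classify_tile(row, col, n_rows, n_cols)]
```

Merging the edge and near-edge categories, as in the two-category variant, would only require mapping both keys to the same weight set label.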
FIG. 15 illustrates another example tile-location-based weight selection scheme for base calling. FIG. 15 shows flow cell 1400, which includes a plurality of lanes 1450a, 1450b, 1450c, ..., 1450(P-2), 1450(P-1), and 1450P, each of which includes a corresponding plurality of tiles.
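In the example of FIG. 15, each tile of flow cell 1400 is classified based on the location of the lane to which the tile belongs. For example, the top one or more lanes of flow cell 1400 (such as lanes 1450P and 1450(P-1)) are classified as top peripheral lanes, the bottom one or more lanes (such as lanes 1450a and 1450b) are classified as bottom peripheral lanes, and the central one or more lanes (such as lanes 1450c and 1450(P-2)) are classified as central lanes. Note that the number of lanes in each category is merely an example, and variations are possible. For example, instead of two lanes, each peripheral lane category can include one lane, three lanes, and so on.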
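Tiles within the top peripheral lanes are classified as top peripheral lane tiles 1508a, tiles within the bottom peripheral lanes are classified as bottom peripheral lane tiles 1508b, and tiles within the central lanes are classified as central lane tiles 1510.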
For the reasons discussed with respect to FIG. 14, in one embodiment different weight sets can be assigned to the tiles within the various categories of lanes in the flow cell of FIG. 15. For example, the implementation of FIG. 15 uses two candidate weight sets: (i) a peripheral weight set WpL 1504 for the peripheral lane tiles 1508a, 1508b (that is, tiles belonging to the top and bottom peripheral lanes), and (ii) a central weight set WcL 1506 for the central lane tiles 1510.
For example, when training a neural network model for base calling (such as those discussed with respect to FIGS. 7, 9, and 10), the model is first trained on image data generated only from the peripheral lane tiles 1508a, 1508b (that is, without using training data generated from the central lane tiles 1510). The resulting weights are included in the peripheral weight set WpL 1504.
Subsequently, the neural network model is trained on image data generated only from the central lane tiles 1510 (that is, without using training data generated from the peripheral lane tiles 1508a, 1508b), and the resulting weights are included in the central weight set WcL 1506.
During the inference phase, when base calling cycles are performed and bases within clusters of the peripheral lane tiles 1508 are to be called, the neural network model is configured with the weights from the peripheral weight set WpL 1504, and sensor data from the peripheral lane tiles 1508 are used for the base calling operation. Similarly, when bases within clusters of the central lane tiles 1510 are to be called, the neural network model is configured with the weights from the central weight set WcL 1506, and sensor data from the central lane tiles 1510 are used for the base calling operation.
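A minimal sketch of the lane-based scheme follows. It assumes lanes are indexed from 0 at the bottom of the flow cell and that the same number of lanes is treated as peripheral at the top and at the bottom. The function name, the indexing convention, and the default of two peripheral lanes per side are illustrative assumptions; only the labels WpL and WcL come from the text.

```python
from typing import Dict

# Labels of the two candidate weight sets described for FIG. 15.
LANE_WEIGHT_SETS: Dict[str, str] = {"peripheral": "WpL", "central": "WcL"}


def classify_lane(lane_index: int, n_lanes: int, n_peripheral: int = 2) -> str:
    """Classify a lane as "peripheral" or "central".

    lane_index is 0-based and counted from the bottom lane (an assumption).
    The n_peripheral lanes at each end of the flow cell are peripheral.
    """
    if not 0 <= lane_index < n_lanes:
        raise ValueError("lane index lies outside the flow cell")
    if n_peripheral < 0 or 2 * n_peripheral >= n_lanes:
        raise ValueError("n_peripheral leaves no central lanes")
    if lane_index < n_peripheral or lane_index >= n_lanes - n_peripheral:
        return "peripheral"
    return "central"


def select_lane_weight_set(lane_index: int, n_lanes: int, n_peripheral: int = 2) -> str:
    """Return the weight set label for every tile in the given lane."""
    return LANE_WEIGHT_SETS[classify_lane(lane_index, n_lanes, n_peripheral)]
```

Because top and bottom peripheral lanes share one weight set here, the classifier does not need to distinguish them; a variant with separate top and bottom weight sets would return three categories instead.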
FIG. 16 illustrates yet another example tile-location-based weight selection scheme for base calling. FIG. 16 shows flow cell 1400, which includes the plurality of lanes 1450a, 1450b, 1450c, ..., 1450(P-2), 1450(P-1), and 1450P, each of which includes a corresponding plurality of tiles.
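In the example of FIG. 16, flow cell 1400 is divided into multiple segments or sections based on dashed lines 1603 (the dashed lines 1603 serve classification only and are not physically present in the flow cell). For example, flow cell 1400 is divided into a top-left section 1610TL (weight set WTL), a top-central section 1610TC (weight set WTC), a top-right section 1610TR (weight set WTR), a mid-left section 1610ML (weight set WML), a central section 1610C (weight set WC), a mid-right section 1610MR (weight set WMR), a bottom-left section 1610BL (weight set WBL), a bottom-central section 1610BC (weight set WBC), and a bottom-right section 1610BR (weight set WBR). Each tile of flow cell 1400 is classified based on the section to which the tile belongs.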
For reasons similar to those discussed with respect to FIG. 14, in one embodiment the tiles within the various sections of FIG. 16 are assigned corresponding weight sets. For example, in the implementation of FIG. 16, tiles in the top-left section 1610TL are assigned the top-left weight set WTL, tiles in the top-central section 1610TC are assigned the top-central weight set WTC, tiles in the top-right section 1610TR are assigned the top-right weight set WTR, tiles in the mid-left section 1610ML are assigned the mid-left weight set WML, tiles in the central section 1610C are assigned the central weight set WC, tiles in the mid-right section 1610MR are assigned the mid-right weight set WMR, tiles in the bottom-left section 1610BL are assigned the bottom-left weight set WBL, tiles in the bottom-central section 1610BC are assigned the bottom-central weight set WBC, and tiles in the bottom-right section 1610BR are assigned the bottom-right weight set WBR.
For example, when training a neural network model for base calling (such as those discussed with respect to FIGS. 7, 9, and 10), the model is first trained on sensor data generated only from tiles in the top-left section 1610TL (that is, without using sensor data from tiles of other categories), and the resulting weights are included in the top-left weight set WTL. The process is repeated for the tiles of the other sections to generate the remaining candidate weight sets: the top-central weight set WTC, top-right weight set WTR, mid-left weight set WML, central weight set WC, mid-right weight set WMR, bottom-left weight set WBL, bottom-central weight set WBC, and bottom-right weight set WBR.
During the inference phase, when base calling cycles are performed and bases within clusters of tiles in the top-left section 1610TL are to be called, the neural network model is configured with the weights in the corresponding top-left weight set WTL, and sensor data from the tiles of the top-left section 1610TL are used for the base calling operation. The process is repeated similarly for the tiles of the other sections.
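In FIG. 16, flow cell 1400 is partitioned into nine different sections. However, flow cell 1400 can be partitioned into a different number of sections, such as four sections comprising a top-left quadrant, a top-right quadrant, a bottom-left quadrant, and a bottom-right quadrant.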
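The nine-section partition can be illustrated with a small lookup. The sketch below assumes the dividing lines split the flow cell into three equal bands along each axis, which the text does not state, and it assumes tile positions are given as coordinates measured from the top-left corner. The function name is hypothetical, and the two-letter section labels simply follow the figure's naming pattern and are used here as dictionary keys.

```python
ROW_LABELS = ("T", "M", "B")  # top, middle, bottom
COL_LABELS = ("L", "C", "R")  # left, central, right


def classify_section(x: float, y: float, width: float, height: float) -> str:
    """Map a tile position to one of nine sections of a 3x3 partition.

    x is measured from the left edge and y from the top edge of the flow cell.
    Assumption: the three bands along each axis have equal size.
    """
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("position lies outside the flow cell")
    col = min(int(3 * x / width), 2)
    row = min(int(3 * y / height), 2)
    if row == 1 and col == 1:
        return "C"  # the central section carries a single-letter label
    return ROW_LABELS[row] + COL_LABELS[col]


def section_weight_set(x: float, y: float, width: float, height: float) -> str:
    """Return the weight set label for the section containing the position."""
    return "W" + classify_section(x, y, width, height)
```

The four-quadrant variant would use two bands per axis instead of three; nothing else in the lookup would change.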
FIG. 17A illustrates an example of fading, in which signal intensity decreases with the cycle number of a sequencing run performed as a base calling operation. Fading is an exponential decay of fluorescent signal intensity as a function of cycle number. As the sequencing run progresses, the analyte strands are washed excessively, exposed to laser radiation that creates reactive species, and subjected to harsh environmental conditions. All of these lead to a gradual loss of fragments in each analyte, reducing its fluorescent signal intensity. Fading is also called dimming or signal decay. FIG. 17A shows one example of fading 1700, in which the intensity values of analyte fragments with AC microsatellites exhibit exponential decay.
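To make the exponential decay concrete, the following sketch models intensity as a function of cycle number. The single-rate model and the decay rate used in any call are illustrative assumptions, not values taken from FIG. 17A, and both function names are hypothetical.

```python
import math


def faded_intensity(initial: float, decay_rate: float, cycle: int) -> float:
    """Intensity at a 1-based cycle number under pure exponential fading.

    Model (an assumption): I(c) = I(1) * exp(-decay_rate * (c - 1)).
    """
    if cycle < 1:
        raise ValueError("cycle numbers start at 1")
    return initial * math.exp(-decay_rate * (cycle - 1))


def first_cycle_at_or_below(decay_rate: float, fraction: float) -> int:
    """Smallest cycle at which intensity is at or below fraction * I(1)."""
    if not (0 < fraction <= 1) or decay_rate <= 0:
        raise ValueError("need 0 < fraction <= 1 and decay_rate > 0")
    return 1 + math.ceil(-math.log(fraction) / decay_rate)
```

Under this model, a later sub-series of cycles sees systematically weaker signal than an earlier one, which is the motivation for training separate weight sets per cycle sub-series.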
FIG. 17B conceptually illustrates a signal-to-noise ratio that decreases as sequencing cycles progress. As sequencing proceeds, accurate base calling becomes increasingly difficult because signal strength decreases and noise increases, resulting in a substantially lower signal-to-noise ratio. Physically, it is observed that later synthesis steps attach tags at a different position relative to the sensor than earlier synthesis steps. When the sensor is below the sequence being synthesized, the signal decays because tags are attached to the strand farther from the sensor in later sequencing steps than in earlier steps. This causes the signal to decay as sequencing cycles progress. In some designs, where the sensor is above the substrate that holds the clusters, the signal can increase rather than decay as sequencing proceeds.
In the flow cell designs investigated, noise grows as the signal decays. Physically, phasing and prephasing add noise as sequencing proceeds. Phasing refers to sequencing steps in which a tag fails to advance along the sequence. Prephasing refers to sequencing steps in which a tag jumps forward two positions instead of one during a sequencing cycle. Both are relatively infrequent, occurring on the order of once in 500 to 1000 cycles, with phasing slightly more frequent than prephasing. Phasing and prephasing affect individual strands within a cluster that is producing intensity data, so the intensity noise distribution from the cluster accumulates in a binomial, trinomial, quadrinomial, etc. expansion as sequencing proceeds.
Further details of fading, signal decay, and decreasing signal-to-noise ratio, and of FIGS. 17A and 17B, can be found in U.S. Nonprovisional Patent Application No. 16/874,599, entitled "Systems and Devices for Characterization and Performance Analysis of Pixel-Based Sequencing," filed May 14, 2020 (Attorney Docket No. ILLM 1011-4/IP-1750-US), which is incorporated herein by reference as if fully set forth herein.
Thus, during base calling, the reliability or quality of a base call (e.g., the probability that the called base is correct) can depend on the number of the base calling cycle for which the current base is being called. Accordingly, in addition to, or instead of, depending on the location of the tile (e.g., as discussed with respect to FIGS. 14, 15, and 16), the weight set can also depend on the current cycle number for which the base calling operation is being performed. FIG. 18 illustrates an example weight selection scheme for base calling that is based on base calling cycle number.
For example, FIG. 18 is directed to a base calling run for an example tile M. Assume there are N base calling cycles, during which the strands in the various clusters of tile M are to be identified. As discussed, the signal intensity detected by the biosensor (e.g., sensors 106, 108, 110, 112, and 114 of FIG. 1) changes (e.g., decays) with base calling cycle number, due to the factors discussed with respect to FIGS. 17A and 17B and/or various other factors. Assume, for example, that the N base calling sensing cycles are divided into three cycle sub-series: (a) initial sensing cycles 1 to N1, (b) intermediate sensing cycles (N1+1) to N2, and (c) final sensing cycles (N2+1) to N, as shown in FIG. 18, where N > N2 > N1 and N, N1, and N2 are positive integers. The N sensing cycles are thus divided into three sub-series, although they can also be divided into a different number of sub-series, such as 2, 4, or more.
Note that the number of sensing cycles in each of the three sub-series may or may not be equal, and is implementation specific. Merely as an example and without limiting the scope of this disclosure, if N is 100, the 100 cycles can be divided into sub-series of 30 initial cycles, 30 intermediate cycles, and 40 final cycles. That is, in this simple example, N1 = 30 and N2 = 60.
As discussed with respect to FIGS. 17A and 17B, the average level of signal intensity that the base caller receives from the biosensor in, for example, cycle N1 can differ from the average level it receives in cycle N. Consequently, a neural network model trained for, say, cycle N1 may not give satisfactory results for cycle N.
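Therefore, neural network models for base calling (such as those discussed with respect to FIGS. 7, 9, and 10) can be trained for specific cycle sub-series. For example, the neural network model is first trained on sensor data generated only during sensing cycles 1 to N1, and the resulting weights are included in a first cycle sub-series weight set W(1-N1) 1810a. Subsequently, the model is trained on sensor data generated only during sensing cycles (N1+1) to N2, and the resulting weights are included in a second cycle sub-series weight set W(N1-N2) 1810b. Finally, the model is trained on sensor data generated only during sensing cycles (N2+1) to N, and the resulting weights are included in a third cycle sub-series weight set W(N2-N) 1810c. Note that in the first cycle sub-series weight set W(1-N1) 1810a, for example, the phrase (1-N1) is a cycle index indicating that the weight set relates to sensing cycles 1 to N1. Note also that in the example of FIG. 18, the base calling operation is performed using sensor data from one or more channels (such as one, two, three, four, or more channels), and for a given cycle the weights apply to sensor data from all such channels.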
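During the inference phase, when bases are to be called for cycles 1 to N1, the neural network model is configured with the first cycle sub-series weight set W(1-N1) 1810a. Similarly, when bases are to be called for cycles (N1+1) to N2, the neural network model is configured with the second cycle sub-series weight set W(N1-N2) 1810b. Finally, when bases are to be called for cycles (N2+1) to N, the neural network model is configured with the third cycle sub-series weight set W(N2-N) 1810c.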
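Selecting the weight set for a given cycle amounts to finding which sub-series the cycle number falls into. The sketch below does this with a binary search over the upper boundaries N1, N2, and N. The function names are hypothetical; the boundary values in any call are examples such as the N1 = 30, N2 = 60, N = 100 split mentioned earlier.

```python
from bisect import bisect_left
from typing import Sequence

# Labels of the three cycle sub-series weight sets described for FIG. 18.
CYCLE_WEIGHT_SETS = ("W(1-N1)", "W(N1-N2)", "W(N2-N)")


def subseries_index(cycle: int, upper_bounds: Sequence[int]) -> int:
    """Return the 0-based index of the sub-series containing a 1-based cycle.

    upper_bounds lists the last cycle of each sub-series in increasing order,
    for example (N1, N2, N). Cycle N1 belongs to the first sub-series and
    cycle N1 + 1 to the second.
    """
    if not 1 <= cycle <= upper_bounds[-1]:
        raise ValueError("cycle number lies outside the sequencing run")
    return bisect_left(upper_bounds, cycle)


def select_cycle_weight_set(cycle: int, upper_bounds: Sequence[int]) -> str:
    """Return the label of the weight set to load for the given cycle."""
    return CYCLE_WEIGHT_SETS[subseries_index(cycle, upper_bounds)]
```

The same lookup works for two, four, or more sub-series as long as the list of labels has one entry per boundary.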
FIGS. 14, 15, and 16 show various examples of weight set selection based on tile location. These figures thus illustrate weight set selection based on the spatial progression of the base calling operation across tile locations on the biosensor. FIG. 18, on the other hand, shows an example of weight set selection based on the temporal progression of the base calling operation through sub-series of sensing cycles within the series of sensing cycles 1 to N. FIG. 19 combines the concept of weight set selection based on spatial tile location (e.g., as discussed with respect to FIGS. 14 to 16) with the concept of weight set selection based on the temporal progression of base calling cycles (e.g., as discussed with respect to FIG. 18). FIG. 19 thus illustrates an example weight selection scheme based on (i) the temporal progression of base calling cycle number and (ii) the spatial location of the tile.
For example, FIG. 19 shows a first tile M1 and a second tile M2. Assume that tile M1 belongs to a first category and tile M2 belongs to a second category. Merely as an example, tile M1 can be an edge tile 1408 of FIG. 14, and tile M2 can be a central tile 1412 of FIG. 14. Accordingly, the weight set used to base call strands within clusters of tile M1 differs from the weight set used to base call strands within clusters of tile M2, for example as discussed with respect to FIGS. 14, 15, and 16.
As in FIG. 18, FIG. 19 assumes N base calling cycles, during which the strands in the various clusters of tiles M1 and M2 are to be identified. Also as in FIG. 18, FIG. 19 assumes that the N base calling sensing cycles are divided into three cycle sub-series: (a) initial sensing cycles 1 to N1, (b) intermediate sensing cycles (N1+1) to N2, and (c) final sensing cycles (N2+1) to N, where N > N2 > N1 and N, N1, and N2 are positive integers. In other examples, the N sensing cycles can be divided into a different number of sub-series, such as 2, 4, or more.
In an example, neural network models for base calling (such as those discussed with respect to FIGS. 7, 9, and 10) can be trained for a specific cycle sub-series and for a specific tile category. For example, the neural network model is first trained on sensor data generated only during sensing cycles 1 to N1 and only for the edge tiles 1408, and the resulting weight set is labeled "weight set (eT,(1-N1))". In this label, "eT" is a tile category or tile location index indicating that the weight set is specific to the edge tiles 1408, and "(1-N1)" is a cycle index indicating that the weight set is specific to sensing cycles 1 to N1.
Similarly, the neural network model is then trained on sensor data generated only during sensing cycles (N1+1) to N2 and only for the edge tiles 1408, and the resulting weight set is labeled "weight set (eT,(N1-N2))". Here, "eT" is the tile location or tile category index indicating that the weight set is specific to the edge tiles 1408, and "(N1-N2)" is the cycle index indicating that the weight set is specific to sensing cycles (N1+1) to N2.
Similarly, the neural network model is then trained on sensor data generated only during sensing cycles (N2+1) to N and only for the edge tiles 1408, and the resulting weight set is labeled "weight set (eT,(N2-N))". Here, "eT" is the tile location index indicating that the weight set is specific to the edge tiles 1408, and "(N2-N)" is the cycle index indicating that the weight set is specific to sensing cycles (N2+1) to N.
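In addition, the neural network model is trained on sensor data generated only during sensing cycles 1 to N1 and only for the central tiles 1412, and the resulting weight set is labeled "weight set (cT,(1-N1))". In this label, "cT" is the tile location index indicating that the weight set is specific to the central tiles 1412, and "(1-N1)" is the cycle index indicating that the weight set is specific to sensing cycles 1 to N1.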
Similarly, the neural network model is then trained on sensor data generated only during sensing cycles (N1+1) to N2 and only for the central tiles 1412, and the resulting weight set is labeled "weight set (cT,(N1-N2))". Here, "cT" is the tile location index indicating that the weight set is specific to the central tiles 1412, and "(N1-N2)" is the cycle index indicating that the weight set is specific to sensing cycles (N1+1) to N2.
Similarly, the neural network model is then trained on sensor data generated only during sensing cycles (N2+1) to N and only for the central tiles 1412, and the resulting weight set is labeled "weight set (cT,(N2-N))". Here, "cT" is the tile location index indicating that the weight set is specific to the central tiles 1412, and "(N2-N)" is the cycle index indicating that the weight set is specific to sensing cycles (N2+1) to N.
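During the inference phase, when bases are to be called for cycles 1 to N1 and for tile M1 (which is an edge tile 1408 in the example of FIG. 19), the neural network model is configured with weight set (eT,(1-N1)). Similarly, when bases are to be called for cycles (N1+1) to N2 and for tile M1, the model is configured with weight set (eT,(N1-N2)). Likewise, when bases are to be called for cycles (N2+1) to N and for tile M1, the model is configured with weight set (eT,(N2-N)).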
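Similarly, when bases are to be called for cycles 1 to N1 and for tile M2 (which is a central tile 1412 in the example of FIG. 19), the neural network model is configured with weight set (cT,(1-N1)). When bases are to be called for cycles (N1+1) to N2 and for tile M2, the model is configured with weight set (cT,(N1-N2)). When bases are to be called for cycles (N2+1) to N and for tile M2, the model is configured with weight set (cT,(N2-N)).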
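The combined scheme can be viewed as a two-key lookup: one key for the tile category and one for the cycle sub-series. The following sketch builds such a registry and resolves a key for a given tile category and cycle. The helper names are hypothetical, and the registry stores label strings in place of actual weight tensors.

```python
from bisect import bisect_left
from typing import Dict, Sequence, Tuple

CYCLE_LABELS = ("1-N1", "N1-N2", "N2-N")


def weight_set_key(tile_class: str, cycle: int,
                   upper_bounds: Sequence[int]) -> Tuple[str, str]:
    """Return the (tile index, cycle index) pair that identifies a weight set.

    upper_bounds is (N1, N2, N); cycle is 1-based.
    """
    if not 1 <= cycle <= upper_bounds[-1]:
        raise ValueError("cycle number lies outside the sequencing run")
    return tile_class, CYCLE_LABELS[bisect_left(upper_bounds, cycle)]


def build_registry(tile_classes: Sequence[str]) -> Dict[Tuple[str, str], str]:
    """Create one entry per (tile category, cycle sub-series) combination."""
    return {
        (tile, cycles): f"weight set ({tile},({cycles}))"
        for tile in tile_classes
        for cycles in CYCLE_LABELS
    }
```

With two tile categories and three sub-series the registry holds six entries; the nine-section partition with the same three sub-series would hold 27.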
FIG. 20 illustrates another example weight selection scheme based on (i) the temporal progression of base calling cycle number and (ii) the spatial location of the tile. The tile classification shown in FIG. 20 is similar to that shown in FIG. 14. For example, referring to FIGS. 14 and 20, the edge tiles 1408 are shown with diagonal lines, the near-edge tiles 1410 are shown with cross-hatching, and the central tiles 1412 are shown with dots or gray shading.
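FIG. 20 also shows three boxes 1908, 1910, and 1912. Box 1908 shows the weight sets specific to the edge tiles 1408 and the various sensing cycle sub-series. For example, weight set (eT,(1-N1)) is specific to the edge tiles 1408 and sensing cycles 1 to N1, weight set (eT,(N1-N2)) is specific to the edge tiles 1408 and sensing cycles (N1+1) to N2, and weight set (eT,(N2-N)) is specific to the edge tiles 1408 and sensing cycles (N2+1) to N.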
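Similarly, box 1910 shows the weight sets specific to the near-edge tiles 1410 and the various sensing cycle sub-series. For example, weight set (nT,(1-N1)) is specific to the near-edge tiles 1410 and sensing cycles 1 to N1, weight set (nT,(N1-N2)) is specific to the near-edge tiles 1410 and sensing cycles (N1+1) to N2, and weight set (nT,(N2-N)) is specific to the near-edge tiles 1410 and sensing cycles (N2+1) to N.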
Similarly, box 1912 shows the weight sets specific to the central tiles 1412 and the various sensing cycle sub-series. For example, weight set (cT,(1-N1)) is specific to the central tiles 1412 and sensing cycles 1 to N1, weight set (cT,(N1-N2)) is specific to the central tiles 1412 and sensing cycles (N1+1) to N2, and weight set (cT,(N2-N)) is specific to the central tiles 1412 and sensing cycles (N2+1) to N.
FIG. 21A illustrates another example weight selection scheme based on (i) the temporal progression of base calling cycle number and (ii) the spatial location of the tile. The tile classification shown in FIG. 21A is similar to that shown in FIG. 15. For example, referring to FIGS. 15 and 21A, the peripheral lane tiles 1508 (the combination of the top peripheral lane tiles 1508a and the bottom peripheral lane tiles 1508b of FIG. 15) are shown with diagonal lines, and the central lane tiles 1510 are shown with dashed lines or gray shading.
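FIG. 21A also shows two boxes 2110 and 2112. Box 2110 shows the weight sets specific to the peripheral lane tiles 1508 and the various sensing cycle sub-series. For example, weight set (pl,(1-N1)) is specific to the peripheral lane tiles 1508 and sensing cycles 1 to N1, weight set (pl,(N1-N2)) is specific to the peripheral lane tiles 1508 and sensing cycles (N1+1) to N2, and weight set (pl,(N2-N)) is specific to the peripheral lane tiles 1508 and sensing cycles (N2+1) to N.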
Similarly, box 2112 shows the weight sets specific to the central lane tiles 1510 and the various sensing cycle sub-series. For example, weight set (cl,(1-N1)) is specific to the central lane tiles 1510 and sensing cycles 1 to N1, weight set (cl,(N1-N2)) is specific to the central lane tiles 1510 and sensing cycles (N1+1) to N2, and weight set (cl,(N2-N)) is specific to the central lane tiles 1510 and sensing cycles (N2+1) to N.
In one embodiment, and as discussed above, each of weight set (pl,(1-N1)), weight set (pl,(N1-N2)), weight set (pl,(N2-N)), weight set (cl,(1-N1)), weight set (cl,(N1-N2)), and weight set (cl,(N2-N)) includes corresponding weights. For example, weight set (pl,(1-N1)) includes a first plurality of weights for configuring the corresponding plurality of spatial and temporal layers (see FIGS. 7 and 9 for examples of such layers), weight set (pl,(N1-N2)) includes a second plurality of weights for configuring the spatial and temporal layers, weight set (pl,(N2-N)) includes a third plurality, weight set (cl,(1-N1)) includes a fourth plurality, weight set (cl,(N1-N2)) includes a fifth plurality, and weight set (cl,(N2-N)) includes a sixth plurality.
At least one weight in the first plurality of weights differs from the corresponding weight in the second plurality of weights (although in some examples two weight sets can have one or more common or identical weights). At least one weight in the second plurality differs from the corresponding weight in the third plurality, at least one weight in the third plurality differs from the corresponding weight in the fourth plurality, and so on. In one embodiment, one or more weights in the various weight sets are quantized using different scaling factors.
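The text does not specify the quantization scheme, so the following is only a sketch of one common choice: symmetric integer quantization in which each weight set gets its own scaling factor, derived from its largest absolute weight. The function names, the 8-bit default, and the choice of symmetric quantization are all assumptions.

```python
from typing import List, Sequence, Tuple


def quantize(weights: Sequence[float], n_bits: int = 8) -> Tuple[List[int], float]:
    """Quantize one weight set to signed integers with its own scale factor.

    The scale maps the largest absolute weight to the largest representable
    integer, so weight sets with different ranges get different scales.
    """
    q_max = 2 ** (n_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Recover approximate floating-point weights from quantized values."""
    return [q * scale for q in quantized]
```

Two weight sets trained on different tile categories or cycle sub-series will generally span different value ranges, which is one reason a per-set scaling factor can preserve more precision than a single shared factor.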
Because the various weight sets are associated with corresponding sequencing cycles, in an example the weights in the various weight sets correspond to respective sequencing chemistries, sequencing configurations, and/or sequencing assays. For example, weight set (pl,(1-N1)), weight set (pl,(N1-N2)), and weight set (pl,(N2-N)) correspond respectively to a first sequencing chemistry, a second sequencing chemistry, and a third sequencing chemistry (e.g., chemistries used during sequencing cycles 1 to N1, (N1+1) to N2, and (N2+1) to N, respectively). Likewise, these three weight sets correspond respectively to a first, a second, and a third sequencing assay, and to a first, a second, and a third sequencing configuration.
FIG. 21B illustrates yet another example weight selection scheme based on (i) the temporal progression of base calling cycle number and (ii) the spatial location of the tile. The tile classification shown in FIG. 21B is similar to that shown in FIG. 16. For example, referring to FIGS. 16 and 21B, flow cell 1400 is divided into a top-left section 1610TL, a top-central section 1610TC, a top-right section 1610TR, a mid-left section 1610ML, a central section 1610C, a mid-right section 1610MR, a bottom-left section 1610BL, a bottom-central section 1610BC, and a bottom-right section 1610BR. Each tile of flow cell 1400 is classified based on the section to which the tile belongs.
FIG. 21B also shows a table 2150 that includes the weight sets for the tiles of the various sections and for the various sub-series of sensing cycles 1 to N. For example, referring to the first row of table 2150, weight set (TL,(1-N1)) is specific to the tiles of the top-left section 1610TL and sensing cycles 1 to N1, weight set (TL,(N1-N2)) is specific to the tiles of the top-left section 1610TL and sensing cycles (N1+1) to N2, and weight set (TL,(N2-N)) is specific to the tiles of the top-left section 1610TL and sensing cycles (N2+1) to N.
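Similarly, referring to the second row of table 2150, weight set (TC,(1-N1)) is specific to the tiles of the top-central section 1610TC and sensing cycles 1 to N1, weight set (TC,(N1-N2)) is specific to the tiles of the top-central section 1610TC and sensing cycles (N1+1) to N2, and weight set (TC,(N2-N)) is specific to the tiles of the top-central section 1610TC and sensing cycles (N2+1) to N. The remaining rows of table 2150 likewise include weight sets for the tiles of the other sections and the various sensing cycle sub-series, as will be apparent to those skilled in the art based on the above discussion.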
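FIG. 22 illustrates one implementation of a base calling operation 2200, in which the weight set for base calling is selected based on spatial tile information and temporal sensing cycle sub-series information.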
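For the base calling operation 2200 of FIG. 22, assume that the tiles of flow cell 1400 are classified according to the example of FIGS. 15 and 21A. This tile classification is not intended to limit the scope of this disclosure, and the base calling operation 2200 can also be applied to any other type of tile classification, such as any of those discussed with respect to FIGS. 14, 16, 20, and 21B and/or any other tile classification that those skilled in the art can envision based on the teachings of this disclosure.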
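In addition, for the base calling operation 2200 of FIG. 22, assume that the N sensing cycles are divided into three cycle sub-series, namely (a) cycles 1 to N1, (b) cycles (N1+1) to N2, and (c) cycles (N2+1) to N, as discussed with respect to FIGS. 18 to 21B. Again, this division of sensing cycles is not intended to limit the scope of this disclosure, and the base calling operation 2200 can also be applied to any other type of sensing cycle subdivision that those skilled in the art can envision based on the teachings of this disclosure.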
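In FIG. 22, base calling operations 1a to 6a are specific to the peripheral lane tiles and cycles 1 to N1. Similarly, base calling operations 1b to 6b are specific to the central lane tiles and cycles 1 to N1. Operations 1a to 6a and 1b to 6b can be repeated for cycles (N1+1) to N2, and again for cycles (N2+1) to N, although these repetitions are not shown in detail in FIG. 22. Based on the discussion of operations 1a to 6a and 1b to 6b for cycles 1 to N1, those skilled in the art will understand these repetitions for cycles (N1+1) to N2 and for cycles (N2+1) to N.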
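At action 1a, data flow logic 451 (see, e.g., FIG. 4) receives the cluster sensor data for the peripheral lane tiles 1508 and for cycles 1 to N1, along with weight set (pl,(1-N1)) (see FIG. 21A). The cluster data include sequencing images that depict the intensity emissions of the clusters within the peripheral lane tiles 1508 at sequencing cycles 1 to N1 of the sequencing run, as described above. At action 2a, data flow logic 451 forwards the cluster data for the peripheral lane tiles 1508 and cycles 1 to N1, together with weight set (pl,(1-N1)), to a neural network-based base caller 2308 (examples of which are shown in FIGS. 7, 9, and 10) executed by configurable processor 450 (see, e.g., FIG. 4). The cluster data and weight set (pl,(1-N1)) are loaded into the neural network-based base caller 2308. In addition, although not shown in FIG. 22, the topology of the neural network model is also loaded from memory to configurable processor 450 via data flow logic 451.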
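At action 3a, configurable processor 450 configures the topology of the neural network running on it with the loaded weight set (pl,(1-N1)). The neural network-based base caller 2308, so configured, generates representations (e.g., feature maps) from the cluster data based on the loaded weight set (e.g., by processing the cluster data through its configured spatial and temporal convolution layers), and based on these representations produces base call classification data (e.g., base call classification scores) for the plurality of clusters within the peripheral lane tiles 1508 and for sequencing cycles 1 to N1. In other words, the neural network-based base caller 2308 applies the loaded weight set (pl,(1-N1)) to the cluster data to generate the base call classification data. In one implementation, the base call classification scores are unnormalized, for example, they have not been subjected to exponential normalization by a softmax function.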
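At action 4a, configurable processor 450 sends the base call classification data for the clusters within the peripheral lane tiles 1508 and for cycles 1 to N1 to data flow logic 451. At action 5a, data flow logic 451 provides the base call classification scores for the clusters within the peripheral lane tiles 1508 and for cycles 1 to N1 to host processor 2304.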
At action 6a, host processor 2304 normalizes the unnormalized base call classification scores (e.g., by applying a softmax function, as in box 740 of FIG. 7 or box 930 of FIG. 9) and generates normalized base call classification scores, that is, base calls, for the strands within the clusters of the peripheral lane tiles 1508 and for cycles 1 to N1.
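The normalization performed on the host can be sketched as a numerically stable softmax over the four unnormalized scores for one cluster and one cycle, followed by picking the highest-probability base. The ordering of bases as A, C, G, T and the function names are assumptions made for illustration; they are not taken from the specification.

```python
import math
from typing import List, Sequence, Tuple

BASES = "ACGT"  # assumed ordering of the four classification scores


def softmax(scores: Sequence[float]) -> List[float]:
    """Exponentially normalize scores so they are positive and sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_base(unnormalized_scores: Sequence[float]) -> Tuple[str, float]:
    """Return the called base and its normalized classification score."""
    if len(unnormalized_scores) != len(BASES):
        raise ValueError("expected one score per base")
    probabilities = softmax(unnormalized_scores)
    best = max(range(len(BASES)), key=probabilities.__getitem__)
    return BASES[best], probabilities[best]
```

Because softmax is monotonic, the called base is the same whether the argmax is taken before or after normalization; normalization matters when the score is later used as a confidence value.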
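Thus, in operations 1a to 6a, the system base calls the strands within the clusters of the peripheral lane tiles 1508 for cycles 1 to N1, using weight set (pl,(1-N1)), which was trained specifically for the peripheral lane tiles 1508 and for cycles 1 to N1. Note that operations 1a to 6a depict a high-level, simplified version of the base calling operation and may not show one or more other operations that can be performed for base calling. Further details of the base calling operation can be found in U.S. Provisional Patent Application No. 63/072,032, entitled "DETECTING AND FILTERING CLUSTERS BASED ON ARTIFICIAL INTELLIGENCE-PREDICTED BASE CALLS," filed August 28, 2020 (Attorney Docket No. ILLM 1018-1/IP-1860-PRV), which is incorporated herein by reference as if fully set forth herein.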
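Operations 1a to 6a are specific to base calling the strands within the clusters of the peripheral lane tiles 1508 for cycles 1 to N1. These operations are repeated as operations 1b to 6b, but for the clusters within the central lane tiles 1510 and for cycles 1 to N1. For example, at action 1b, data flow logic 451 receives the cluster data for the central lane tiles 1510 and for cycles 1 to N1, along with weight set (cl,(1-N1)) (see FIG. 21A). The cluster data include sequencing images that depict the intensity emissions of the clusters within the central lane tiles 1510 at sequencing cycles 1 to N1 of the sequencing run, as described above. At action 2b, data flow logic 451 forwards the cluster data for the central lane tiles 1510 and cycles 1 to N1, together with weight set (cl,(1-N1)), to the neural network-based base caller 2308 executed by configurable processor 450. The cluster data and weight set (cl,(1-N1)) are used to reconfigure the neural network-based base caller 2308.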
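At action 3b, the reconfigured neural network-based base caller 2308 running on configurable processor 450 generates initial intermediate representations (e.g., feature maps) from the cluster data (e.g., by processing the cluster data through its spatial and temporal convolution layers), and based on these initial intermediate representations produces base call classification scores for the plurality of clusters within the central lane tiles 1510 and for sequencing cycles 1 to N1. In one implementation, these initial base call classification scores are unnormalized, for example, they have not been subjected to exponential normalization by a softmax function.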
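At act 4b, the configurable processor 450 sends to the data flow logic 451 the base call classification scores for the clusters within the central lane tiles 1510 and for cycles 1 to N1. At act 5b, the data flow logic 451 provides to the host processor 2304 the base call classification scores for the clusters within the central lane tiles 1510 and for cycles 1 to N1.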
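At act 6b, the host processor 2304 normalizes the unnormalized base call classification scores (e.g., by applying a softmax function) and generates normalized base call classification scores, i.e., base calls, for the strands within the clusters of the central lane tiles 1510 and for cycles 1 to N1.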
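Thus, base calling operations 1a to 6a are dedicated to the peripheral lane tiles 1508 and cycles 1 to N1. Similarly, base calling operations 1b to 6b are dedicated to the central lane tiles 1510 and cycles 1 to N1. The operations 1a to 6a and 1b to 6b are repeated for cycles (N1+1) to N2, and further repeated for cycles (N2+1) to N, as symbolically illustrated in FIG. 22.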
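The repetition over tile classes and cycle sub-series amounts to a lookup keyed by tile class and sub-series. The sketch below is one way to express that lookup; the concrete values of N1, N2, and N, the dictionary layout, and the function names are assumptions made for illustration.

```python
from bisect import bisect_left


def subseries_index(cycle, boundaries):
    """Return the index of the cycle sub-series that contains `cycle`.

    `boundaries` lists the last cycle of each sub-series, e.g. [N1, N2, N],
    so cycles 1..N1 map to 0, (N1+1)..N2 to 1, and (N2+1)..N to 2.
    """
    if not 1 <= cycle <= boundaries[-1]:
        raise ValueError(f"cycle {cycle} is outside 1..{boundaries[-1]}")
    return bisect_left(boundaries, cycle)


def select_weight_set(tile_class, cycle, boundaries, weight_sets):
    """Pick the weight set trained for this tile class and cycle sub-series."""
    return weight_sets[(tile_class, subseries_index(cycle, boundaries))]


N1, N2, N = 50, 100, 150  # hypothetical sub-series boundaries
boundaries = [N1, N2, N]
weight_sets = {
    ("pl", 0): "(pl,(1-N1))", ("pl", 1): "(pl,(N1-N2))", ("pl", 2): "(pl,(N2-N))",
    ("cl", 0): "(cl,(1-N1))", ("cl", 1): "(cl,(N1-N2))", ("cl", 2): "(cl,(N2-N))",
}

chosen = select_weight_set("cl", N1 + 1, boundaries, weight_sets)
```

In this sketch the weight sets are plain labels; in a real system each entry would refer to the trained parameters that are loaded into the configurable processor.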
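Referring back to FIG. 7, the model shown includes isolated stacks 701, 702, 703, 704, and 705. Stack 701 receives as input the tile data of the patch from cycle K+2. Stack 702 receives as input the tile data of the patch from cycle K+1. Stack 703 receives as input the tile data of the patch from cycle K. Stack 704 receives as input the tile data of the patch from cycle K-1. Stack 705 receives as input the tile data of the patch from cycle K-2. The layers of the isolated stacks each execute a convolution operation of a kernel that includes a plurality of filters over the input data for the layer. The output feature set (intermediate data) from each of the stacks 701 to 705 is provided as input to an inverse hierarchy 720 of temporal combinatorial layers, in which the intermediate data from the multiple cycles is combined.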
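Thus, as discussed with respect to FIGS. 7, 9, and 11, the stacks 701, ..., 705 perform isolated spatial convolutions. There is no temporal mixing or interaction between the inputs from the various cycles within the various stacks 701, ..., 705. Only after the data processing in the stacks 701, ..., 705 is data from the various sequencing cycles processed together, in section 720. The various layers within the stacks 701, ..., 705 are also referred to herein as spatial layers, and the weights of the kernels of the various filters within the stacks 701, ..., 705 are referred to herein as spatial weights. Similarly, the various layers within section 720 are also referred to herein as temporal layers, and the weights of the kernels of the various filters within section 720 are also referred to herein as temporal weights. For example, the weights applied during the spatial convolutions 921, 922, 923 in FIG. 9 are spatial weights, whereas the weights applied during the temporal convolutions 924, 925 in FIG. 9 are temporal weights.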
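The separation between spatial and temporal weights can be shown with a deliberately small toy model. It is not the network of FIG. 7: it assumes one-dimensional patches, a single spatial kernel shared by all cycles, and a single temporal combination step, purely to show where cycle mixing does and does not occur.

```python
def spatial_conv(patch, kernel):
    """'Valid' 1-D cross-correlation applied within a single cycle's patch."""
    k = len(kernel)
    return [
        sum(patch[i + j] * kernel[j] for j in range(k))
        for i in range(len(patch) - k + 1)
    ]


def temporal_combine(features_per_cycle, temporal_weights):
    """Mix features across cycles, position by position."""
    width = len(features_per_cycle[0])
    return [
        sum(w * feats[i] for w, feats in zip(temporal_weights, features_per_cycle))
        for i in range(width)
    ]


def forward(patches, spatial_kernel, temporal_weights):
    # Spatial stage: every cycle is processed in isolation, so no information
    # flows between cycles here (the role of the spatial weights).
    features = [spatial_conv(patch, spatial_kernel) for patch in patches]
    # Temporal stage: the only place where data from different cycles is
    # combined (the role of the temporal weights).
    return temporal_combine(features, temporal_weights)


# Three hypothetical cycles (K-1, K, K+1), each with a 3-pixel patch.
patches = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [2.0, 2.0, 2.0]]
out = forward(patches, spatial_kernel=[1.0, -1.0], temporal_weights=[0.25, 0.5, 0.25])
```

Swapping `temporal_weights` while leaving `spatial_kernel` untouched changes only the cross-cycle mixing, which is the property the later weight-sharing scheme relies on.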
FIG. 23A illustrates various weight sets for various classes of tiles and for various sensing cycles, each weight set including corresponding spatial weights and corresponding temporal weights. The tile classification shown in FIG. 23A is similar to the tile classification discussed with respect to FIGS. 15 and 21A. As discussed with respect to FIG. 21A, the peripheral lane tiles 1508 for cycles 1 to N1 are associated with a corresponding weight set (pl,(1-N1)). As shown in FIG. 23A, the weight set (pl,(1-N1)) includes corresponding spatial weights (s-pl,(1-N1)) and corresponding temporal weights (t-pl,(1-N1)). When the neural network model is used to process the cluster sensor data of the peripheral lane tiles 1508 for cycles 1 to N1, the spatial weights (s-pl,(1-N1)) are used to configure the spatial layers of the neural network model, and the temporal weights (t-pl,(1-N1)) are used to configure the temporal layers of the neural network model.
Similarly, as also discussed with respect to FIG. 21A, the peripheral lane tiles 1508 for cycles N1 to N2 are associated with a corresponding weight set (pl,(N1-N2)). As shown in FIG. 23A, the weight set (pl,(N1-N2)) includes corresponding spatial weights (s-pl,(N1-N2)) and corresponding temporal weights (t-pl,(N1-N2)). The various other weight sets of FIG. 23A similarly have corresponding spatial weights and temporal weights.
FIG. 23B illustrates various weight sets for various classes of tiles and for various cycles, where the different weight sets for a particular class of tiles include common spatial weights and different temporal weights. The tile classification shown in FIG. 23B is similar to the tile classification discussed with respect to FIGS. 15, 21A, and 23A. Unlike FIG. 23A, however, in FIG. 23B the weight sets (pl,(1-N1)), (pl,(N1-N2)), and (pl,(N2-N)) for the peripheral lane tiles 1508 have common spatial weights (s-pl). Thus, the same or common spatial weights (s-pl) are used for the peripheral lane tiles 1508 in each of the cycle sub-series 1 to N1, (N1+1) to N2, and (N2+1) to N.
The weight sets (pl,(1-N1)), (pl,(N1-N2)), and (pl,(N2-N)) have different temporal weights, namely the temporal weights (t-pl,(1-N1)), (t-pl,(N1-N2)), and (t-pl,(N2-N)), respectively.
Similarly, the weight sets (cl,(1-N1)), (cl,(N1-N2)), and (cl,(N2-N)) for the central lane tiles 1510 have common spatial weights (s-cl). Thus, the same or common spatial weights (s-cl) are used for the central lane tiles 1510 in each of the cycle sub-series 1 to N1, (N1+1) to N2, and (N2+1) to N.
The weight sets (cl,(1-N1)), (cl,(N1-N2)), and (cl,(N2-N)) have different temporal weights, namely the temporal weights (t-cl,(1-N1)), (t-cl,(N1-N2)), and (t-cl,(N2-N)), respectively.
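In one embodiment, and as discussed with respect to FIGS. 17A and 17B, fading, phasing, and/or prephasing cause degradation of the sensor data as the sequencing cycles progress. Such degradation is addressed by the temporal layers of the neural network model (such as the layers within block 720 of FIG. 7 or the layers 924, 925 of FIG. 9). Therefore, in FIG. 23B, the temporal weights for the various sequencing cycle sub-series are trained differently. For example, the temporal weights for cycles 1 to N1 and for a given tile class differ from the temporal weights for cycles N1 to N2 and for the same tile class. In contrast, because the spatial layers (such as the layers within blocks 701, ..., 705 of FIG. 7 or the layers 921, 922, 923 of FIG. 9) may not significantly address the degradation of signal quality, all cycles share the common spatial weights for a given tile class, as shown in FIG. 23B.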
Thus, when sensor data for a particular tile class is processed (say, for the peripheral lane tiles 1508), the common spatial weights (s-pl) and the temporal weights (t-pl,(1-N1)) of the weight set (pl,(1-N1)) for cycles 1 to N1 are initially loaded in the configurable processor, and the neural network-based base caller 2308 is configured with these spatial and temporal weights. For example, the spatial layers of the neural network-based base caller 2308 are configured with the common spatial weights (s-pl), and the temporal layers of the neural network-based base caller 2308 are configured with the temporal weights (t-pl,(1-N1)). The configured neural network-based base caller 2308 applies the configured spatial and temporal layers to the sensor data of the peripheral lane tiles 1508 for cycles 1 to N1 to produce base call classification data of the peripheral lane tiles 1508 for cycles 1 to N1.
Subsequently, before the sensor data for cycle (N1+1) is processed, the temporal weights (t-pl,(N1-N2)) of the weight set (pl,(N1-N2)) are loaded, without loading any corresponding spatial weights of that weight set. The temporal layers of the neural network-based base caller 2308 are reconfigured with the temporal weights (t-pl,(N1-N2)). The neural network-based base caller 2308 then applies the previously configured spatial layers (e.g., previously configured with the common spatial weights (s-pl)) and the reconfigured temporal layers (e.g., reconfigured with the temporal weights (t-pl,(N1-N2))) to the sensor data of the peripheral lane tiles 1508 for cycles (N1+1) to N2, to produce base call classification data of the peripheral lane tiles 1508 for cycles (N1+1) to N2.
Subsequently, before the sensor data for cycle (N2+1) is processed, the temporal weights (t-pl,(N2-N)) of the weight set (pl,(N2-N)) are loaded, without loading any corresponding spatial weights of that weight set. The temporal layers of the neural network-based base caller 2308 are reconfigured with the temporal weights (t-pl,(N2-N)). The neural network-based base caller 2308 then applies the previously configured spatial layers (e.g., previously configured with the common spatial weights (s-pl)) and the reconfigured temporal layers (e.g., reconfigured with the temporal weights (t-pl,(N2-N))) to the sensor data of the peripheral lane tiles for cycles (N2+1) to N, to produce base call classification data of the peripheral lane tiles for cycles (N2+1) to N.
Base call classification data for the other tile classes (such as the central lane tiles 1510) is produced in a correspondingly similar manner, as those skilled in the art will appreciate based on the above discussion and the illustration of FIG. 23B.
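The loading pattern of FIG. 23B, in which spatial weights are loaded once per tile class and only temporal weights are swapped at sub-series boundaries, can be sketched as follows. The class `LoadedWeights`, the function `base_call_tile_class`, the label format (which spells out first and last cycles rather than using the figure's labels), and the small boundary values are all hypothetical stand-ins for the configurable processor and its weight sets.

```python
class LoadedWeights:
    """Tracks which weights are currently loaded and logs every (re)load."""

    def __init__(self):
        self.spatial = None
        self.temporal = None
        self.load_log = []

    def load(self, kind, name):
        # Skip the transfer when the requested weights are already in place.
        if getattr(self, kind) != name:
            setattr(self, kind, name)
            self.load_log.append((kind, name))


def base_call_tile_class(state, tile_class, subseries):
    """Walk the cycle sub-series of one tile class, reloading temporal weights only."""
    called = []
    state.load("spatial", f"s-{tile_class}")  # common to all sub-series
    for first, last in subseries:
        state.load("temporal", f"t-{tile_class},({first}-{last})")
        for cycle in range(first, last + 1):
            # Base calling for this cycle would run here with the loaded weights.
            called.append((cycle, state.spatial, state.temporal))
    return called


N1, N2, N = 2, 4, 5  # hypothetical, kept tiny for readability
state = LoadedWeights()
calls = base_call_tile_class(state, "pl", [(1, N1), (N1 + 1, N2), (N2 + 1, N)])
```

With three sub-series, the log holds one spatial load and three temporal loads, instead of the three spatial and three temporal loads that the scheme of FIG. 23A would require for the same tile class.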
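FIG. 23C illustrates a system 2300 that selects a weight set based on one or more sequencing run parameters 2382. For example, weight set selection logic 2386 is shown, which can execute on the configurable processor 450 and/or the host processor 2304. The weight set selection logic 2386 receives the one or more sequencing run parameters 2382 and one or more other weight set selection criteria discussed with respect to FIGS. 14 to 23B. Based on the one or more sequencing run parameters 2382 and/or the one or more other weight set selection criteria discussed with respect to FIGS. 14 to 23B, the weight set selection logic 2386 selects a weight set from a plurality of candidate weight sets 2384a, ..., 2384N. In the example of FIG. 23C, the weight set selection logic 2386 selects the weight set 2384b. The selected weight set is then loaded in the configurable processor 450 and used to configure the neural network topology for base calling, as discussed herein.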
The one or more sequencing run parameters 2382 can include one or more appropriate parameters associated with the current sequencing run. For example, the reaction components used in the sequencing run (such as reagents, enzymes, samples, other biomolecules, and buffer solutions) can affect the sensor data, and the weight set can be selected based on the type, parameters, or lot of the reaction components used. For example, the phasing characteristics (see FIG. 17B) can depend on the reagent kit used for the sequencing run and can vary with the type, age, and/or lot of the reagent kit. Accordingly, various candidate weight sets can be generated for various types and lots of reaction components, and the weight set selection logic 2386 can select a weight set based on the reaction components used for the current sequencing cycle.
In another example, the weight set selection logic 2386 can estimate the phasing characteristics and select a weight set based on them. For example, different weight sets can be generated for different phasing characteristics. Then, early in the sequencing run, the phasing parameters can be estimated and used to select the weight set. In yet another example, multiple candidate weight sets can be tried, and the weight set with the lowest error rate (or highest signal-to-noise ratio) can be selected for the entire sequencing run.
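The three selection strategies just described (by reaction components, by estimated phasing, and by trial error rate) can each be sketched in a few lines. All function names, the dictionary and tuple layouts, and the example numbers below are assumptions for illustration; the description does not prescribe how the selection logic 2386 is implemented.

```python
def select_by_reagent(candidates, reagent_type, lot):
    """Prefer an exact (type, lot) match, then fall back to a per-type default.

    `candidates` maps (reagent_type, lot) to a weight set; a lot of None marks
    the default for that reagent type.
    """
    exact = candidates.get((reagent_type, lot))
    return exact if exact is not None else candidates.get((reagent_type, None))


def select_by_phasing(candidates, estimated_phasing):
    """Pick the weight set trained at the phasing rate closest to the estimate.

    `candidates` is a list of (trained_phasing_rate, weight_set) pairs.
    """
    return min(candidates, key=lambda c: abs(c[0] - estimated_phasing))[1]


def select_by_error_rate(weight_sets, error_rate_of):
    """Try every candidate and keep the one with the lowest measured error rate."""
    return min(weight_sets, key=error_rate_of)


# Hypothetical candidates and measurements.
by_reagent = {("kitA", "lot7"): "2384a", ("kitA", None): "2384b"}
by_phasing = [(0.001, "2384a"), (0.003, "2384b"), (0.006, "2384c")]
trial_errors = {"2384a": 0.012, "2384b": 0.008, "2384c": 0.015}

picked_reagent = select_by_reagent(by_reagent, "kitA", "lot9")
picked_phasing = select_by_phasing(by_phasing, 0.0035)
picked_error = select_by_error_rate(trial_errors, trial_errors.get)
```

The error-rate strategy needs a way to measure errors early in the run, for example against reads of known sequence; that measurement is outside this sketch.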
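FIG. 24 is a block diagram of a base calling system 2400 in accordance with one implementation. The base calling system 2400 can operate to obtain any information or data that relates to at least one of a biological or chemical substance. In some implementations, the base calling system 2400 is a workstation that can be similar to a bench-top device or desktop computer. For example, a majority (or all) of the systems and components for conducting the desired reactions can be within a common housing 2416.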
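In particular implementations, the base calling system 2400 is a nucleic acid sequencing system (or sequencer) configured for various applications, including but not limited to de novo sequencing, resequencing of whole genomes or target genomic regions, and metagenomics. The sequencer can also be used for DNA or RNA analysis. In some implementations, the base calling system 2400 can also be configured to generate reaction sites in a biosensor. For example, the base calling system 2400 can be configured to receive a sample and generate surface-attached clusters of clonally amplified nucleic acids derived from the sample. Each cluster can constitute or be part of a reaction site in the biosensor.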
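The exemplary base calling system 2400 can include a system receptacle or interface 2412 that is configured to interact with a biosensor 2402 to perform desired reactions within the biosensor 2402. In the following description with respect to FIG. 24, the biosensor 2402 is loaded into the system receptacle 2412. However, it is understood that a cartridge that includes the biosensor 2402 can be inserted into the system receptacle 2412, and in some states the cartridge can be removed temporarily or permanently. As described above, the cartridge can include, among other things, fluidic control and fluidic storage components.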
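In particular implementations, the base calling system 2400 is configured to perform a large number of parallel reactions within the biosensor 2402. The biosensor 2402 includes one or more reaction sites where desired reactions can occur. The reaction sites can be, for example, immobilized to a solid surface of the biosensor or immobilized to beads (or other movable substrates) that are located within corresponding reaction chambers of the biosensor. The reaction sites can include, for example, clusters of clonally amplified nucleic acids. The biosensor 2402 can include a solid-state imaging device (e.g., a CCD or CMOS imager) and a flow cell mounted thereto. The flow cell can include one or more flow channels that receive a solution from the base calling system 2400 and direct the solution toward the reaction sites. Optionally, the biosensor 2402 can be configured to engage a thermal element for transferring thermal energy into or out of the flow channel.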
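The base calling system 2400 can include various components, assemblies, and systems (or sub-systems) that interact with each other to perform a predetermined method or assay protocol for biological or chemical analysis. For example, the base calling system 2400 includes a system controller 2404 that can communicate with the various components, assemblies, and sub-systems of the base calling system 2400 and also with the biosensor 2402. For example, in addition to the system receptacle 2412, the base calling system 2400 can also include: a fluidic control system 2406 to control the flow of fluid throughout a fluid network of the base calling system 2400 and the biosensor 2402; a fluid storage system 2408 that is configured to hold all fluids (e.g., gas or liquids) that can be used by the bioassay system; a temperature control system 2410 that can regulate the temperature of the fluid in the fluid network, the fluid storage system 2408, and/or the biosensor 2402; and an illumination system 2409 that is configured to illuminate the biosensor 2402. As described above, if a cartridge having the biosensor 2402 is loaded into the system receptacle 2412, the cartridge can also include fluidic control and fluidic storage components.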
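Also shown, the base calling system 2400 can include a user interface 2414 that interacts with the user. For example, the user interface 2414 can include a display 2413 to display or request information from a user and a user input device 2415 to receive user inputs. In some implementations, the display 2413 and the user input device 2415 are the same device. For example, the user interface 2414 can include a touch-sensitive display configured to detect the presence of an individual's touch and also identify the location of the touch on the display. However, other user input devices 2415 can be used, such as a mouse, touchpad, keyboard, keypad, handheld scanner, voice-recognition system, motion-recognition system, and the like. As will be discussed in greater detail below, the base calling system 2400 can communicate with various components, including the biosensor 2402 (e.g., in the form of a cartridge), to perform the desired reactions. The base calling system 2400 can also be configured to analyze data obtained from the biosensor to provide a user with desired information.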
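The system controller 2404 can include any processor-based or microprocessor-based system, including systems using microcontrollers, reduced instruction set computers (RISC), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), logic circuits, and any other circuit or processor capable of executing the functions described herein. The above examples are exemplary only and are thus not intended to limit in any way the definition and/or meaning of the term system controller. In the exemplary implementation, the system controller 2404 executes a set of instructions that are stored in one or more storage elements, memories, or modules in order to at least one of obtain and analyze detection data. Detection data can include a plurality of sequences of pixel signals, such that a sequence of pixel signals from each of the millions of sensors (or pixels) can be detected over many base calling cycles. Storage elements can be in the form of information sources or physical memory elements within the base calling system 2400.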
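The set of instructions can include various commands that instruct the base calling system 2400 or the biosensor 2402 to perform specific operations, such as the methods and processes of the various implementations described herein. The set of instructions can be in the form of a software program, which can form part of a tangible, non-transitory computer readable medium or media. As used herein, the terms "software" and "firmware" are interchangeable and include any computer program stored in memory for execution by a computer, including RAM memory, ROM memory, EPROM memory, EEPROM memory, and non-volatile RAM (NVRAM) memory. The above memory types are exemplary only and thus do not limit the types of memory usable for storage of a computer program.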
The software can be in various forms, such as system software or application software. Further, the software can be in the form of a collection of separate programs, or a program module within a larger program, or a portion of a program module. The software can also include modular programming in the form of object-oriented programming. After the detection data is obtained, it can be processed automatically by the base calling system 2400, processed in response to user inputs, or processed in response to a request made by another processing machine (e.g., a remote request through a communication link). In the illustrated implementation, the system controller 2404 includes an analysis module 2538 (shown in FIG. 25). In other implementations, the system controller 2404 does not include the analysis module 2538 and instead has access to the analysis module 2538 (e.g., the analysis module 2538 can be separately hosted on the cloud).
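The system controller 2404 can be connected to the biosensor 2402 and the other components of the base calling system 2400 via communication links. The system controller 2404 can also be communicatively connected to off-site systems or servers. The communication links can be hardwired, corded, or wireless. The system controller 2404 can receive user inputs or commands from the user interface 2414 and the user input device 2415.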
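The fluidic control system 2406 includes a fluid network and is configured to direct and regulate the flow of one or more fluids through the fluid network. The fluid network can be in fluid communication with the biosensor 2402 and the fluid storage system 2408. For example, select fluids can be drawn from the fluid storage system 2408 and directed to the biosensor 2402 in a controlled manner, or the fluids can be drawn from the biosensor 2402 and directed toward, for example, a waste reservoir in the fluid storage system 2408. Although not shown, the fluidic control system 2406 can include flow sensors that detect a flow rate or pressure of the fluids within the fluid network. The sensors can communicate with the system controller 2404.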
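The temperature control system 2410 is configured to regulate the temperature of fluids at different regions of the fluid network, the fluid storage system 2408, and/or the biosensor 2402. For example, the temperature control system 2410 can include a thermocycler that interfaces with the biosensor 2402 and controls the temperature of the fluid that flows along the reaction sites in the biosensor 2402. The temperature control system 2410 can also regulate the temperature of solid elements or components of the base calling system 2400 or the biosensor 2402. Although not shown, the temperature control system 2410 can include sensors to detect the temperature of the fluid or other components. The sensors can communicate with the system controller 2404.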
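The fluid storage system 2408 is in fluid communication with the biosensor 2402 and can store various reaction components or reactants that are used to conduct the desired reactions therein. The fluid storage system 2408 can also store fluids for washing or cleaning the fluid network and the biosensor 2402 and for diluting the reactants. For example, the fluid storage system 2408 can include various reservoirs to store samples, reagents, enzymes, other biomolecules, buffer solutions, aqueous and non-polar solutions, and the like. Furthermore, the fluid storage system 2408 can also include waste reservoirs for receiving waste products from the biosensor 2402. In implementations that include a cartridge, the cartridge can include one or more of a fluid storage system, a fluidic control system, or a temperature control system. Accordingly, one or more of the components set forth herein as relating to those systems can be contained within a cartridge housing. For example, a cartridge can have various reservoirs to store samples, reagents, enzymes, other biomolecules, buffer solutions, aqueous and non-polar solutions, waste, and the like. As such, one or more of a fluid storage system, a fluidic control system, or a temperature control system can be removably engaged with the bioassay system via a cartridge or other biosensor.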
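The illumination system 2409 can include a light source (e.g., one or more LEDs) and a plurality of optical components to illuminate the biosensor. Examples of light sources include lasers, arc lamps, LEDs, or laser diodes. The optical components can be, for example, reflectors, dichroics, beam splitters, collimators, lenses, filters, wedges, prisms, mirrors, detectors, and the like. In implementations that use an illumination system, the illumination system 2409 can be configured to direct an excitation light to the reaction sites. As one example, fluorophores can be excited by green wavelengths of light, in which case the wavelength of the excitation light can be approximately 532 nm. In one implementation, the illumination system 2409 is configured to produce illumination that is parallel to a surface normal of a surface of the biosensor 2402. In another implementation, the illumination system 2409 is configured to produce illumination that is off-angle relative to the surface normal of the surface of the biosensor 2402. In yet another implementation, the illumination system 2409 is configured to produce illumination that has plural angles, including some parallel illumination and some off-angle illumination.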
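The system receptacle or interface 2412 is configured to engage the biosensor 2402 in at least one of a mechanical, electrical, and fluidic manner. The system receptacle 2412 can hold the biosensor 2402 in a desired orientation to facilitate the flow of fluid through the biosensor 2402. The system receptacle 2412 can also include electrical contacts that are configured to engage the biosensor 2402 so that the base calling system 2400 can communicate with the biosensor 2402 and/or provide power to the biosensor 2402. Furthermore, the system receptacle 2412 can include fluidic ports (e.g., nozzles) that are configured to engage the biosensor 2402. In some implementations, the biosensor 2402 is removably coupled to the system receptacle 2412 in a mechanical manner, in an electrical manner, and also in a fluidic manner.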
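In addition, the base calling system 2400 can communicate remotely with other systems or networks or with other bioassay systems 2400. Detection data obtained by the bioassay system(s) 2400 can be stored in a remote database.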
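FIG. 25 is a block diagram of a system controller 2404 that can be used in the system of FIG. 24. In one implementation, the system controller 2404 includes one or more processors or modules that can communicate with one another. Each of the processors or modules can include an algorithm (e.g., instructions stored on a tangible and/or non-transitory computer readable storage medium) or sub-algorithms to perform particular processes. The system controller 2404 is illustrated conceptually as a collection of modules, but can be implemented utilizing any combination of dedicated hardware boards, DSPs, processors, and the like. Alternatively, the system controller 2404 can be implemented utilizing an off-the-shelf PC with a single processor or multiple processors, with the functional operations distributed between the processors. As a further option, the modules described below can be implemented utilizing a hybrid configuration in which certain modular functions are performed utilizing dedicated hardware, while the remaining modular functions are performed utilizing an off-the-shelf PC and the like. The modules can also be implemented as software modules within a processing unit.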
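During operation, a communication port 2520 can transmit information (e.g., commands) to, or receive information (e.g., data) from, the biosensor 2402 (FIG. 24) and/or the sub-systems 2406, 2408, 2410 (FIG. 24). In implementations, the communication port 2520 can output a plurality of sequences of pixel signals. The communication port 2520 can receive user input from the user interface 2414 (FIG. 24) and transmit data or information to the user interface 2414. Data from the biosensor 2402 or the sub-systems 2406, 2408, 2410 can be processed by the system controller 2404 in real time during a bioassay session. Additionally or alternatively, data can be stored temporarily in a system memory during a bioassay session and processed at a rate slower than real time, or in off-line operation.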
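As shown in FIG. 25, the system controller 2404 can include a plurality of modules 2531 to 2539 that communicate with a main control module 2530. The main control module 2530 can communicate with the user interface 2414 (FIG. 24). Although the modules 2531 to 2539 are shown as communicating directly with the main control module 2530, the modules 2531 to 2539 can also communicate directly with each other, with the user interface 2414, and with the biosensor 2402. Also, the modules 2531 to 2539 can communicate with the main control module 2530 through the other modules.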
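The plurality of modules 2531 to 2539 include system modules 2531 to 2533 and 2539 that communicate with the sub-systems 2406, 2408, 2410, and 2409, respectively. The fluidic control module 2531 can communicate with the fluidic control system 2406 to control the valves and flow sensors of the fluid network for controlling the flow of one or more fluids through the fluid network. The fluid storage module 2532 can notify the user when fluids are low or when the waste reservoir is at or near capacity. The fluid storage module 2532 can also communicate with the temperature control module 2533 so that the fluids can be stored at a desired temperature. The illumination module 2539 can communicate with the illumination system 2409 to illuminate the reaction sites at designated times during a protocol, such as after the desired reactions (e.g., binding events) have occurred. In some implementations, the illumination module 2539 can communicate with the illumination system 2409 to illuminate the reaction sites at designated angles.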
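The plurality of modules 2531 to 2539 can also include a device module 2534 that communicates with the biosensor 2402 and an identification module 2535 that determines identification information relating to the biosensor 2402. The device module 2534 can, for example, communicate with the system receptacle 2412 to confirm that the biosensor has established an electrical and fluidic connection with the base calling system 2400. The identification module 2535 can receive signals that identify the biosensor 2402. The identification module 2535 can use the identity of the biosensor 2402 to provide other information to the user. For example, the identification module 2535 can determine and then display a lot number, a date of manufacture, or a protocol that is recommended to be run with the biosensor 2402.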
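The plurality of modules 2531 to 2539 also includes an analysis module 2538 (also called a signal processing module or signal processor) that receives and analyzes the signal data (e.g., image data) from the biosensor 2402. The analysis module 2538 includes memory (e.g., RAM or Flash) to store detection data. Detection data can include a plurality of sequences of pixel signals, such that a sequence of pixel signals from each of the millions of sensors (or pixels) can be detected over many base calling cycles. The signal data can be stored for subsequent analysis or can be transmitted to the user interface 2414 to display desired information to the user. In some implementations, the signal data can be processed by the solid-state imager (e.g., CMOS image sensor) before the analysis module 2538 receives the signal data.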
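The analysis module 2538 is configured to obtain image data from the light detectors at each of a plurality of sequencing cycles. The image data is derived from the emission signals detected by the light detectors. The analysis module 2538 processes the image data for each of the plurality of sequencing cycles through a neural network (e.g., a neural network-based template generator 2548, a neural network-based base caller 2558 (e.g., see FIGS. 7, 9, and 10), and/or a neural network-based quality scorer 2568) and produces a base call for at least some of the analytes at each of the plurality of sequencing cycles.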
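The protocol modules 2536 and 2537 communicate with the main control module 2530 to control the operation of the sub-systems 2406, 2408, and 2410 when conducting predetermined assay protocols. The protocol modules 2536 and 2537 can include sets of instructions for instructing the base calling system 2400 to perform specific operations pursuant to predetermined protocols. As shown, the protocol module can be a sequencing-by-synthesis (SBS) module 2536 that is configured to issue various commands for performing sequencing-by-synthesis processes. In SBS, extension of a nucleic acid primer along a nucleic acid template is monitored to determine the sequence of nucleotides in the template. The underlying chemical process can be polymerization (e.g., as catalyzed by a polymerase enzyme) or ligation (e.g., catalyzed by a ligase enzyme). In a particular polymerase-based SBS implementation, fluorescently labeled nucleotides are added to a primer (thereby extending the primer) in a template-dependent fashion, such that detection of the order and type of nucleotides added to the primer can be used to determine the sequence of the template. For example, to initiate a first SBS cycle, commands can be given to deliver one or more labeled nucleotides, DNA polymerase, etc., into/through a flow cell that houses an array of nucleic acid templates. The nucleic acid templates can be located at corresponding reaction sites. Those reaction sites where primer extension causes a labeled nucleotide to be incorporated can be detected through an imaging event. During an imaging event, the illumination system 2409 can provide an excitation light to the reaction sites. Optionally, the nucleotides can further include a reversible termination property that terminates further primer extension once a nucleotide has been added to a primer. For example, a nucleotide analog having a reversible terminator moiety can be added to a primer such that subsequent extension cannot occur until a deblocking agent is delivered to remove the moiety. Thus, for implementations that use reversible termination, a command can be given to deliver a deblocking agent to the flow cell (before or after detection occurs). One or more commands can be given to effect washes between the various delivery steps. The cycle can then be repeated n times to extend the primer by n nucleotides, thereby detecting a sequence of length n. Exemplary sequencing techniques are described, for example, in Bentley et al., Nature 456:53-59 (2008), WO 04/018497, US 7,057,026, WO 91/06678, WO 07/123744, US 7,329,492, US 7,211,414, US 7,315,019, and US 7,405,281, each of which is incorporated herein by reference.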
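The command sequence of a reversible-terminator SBS run can be laid out as a simple generator. The step names and their order within a cycle are assumptions chosen for illustration (the description allows deblocking before or after detection and does not name individual commands); the sketch only shows that one cycle of delivery, imaging, deblocking, and washing is repeated n times.

```python
# Hypothetical command names for one reversible-terminator SBS cycle.
SBS_STEPS = (
    "deliver_labeled_nucleotides_and_polymerase",
    "wash",
    "image_with_excitation_light",
    "deliver_deblocking_agent",  # here issued after detection
    "wash",
)


def sbs_commands(n_cycles):
    """Yield (cycle, command) pairs for n_cycles of sequencing by synthesis."""
    for cycle in range(1, n_cycles + 1):
        for step in SBS_STEPS:
            yield cycle, step


# Three cycles extend the primer by three nucleotides.
schedule = list(sbs_commands(3))
```

Each cycle contributes exactly one imaging event, so a run of n cycles yields n images per tile and per imaging channel, which is the input the base caller consumes.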
For the nucleotide delivery step of an SBS cycle, either a single type of nucleotide can be delivered at a time, or multiple different nucleotide types (e.g., A, C, T, and G together) can be delivered. For a nucleotide delivery configuration where only a single type of nucleotide is present at a time, the different nucleotides need not have distinct labels since they can be distinguished based on the temporal separation inherent in the individualized delivery. Accordingly, a sequencing method or apparatus can use single color detection. For example, an excitation source need only provide excitation at a single wavelength or in a single range of wavelengths. For a nucleotide delivery configuration where delivery results in multiple different nucleotides being present in the flow cell at one time, sites that incorporate different nucleotide types can be distinguished based on the different fluorescent labels that are attached to the respective nucleotide types in the mixture. For example, four different nucleotides can be used, each having one of four different fluorophores. In one implementation, the four different fluorophores can be distinguished using excitation in four different regions of the spectrum. For example, four different excitation radiation sources can be used. Alternatively, fewer than four different excitation sources can be used, with optical filtration of the excitation radiation from a single source used to produce different ranges of excitation radiation at the flow cell.
In some implementations, fewer than four different colors can be detected in a mixture having four different nucleotides. For example, pairs of nucleotides can be detected at the same wavelength, but distinguished based on a difference in intensity for one member of the pair compared to the other, or based on a change to one member of the pair (e.g., via chemical modification, photochemical modification, or physical modification) that causes an apparent signal to appear or disappear compared to the signal detected for the other member of the pair. Exemplary apparatus and methods for distinguishing four different nucleotides using detection of fewer than four colors are described, for example, in US Patent Application Serial Nos. 61/538,294 and 61/619,878, which are incorporated herein by reference in their entireties. US Application No. 13/624,200, filed September 21, 2012, is also incorporated by reference in its entirety.
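The plurality of protocol modules can also include a sample-preparation (or generation) module 2537 that is configured to issue commands to the fluidic control system 2406 and the temperature control system 2410 for amplifying a product within the biosensor 2402. For example, the biosensor 2402 can be engaged to the base calling system 2400. The amplification module 2537 can issue instructions to the fluidic control system 2406 to deliver the necessary amplification components to reaction chambers within the biosensor 2402. In other implementations, the reaction sites can already contain some components for amplification, such as the template DNA and/or primers. After the amplification components are delivered to the reaction chambers, the amplification module 2537 can instruct the temperature control system 2410 to cycle through different temperature stages according to known amplification protocols. In some implementations, the amplification and/or nucleotide incorporation is performed isothermally.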
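The SBS module 2536 can issue commands to perform bridge PCR, in which clusters of clonal amplicons are formed on localized areas within a channel of a flow cell. After the amplicons are generated through bridge PCR, the amplicons can be "linearized" to make single-stranded template DNA, or sstDNA, and a sequencing primer can be hybridized to a universal sequence that flanks a region of interest. For example, a reversible-terminator-based sequencing-by-synthesis method can be used as set forth above or as follows.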
Each base calling or sequencing cycle can extend an sstDNA by a single base, which can be accomplished, for example, by using a modified DNA polymerase and a mixture of four types of nucleotides. The different types of nucleotides can have unique fluorescent labels, and each nucleotide can further have a reversible terminator that allows only a single-base incorporation to occur in each cycle. After a single base is added to the sstDNA, excitation light can be incident upon the reaction sites and fluorescent emissions can be detected. After detection, the fluorescent label and the terminator can be chemically cleaved from the sstDNA. Another similar base calling or sequencing cycle can follow. In such a sequencing protocol, the SBS module 2536 can instruct the fluidic control system 2406 to direct a flow of reagent and enzyme solutions through the biosensor 2402. Exemplary reversible-terminator-based SBS methods that can be utilized with the apparatus and methods set forth herein are described in US Patent Application Publication No. 2007/0166705 A1, US Patent Application Publication No. 2006/0188901 A1, US Patent No. 7,057,026, US Patent Application Publication No. 2006/0240439 A1, US Patent Application Publication No. 2006/02814714709 A1, PCT Publication No. WO 05/065814, and PCT Publication No. WO 06/064199, each of which is incorporated herein by reference in its entirety. Exemplary reagents for reversible-terminator-based SBS are described in US 7,541,444, US 7,057,026, US 7,427,673, US 7,566,537, and US 7,592,435, each of which is incorporated herein by reference in its entirety.
In some implementations, the amplification and SBS modules can operate in a single assay protocol in which, for example, a template nucleic acid is amplified and subsequently sequenced within the same cartridge.
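The base calling system 2400 can also allow the user to reconfigure an assay protocol. For example, the base calling system 2400 can offer options to the user through the user interface 2414 for modifying the determined protocol. For example, if it is determined that the biosensor 2402 is to be used for amplification, the base calling system 2400 can request a temperature for the annealing cycle. Furthermore, the base calling system 2400 can issue warnings to a user if the user has provided user inputs that are generally not acceptable for the selected assay protocol.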
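In implementations, the biosensor 2402 includes millions of sensors (or pixels), each of which generates a plurality of sequences of pixel signals over successive base calling cycles. The analysis module 2538 detects the plurality of sequences of pixel signals and attributes them to the corresponding sensors (or pixels) in accordance with the row-wise and/or column-wise location of the sensors on the sensor array.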
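Attributing signals to sensors by row and column can be illustrated with a short sketch. It assumes each cycle's readout arrives as a flat, row-major list of intensities; that layout and the function name `attribute_signals` are assumptions, since the description does not specify the readout format.

```python
def attribute_signals(frames, n_cols):
    """Group per-cycle readouts into one signal sequence per sensor.

    `frames` holds one flat, row-major list of pixel intensities per cycle.
    Returns {(row, col): [signal at cycle 1, signal at cycle 2, ...]}.
    """
    sequences = {}
    for frame in frames:
        if len(frame) % n_cols:
            raise ValueError("frame length is not a multiple of n_cols")
        for index, value in enumerate(frame):
            row, col = divmod(index, n_cols)  # recover the sensor's location
            sequences.setdefault((row, col), []).append(value)
    return sequences


# Two hypothetical cycles read from a 2 x 2 sensor array.
frames = [[10, 20, 30, 40], [11, 21, 31, 41]]
per_sensor = attribute_signals(frames, n_cols=2)
```

Each dictionary entry is then the pixel signal sequence for one sensor across the base calling cycles.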
Each sensor in the sensor array can produce sensor data for a tile of the flow cell, where a tile is an area on the flow cell in which clusters of genetic material are disposed during the base calling operation. The sensor data can comprise image data in an array of pixels. For a given cycle, the sensor data can include more than one image, producing multiple features per pixel as the tile data.
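How several images of one tile in one cycle become multiple features per pixel can be shown by stacking the images along a feature axis. The nested-list representation and the function name `stack_features` are illustrative assumptions; a practical implementation would more likely use an array library.

```python
def stack_features(images):
    """Stack same-shaped 2-D images into rows x cols x features tile data.

    `images` is a list of 2-D images (lists of rows) taken of one tile in one
    cycle, e.g. one image per imaging channel.
    """
    rows, cols = len(images[0]), len(images[0][0])
    for image in images:
        if len(image) != rows or any(len(row) != cols for row in image):
            raise ValueError("all images must share one shape")
    return [
        [[image[r][c] for image in images] for c in range(cols)]
        for r in range(rows)
    ]


# Two hypothetical 2 x 2 images of the same tile in the same cycle.
channel_1 = [[1, 2], [3, 4]]
channel_2 = [[5, 6], [7, 8]]
tile_data = stack_features([channel_1, channel_2])
```

Here every pixel of `tile_data` carries a two-element feature vector, one element per image.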
FIG. 26 is a simplified block diagram of a computer system 2600 that can be used to implement the technology disclosed. The computer system 2600 includes at least one central processing unit (CPU) 2672 that communicates with a number of peripheral devices via a bus subsystem 2655. These peripheral devices can include a storage subsystem 2610 including, for example, memory devices and a file storage subsystem 2636, user interface input devices 2638, user interface output devices 2676, and a network interface subsystem 2674. The input and output devices allow user interaction with the computer system 2600. The network interface subsystem 2674 provides an interface to outside networks, including an interface to corresponding interface devices in other computer systems.
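The user interface input devices 2638 can include: a keyboard; pointing devices such as a mouse, trackball, touchpad, or graphics tablet; a scanner; a touch screen incorporated into the display; audio input devices such as voice recognition systems and microphones; and other types of input devices. In general, use of the term "input device" is intended to include all possible types of devices and ways to input information into the computer system 2600.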
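The user interface output devices 2676 can include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem can include an LED display, a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem can also provide a non-visual display, such as audio output devices. In general, use of the term "output device" is intended to include all possible types of devices and ways to output information from the computer system 2600 to the user or to another machine or computer system.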
The storage subsystem 2610 stores programming and data constructs that provide the functionality of some or all of the modules and methods described herein. These software modules are generally executed by deep learning processors 2678.
In one implementation, the neural networks are implemented using deep learning processors 2678, which can be configurable and reconfigurable processors, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and/or coarse-grained reconfigurable architectures (CGRAs) and graphics processing units (GPUs), or other configured devices. The deep learning processors 2678 can be hosted by a deep learning cloud platform such as Google Cloud Platform™, Xilinx™, and Cirrascale™. Examples of deep learning processors 2678 include Google's Tensor Processing Unit (TPU)™, rackmount solutions (such as GX4 Rackmount Series™ and GX149 Rackmount Series™), NVIDIA DGX-1™, Microsoft's Stratix V FPGA™, Graphcore's Intelligent Processor Unit (IPU)™, Qualcomm's Zeroth Platform™ with Snapdragon processors™, NVIDIA's Volta™, NVIDIA's DRIVE PX™, NVIDIA's JETSON TX1/TX2 MODULE™, Intel's Nirvana™, Movidius VPU™, Fujitsu DPI™, ARM's DynamicIQ™, IBM TrueNorth™, and others.
The memory subsystem 2622 used in the storage subsystem 2610 can include a number of memories, including a main random access memory (RAM) 2634 for storage of instructions and data during program execution and a read only memory (ROM) 2632 in which fixed instructions are stored. The file storage subsystem 2636 can provide persistent storage for program and data files, and can include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations can be stored by the file storage subsystem 2636 in the storage subsystem 2610, or in other machines accessible by the processor.
The bus subsystem 2655 provides a mechanism for letting the various components and subsystems of the computer system 2600 communicate with each other as intended. Although the bus subsystem 2655 is shown schematically as a single bus, alternative implementations of the bus subsystem can use multiple busses.
The computer system 2600 itself can be of varying types, including a personal computer, a portable computer, a workstation, a computer terminal, a network computer, a television, a mainframe, a server farm, a widely distributed set of loosely networked computers, or any other data processing system or user device. Due to the ever-changing nature of computers and networks, the description of the computer system 2600 depicted in FIG. 26 is intended only as a specific example for purposes of illustrating the preferred implementations of the present invention. Many other configurations of the computer system 2600 are possible, having more or fewer components than the computer system depicted in FIG. 26.
Claims (29)
Applications Claiming Priority (9)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163161880P | 2021-03-16 | 2021-03-16 | |
| US202163161896P | 2021-03-16 | 2021-03-16 | |
| US63/161896 | 2021-03-16 | ||
| US63/161880 | 2021-03-16 | ||
| US17/687,583 US12525320B2 (en) | 2021-03-16 | 2022-03-04 | Neural network parameter quantization for base calling |
| US17/687551 | 2022-03-04 | ||
| US17/687,551 US20220301657A1 (en) | 2021-03-16 | 2022-03-04 | Tile location and/or cycle based weight set selection for base calling |
| US17/687583 | 2022-03-04 | ||
| PCT/US2022/020460 WO2022197752A1 (en) | 2021-03-16 | 2022-03-15 | Tile location and/or cycle based weight set selection for base calling |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115803815A true CN115803815A (en) | 2023-03-14 |
Family
ID=85057463
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202280005057.3A Pending CN115699019A (en) | 2021-03-16 | 2022-03-15 | Neural Network Parameter Quantification for Base Calling |
| CN202280005111.4A Pending CN115803815A (en) | 2021-03-16 | 2022-03-15 | Block position and/or rotation based weight set selection for base detection |
Country Status (7)
| Country | Link |
|---|---|
| EP (2) | EP4309179A1 (en) |
| JP (2) | JP7726929B2 (en) |
| KR (1) | KR20230157230A (en) |
| CN (2) | CN115699019A (en) |
| AU (2) | AU2022238841A1 (en) |
| CA (2) | CA3183581A1 (en) |
| IL (1) | IL299077B2 (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108573468A (en) * | 2017-03-07 | 2018-09-25 | 伊鲁米那股份有限公司 | Optical distortion correction for imaged samples |
| US20200302224A1 (en) * | 2019-03-21 | 2020-09-24 | Illumina, Inc. | Artificial Intelligence-Based Sequencing |
| CA3104851A1 (en) * | 2019-05-16 | 2020-11-19 | Illumina, Inc. | Base calling using convolutions |
| CN112313666A (en) * | 2019-03-21 | 2021-02-02 | 因美纳有限公司 | Training data generation for artificial intelligence based sequencing |
| CN113166804A (en) * | 2018-11-28 | 2021-07-23 | 牛津纳米孔科技公司 | Analyzing nanopore signals using machine learning techniques |
-
2022
- 2022-03-15 CN CN202280005057.3A patent/CN115699019A/en active Pending
- 2022-03-15 KR KR1020227045560A patent/KR20230157230A/en active Pending
- 2022-03-15 CA CA3183581A patent/CA3183581A1/en active Pending
- 2022-03-15 AU AU2022238841A patent/AU2022238841A1/en active Pending
- 2022-03-15 CN CN202280005111.4A patent/CN115803815A/en active Pending
- 2022-03-15 EP EP22714689.1A patent/EP4309179A1/en active Pending
- 2022-03-15 EP EP22714690.9A patent/EP4309080A1/en not_active Withdrawn
- 2022-03-15 JP JP2022580969A patent/JP7726929B2/en active Active
- 2022-03-15 CA CA3183567A patent/CA3183567A1/en active Pending
- 2022-03-15 IL IL299077A patent/IL299077B2/en unknown
- 2022-03-15 AU AU2022237501A patent/AU2022237501A1/en active Pending
2025
- 2025-08-07 JP JP2025132498A patent/JP2025179067A/en active Pending
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108573468A (en) * | 2017-03-07 | 2018-09-25 | Illumina, Inc. | Optical distortion correction for imaged samples |
| CN113166804A (en) * | 2018-11-28 | 2021-07-23 | Oxford Nanopore Technologies | Analyzing nanopore signals using machine learning techniques |
| US20200302224A1 (en) * | 2019-03-21 | 2020-09-24 | Illumina, Inc. | Artificial Intelligence-Based Sequencing |
| CN112313666A (en) * | 2019-03-21 | 2021-02-02 | Illumina, Inc. | Training data generation for artificial intelligence based sequencing |
| CA3104851A1 (en) * | 2019-05-16 | 2020-11-19 | Illumina, Inc. | Base calling using convolutions |
Also Published As
| Publication number | Publication date |
|---|---|
| CN115699019A (en) | 2023-02-03 |
| EP4309080A1 (en) | 2024-01-24 |
| AU2022237501A1 (en) | 2023-02-02 |
| CA3183567A1 (en) | 2022-09-22 |
| IL299077B2 (en) | 2025-11-01 |
| EP4309179A1 (en) | 2024-01-24 |
| IL299077B1 (en) | 2025-07-01 |
| JP2024510539A (en) | 2024-03-08 |
| AU2022238841A1 (en) | 2023-02-02 |
| IL299077A (en) | 2023-02-01 |
| CA3183581A1 (en) | 2022-09-22 |
| JP2025179067A (en) | 2025-12-09 |
| JP7726929B2 (en) | 2025-08-20 |
| KR20230157230A (en) | 2023-11-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN115136243B (en) | | Hardware execution and acceleration of artificial intelligence based base detectors |
| US20220301657A1 (en) | | Tile location and/or cycle based weight set selection for base calling |
| US20230041989A1 (en) | | Base calling using multiple base caller models |
| CN117501372A (en) | | A self-learning base caller trained using organismal sequences |
| US20230026084A1 (en) | | Self-learned base caller, trained using organism sequences |
| US20220415445A1 (en) | | Self-learned base caller, trained using oligo sequences |
| JP7726929B2 (en) | | Tile Position and/or Cycle-Based Weight Set Selection for Base Calling |
| WO2023009758A1 (en) | | Quality score calibration of basecalling systems |
| WO2022197752A1 (en) | | Tile location and/or cycle based weight set selection for base calling |
| EP4364155B1 (en) | | Self-learned base caller, trained using oligo sequences |
| JP7809733B2 (en) | | Self-learning base code trained using oligo sequences |
| US20230029970A1 (en) | | Quality score calibration of basecalling systems |
| JP2024529843A (en) | | Base calling using multiple base call models |
| CN117546248A (en) | | Base detection using multiple base detector model |
| CN117529780A (en) | | Quality Score Calibration of Base Calling Systems |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |