
CN115803815A - Tile location and/or cycle based weight set selection for base calling

Publication number
CN115803815A
Authority
CN
China
Prior art keywords
weights
sensing
series
sensor data
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280005111.4A
Other languages
Chinese (zh)
Inventor
G·D·帕纳比
M·D·哈姆
A·C·杜普瑞兹
D·卡什夫哈吉吉
K·贾加纳坦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Imena Software Co ltd
Inmair Ltd
Original Assignee
Imena Software Co ltd
Inmair Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/687,583 (US12525320B2)
Application filed by Imena Software Co ltd, Inmair Ltd
Priority claimed from PCT/US2022/020460 (WO2022197752A1)
Publication of CN115803815A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/0495 Quantised networks; Sparse networks; Compressed networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Neurology (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Physiology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Pens And Brushes (AREA)
  • Road Signs Or Road Markings (AREA)
  • Road Paving Structures (AREA)
  • Image Analysis (AREA)

Abstract

A system for base calling is disclosed. The system includes a memory that stores a topology of a neural network, a plurality of weight sets, and sensor data for a series of sensing cycles. Sequencing events span the temporal progression of the base calling operation through subseries of sensing cycles and the spatial progression of the base calling operation through locations on a biosensor. A configurable processor is configured to: load the topology on the configurable processor; select a weight set based on a subject subseries of sensing cycles and/or a subject location on the biosensor; load, on processing elements, subject sensor data for the subject subseries of sensing cycles and the subject location; configure the topology with the selected weight set; and cause the neural network to process the subject sensor data to produce base call classification data for the subject subseries and the subject location.
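The selection step in the abstract can be pictured as a lookup keyed by the cycle subseries and the location on the biosensor. The following is a minimal sketch under stated assumptions, not the claimed implementation: the subseries boundaries, the "edge"/"center" tile classes, and the names `subseries_index`, `tile_class`, and `select_weight_set` are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class WeightSet:
    """Placeholder for one trained weight set (real sets would hold tensors)."""
    name: str


# Hypothetical subseries of sensing cycles, as inclusive (first, last) ranges.
CYCLE_SUBSERIES: List[Tuple[int, int]] = [(1, 50), (51, 100), (101, 150)]


def subseries_index(cycle: int) -> int:
    """Return the index of the subseries that contains the given cycle."""
    for index, (first, last) in enumerate(CYCLE_SUBSERIES):
        if first <= cycle <= last:
            return index
    raise ValueError(f"cycle {cycle} is outside every configured subseries")


def tile_class(tile_row: int, n_rows: int) -> str:
    """Hypothetical spatial classification: outermost rows are 'edge' tiles."""
    return "edge" if tile_row in (0, n_rows - 1) else "center"


# One stored weight set per (subseries, tile class) pair.
WEIGHT_SETS: Dict[Tuple[int, str], WeightSet] = {
    (i, cls): WeightSet(f"w_sub{i}_{cls}")
    for i in range(len(CYCLE_SUBSERIES))
    for cls in ("edge", "center")
}


def select_weight_set(cycle: int, tile_row: int, n_rows: int) -> WeightSet:
    """Pick the weight set for the subject cycle and subject tile location."""
    return WEIGHT_SETS[(subseries_index(cycle), tile_class(tile_row, n_rows))]
```

In this sketch the network topology stays fixed and only the returned weight set changes between calls, which mirrors the abstract's split between loading the topology once and configuring it with a selected weight set.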

Figure 202280005111

Description

Tile location and/or cycle based weight set selection for base calling

Priority applications

This application claims the benefit of U.S. Provisional Patent Application No. 63/161,880, entitled "Tile Location and/or Cycle Based Weight Set Selection for Base Calling," filed March 16, 2021 (Attorney Docket No. ILLM 1019-1/IP-1861-PRV); U.S. Provisional Patent Application No. 63/161,896, entitled "Neural Network Parameter Quantization for Base Calling," filed March 16, 2021 (Attorney Docket No. ILLM 1019-2/IP-2049-PRV); U.S. Nonprovisional Patent Application No. 17/687,551, entitled "Tile Location and/or Cycle Based Weight Set Selection for Base Calling," filed March 4, 2022 (Attorney Docket No. ILLM 1019-3/IP-1861-US); and U.S. Nonprovisional Patent Application No. 17/687,583, entitled "Neural Network Parameter Quantization for Base Calling," filed March 4, 2022 (Attorney Docket No. ILLM 1019-4/IP-2049-US). The priority applications are hereby incorporated by reference herein for all purposes.

Technical field

The technology disclosed relates to artificial intelligence type computers and digital data processing systems, and to corresponding data processing methods and products for emulation of intelligence (i.e., knowledge-based systems, reasoning systems, and knowledge acquisition systems), including systems for reasoning with uncertainty (e.g., fuzzy logic systems), adaptive systems, machine learning systems, and artificial neural networks. In particular, the technology disclosed relates to using deep neural networks, such as deep convolutional neural networks, for analyzing data, and to the selective use of weight sets.

Documents incorporated by reference

The following documents are incorporated by reference as if fully set forth herein:

U.S. Provisional Patent Application No. 62/979,384, entitled "ARTIFICIAL INTELLIGENCE-BASED BASE CALLING OF INDEX SEQUENCES," filed February 20, 2020 (Attorney Docket No. ILLM 1015-1/IP-1857-PRV);

U.S. Provisional Patent Application No. 62/979,414, entitled "ARTIFICIAL INTELLIGENCE-BASED MANY-TO-MANY BASE CALLING," filed February 20, 2020 (Attorney Docket No. ILLM 1016-1/IP-1858-PRV);

U.S. Provisional Patent Application No. 62/979,385, entitled "KNOWLEDGE DISTILLATION-BASED COMPRESSION OF ARTIFICIAL INTELLIGENCE-BASED BASE CALLER," filed February 20, 2020 (Attorney Docket No. ILLM 1017-1/IP-1859-PRV);

U.S. Provisional Patent Application No. 63/072,032, entitled "DETECTING AND FILTERING CLUSTERS BASED ON ARTIFICIAL INTELLIGENCE-PREDICTED BASE CALLS," filed August 28, 2020 (Attorney Docket No. ILLM 1018-1/IP-1860-PRV);

U.S. Provisional Patent Application No. 62/979,411, entitled "DATA COMPRESSION FOR ARTIFICIAL INTELLIGENCE-BASED BASE CALLING," filed February 20, 2020 (Attorney Docket No. ILLM 1029-1/IP-1964-PRV);

U.S. Provisional Patent Application No. 62/979,399, entitled "SQUEEZING LAYER FOR ARTIFICIAL INTELLIGENCE-BASED BASE CALLING," filed February 20, 2020 (Attorney Docket No. ILLM 1030-1/IP-1982-PRV);

U.S. Nonprovisional Patent Application No. 16/825,987, entitled "TRAINING DATA GENERATION FOR ARTIFICIAL INTELLIGENCE-BASED SEQUENCING," filed March 20, 2020 (Attorney Docket No. ILLM 1008-16/IP-1693-US);

U.S. Nonprovisional Patent Application No. 16/825,991, entitled "ARTIFICIAL INTELLIGENCE-BASED GENERATION OF SEQUENCING METADATA," filed March 20, 2020 (Attorney Docket No. ILLM 1008-17/IP-1741-US);

U.S. Nonprovisional Patent Application No. 16/826,126, entitled "ARTIFICIAL INTELLIGENCE-BASED BASE CALLING," filed March 20, 2020 (Attorney Docket No. ILLM 1008-18/IP-1744-US);

U.S. Nonprovisional Patent Application No. 16/826,134, entitled "ARTIFICIAL INTELLIGENCE-BASED QUALITY SCORING," filed March 20, 2020 (Attorney Docket No. ILLM 1008-19/IP-1747-US);

U.S. Nonprovisional Patent Application No. 16/826,168, entitled "ARTIFICIAL INTELLIGENCE-BASED SEQUENCING," filed March 21, 2020 (Attorney Docket No. ILLM 1008-20/IP-1752-US);

U.S. Nonprovisional Patent Application No. 16/874,599, entitled "Systems and Devices for Characterization and Performance Analysis of Pixel-Based Sequencing," filed May 14, 2020 (Attorney Docket No. ILLM 1011-4/IP-1750-US); and

U.S. Nonprovisional Patent Application No. 17/176,147, entitled "HARDWARE EXECUTION AND ACCELERATION OF ARTIFICIAL INTELLIGENCE-BASED BASE CALLER," filed February 15, 2021 (Attorney Docket No. ILLM 1020-2/IP-1866-US).

Background

The subject matter discussed in this section should not be assumed to be prior art merely because it is mentioned in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves can also correspond to implementations of the claimed technology.

In recent years, rapid growth in computing power has enabled deep convolutional neural networks (CNNs) to achieve great success on many computer vision tasks with significantly improved accuracy. During the inference phase, many applications demand low-latency processing of a single image under strict power consumption requirements. This reduces the efficiency of graphics processing units (GPUs) and other general-purpose platforms and creates an opportunity for dedicated acceleration hardware, such as field programmable gate arrays (FPGAs), in which digital circuits are customized for deep learning inference. Deploying CNNs on portable and embedded systems nevertheless remains challenging because of large data volumes, intensive computation, varying algorithm structures, and frequent memory accesses.

Because convolution accounts for most of the operations in a CNN, the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution involves multiply-and-accumulate (MAC) operations organized as four levels of loops that slide along the kernels and feature maps. The first loop level computes the MAC of the pixels within a kernel window. The second loop level accumulates the sums of products of the MAC across different input feature maps. After the first and second loop levels finish, the final output pixel is obtained by adding the bias. The third loop level slides the kernel window within an input feature map. The fourth loop level generates the different output feature maps.
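The four nested loops described above can be written out directly. The following pure-Python sketch is illustrative only and makes several assumptions not stated in the text: stride 1, no padding, nested lists as the data layout, and the function name `conv2d_four_loops`. A hardware accelerator would unroll or pipeline these loops rather than run them sequentially.

```python
def conv2d_four_loops(inputs, kernels, biases):
    """Direct convolution (stride 1, no padding) with the four loop levels.

    inputs[c][y][x]       -- input feature maps
    kernels[o][c][ky][kx] -- one kernel per (output map, input map) pair
    biases[o]             -- one bias per output feature map
    """
    n_in = len(inputs)
    in_h, in_w = len(inputs[0]), len(inputs[0][0])
    k_h, k_w = len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1

    outputs = []
    for o in range(len(kernels)):              # level 4: output feature maps
        fmap = [[0.0] * out_w for _ in range(out_h)]
        for y in range(out_h):                 # level 3: slide the window
            for x in range(out_w):
                acc = 0.0
                for c in range(n_in):          # level 2: across input maps
                    for ky in range(k_h):      # level 1: MAC inside window
                        for kx in range(k_w):
                            acc += inputs[c][y + ky][x + kx] * kernels[o][c][ky][kx]
                # The bias is added once levels 1 and 2 have finished.
                fmap[y][x] = acc + biases[o]
        outputs.append(fmap)
    return outputs
```

Levels 1 and 2 together produce one output pixel, level 3 fills one output feature map, and level 4 repeats the process for each output map. Which of these loops are parallelized is the kind of choice an acceleration scheme has to make.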

FPGAs have gained increasing interest and popularity, particularly for accelerating inference tasks, because of their (1) high degree of reconfigurability, (2) faster development time compared with application-specific integrated circuits (ASICs), which helps keep pace with the rapid evolution of CNNs, (3) good performance, and (4) superior energy efficiency compared with GPUs. The high performance and efficiency of an FPGA can be realized by synthesizing a circuit customized for a specific computation, so that billions of operations are processed directly with a customized memory system. For instance, hundreds to thousands of digital signal processing (DSP) blocks on modern FPGAs support the core convolution operations, e.g., multiplication and addition, with high parallelism. Dedicated data buffers between external off-chip memory and on-chip processing engines (PEs) can be designed to realize a preferred dataflow by configuring tens of megabytes of on-chip block random access memory (BRAM) on the FPGA chip.

Efficient dataflow and hardware architectures for CNN acceleration are needed to minimize data communication while maximizing resource utilization to achieve high performance. An opportunity therefore arises to design methods and frameworks that accelerate the inference process of various CNN algorithms on acceleration hardware with high performance, efficiency, and flexibility.

Brief description of the drawings

In the drawings, like reference characters generally refer to like parts throughout the different views. The drawings are not necessarily to scale; emphasis is instead placed on illustrating the principles of the technology disclosed. In the following description, various implementations of the technology disclosed are described with reference to the following drawings, in which:

Figure 1 shows a cross-section of a biosensor that can be used in various embodiments.

Figure 2 shows one implementation of a flow cell that contains clusters in its tiles.

Figure 3 shows an example flow cell with eight lanes, together with a zoomed-in view of one tile, its clusters, and their surrounding background.

Figure 4 is a simplified block diagram of a system for analysis of sensor data from a sequencing system, such as base call sensor outputs.

Figure 5 is a simplified diagram showing aspects of the base calling operation, including functions of a runtime program executed by a host processor.

Figure 6 is a simplified diagram of a configuration of a configurable processor, such as that of Figure 4.

Figure 7 is a diagram of a neural network architecture that can be executed using a configurable or reconfigurable array configured as described herein.

Figure 8A is a simplified illustration of an organization of tiles of sensor data used by a neural network architecture like that of Figure 7.

Figure 8B is a simplified illustration of patches of tiles of sensor data used by a neural network architecture like that of Figure 7.

Figure 9 shows part of a configuration for a neural network like that of Figure 7 on a configurable or reconfigurable array, such as a field programmable gate array (FPGA).

Figure 10 is a diagram of another alternative neural network architecture that can be executed using a configurable or reconfigurable array configured as described herein.

Figure 11 shows one implementation of a specialized architecture of a neural network-based base caller that is used to segregate processing of data for different sequencing cycles.

Figure 12 shows one implementation of segregated layers, each of which can include convolutions.

Figure 13A shows one implementation of combinatory layers, each of which can include convolutions.

Figure 13B shows another implementation of combinatory layers, each of which can include convolutions.

Figures 14, 15, and 16 show various example tile location based weight selection schemes for base calling.

Figure 17A shows an example of fading, in which signal intensity decreases as a function of the cycle number of a sequencing run of a base calling operation.

Figure 17B conceptually shows the signal-to-noise ratio decreasing as the sequencing cycles progress.

Figure 18 shows an example base calling cycle number based weight selection scheme for base calling.

Figures 19, 20, 21A, and 21B show various example weight selection schemes based on (i) the temporal progression of base calling cycle numbers and (ii) the spatial locations of tiles.

Figure 22 shows one implementation of a base calling operation in which a weight set for base calling is selected based on spatial tile information and temporal sensing cycle subseries information.

Figure 23A shows various weight sets for various categories of tiles and for various sensing cycles, with each weight set including corresponding spatial weights and corresponding temporal weights.

Figure 23B shows various weight sets for various categories of tiles and for various cycles, where the different weight sets for a particular category of tiles include common spatial weights and different temporal weights.

Figure 23C shows a system for selecting a weight set based on one or more sequencing run parameters.

Figure 24 is a block diagram of a base calling system in accordance with one implementation.

Figure 25 is a block diagram of a system controller that can be used in the system of Figure 24.

Figure 26 is a simplified block diagram of a computer system that can be used to implement the technology disclosed.
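The arrangement summarized for Figure 23B, in which weight sets for one tile category share spatial weights but differ in temporal weights, can be sketched as a small data structure. This is a hedged illustration only: the class `WeightSet`, the function `build_weight_sets`, the category labels, and the use of plain lists in place of real weight tensors are assumptions, not details taken from the figures.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class WeightSet:
    spatial: List[float]   # stand-in for the spatial-layer weights
    temporal: List[float]  # stand-in for the temporal-layer weights


def build_weight_sets(
    spatial_by_category: Dict[str, List[float]],
    temporal_by_key: Dict[Tuple[str, int], List[float]],
) -> Dict[Tuple[str, int], WeightSet]:
    """Build one weight set per (tile category, cycle subseries) key.

    Every set for a given tile category reuses the same spatial weights
    object, so only the temporal weights vary across cycle subseries.
    """
    return {
        (category, subseries): WeightSet(spatial_by_category[category], temporal)
        for (category, subseries), temporal in temporal_by_key.items()
    }
```

Sharing the spatial weights this way means that moving to a later cycle subseries for the same tile category would only require swapping the temporal part of the set.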

Detailed description

Embodiments described herein may be used in various biological or chemical processes and systems for academic or commercial analysis. More specifically, embodiments described herein may be used in various processes and systems where it is desired to detect an event, property, quality, or characteristic that is indicative of a desired reaction. For example, embodiments described herein include cartridges, biosensors, and their components, as well as bioassay systems that operate with cartridges and biosensors. In particular embodiments, the cartridges and biosensors include a flow cell and one or more sensors, pixels, light detectors, or photodiodes that are coupled together in a substantially unitary structure.

The following detailed description of certain embodiments will be better understood when read in conjunction with the appended drawings. To the extent that the drawings show diagrams of the functional blocks of various embodiments, the functional blocks are not necessarily indicative of the division between hardware circuitry. Thus, for example, one or more of the functional blocks (e.g., processors or memories) may be implemented in a single piece of hardware (e.g., a general-purpose signal processor or random access memory, hard disk, or the like). Similarly, the programs may be stand-alone programs, may be incorporated as subroutines in an operating system, may be functions in an installed software package, and the like. It should be understood that the various embodiments are not limited to the arrangements and instrumentality shown in the drawings.

As used herein, an element or step recited in the singular and preceded by the word "a" or "an" should be understood as not excluding plural of said elements or steps, unless such exclusion is explicitly stated. Furthermore, references to "one embodiment" are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. Moreover, unless explicitly stated to the contrary, embodiments "comprising," "having," or "including" an element or a plurality of elements having a particular property may include additional elements, whether or not they have that property.

As used herein, a "desired reaction" includes a change in at least one of a chemical, electrical, physical, or optical property (or quality) of an analyte of interest. In particular embodiments, the desired reaction is a positive binding event (e.g., incorporation of a fluorescently labeled biomolecule with the analyte of interest). More generally, the desired reaction may be a chemical transformation, chemical change, or chemical interaction. The desired reaction may also be a change in electrical properties. For example, the desired reaction may be a change in ion concentration within a solution. Exemplary reactions include, but are not limited to, chemical reactions such as reduction, oxidation, addition, elimination, rearrangement, esterification, amidation, etherification, cyclization, or substitution; binding interactions in which a first chemical binds to a second chemical; dissociation reactions in which two or more chemicals detach from each other; fluorescence; luminescence; bioluminescence; chemiluminescence; and biological reactions, such as nucleic acid replication, nucleic acid amplification, nucleic acid hybridization, nucleic acid ligation, phosphorylation, enzymatic catalysis, receptor binding, or ligand binding. The desired reaction can also be an addition or elimination of a proton, for example, detectable as a change in pH of a surrounding solution or environment. An additional desired reaction can be detecting the flow of ions across a membrane (e.g., a natural or synthetic bilayer membrane); for example, as ions flow through the membrane, the current is disrupted and the disruption can be detected.

In particular embodiments, the desired reaction includes the incorporation of a fluorescently labeled molecule with an analyte. The analyte may be an oligonucleotide, and the fluorescently labeled molecule may be a nucleotide. The desired reaction may be detected when an excitation light is directed toward the oligonucleotide having the labeled nucleotide and the fluorophore emits a detectable fluorescent signal. In alternative embodiments, the detected fluorescence is a result of chemiluminescence or bioluminescence. A desired reaction may also increase fluorescence (or Förster) resonance energy transfer (FRET), for example, by bringing a donor fluorophore in proximity to an acceptor fluorophore; decrease FRET by separating donor and acceptor fluorophores; increase fluorescence by separating a quencher from a fluorophore; or decrease fluorescence by co-locating a quencher and a fluorophore.
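The dependence of FRET on donor-acceptor distance mentioned above is commonly modeled with the standard Förster relation, E = 1 / (1 + (r / R0)^6), where R0 is the distance at which transfer efficiency is 50%. The sketch below is background illustration only: the relation is textbook material rather than part of this disclosure, and the function name and the example distances are assumptions.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency for donor-acceptor distance r_nm.

    r0_nm is the Förster radius (distance of 50% efficiency); both
    values use the same length unit, assumed here to be nanometers.
    """
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("distances must be non-negative and r0 positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Hypothetical distances: bringing the pair closer increases FRET,
# and separating the pair decreases it.
close_pair = fret_efficiency(3.0, 5.0)
far_pair = fret_efficiency(8.0, 5.0)
```

Because of the sixth-power term, efficiency changes steeply around R0, which is why proximity changes between donor and acceptor produce a detectable change in signal.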

如本文所用,“反应组分”或“反应物”包括可用于获得期望反应的任何物质。例如,反应组分包括试剂、酶、样品、其他生物分子和缓冲液。通常将反应组分递送至溶液中的反应位点和/或固定在反应位点处。反应组分可直接或间接地与另一种物质相互作用,诸如感兴趣的分析物。As used herein, "reaction component" or "reactant" includes any substance that can be used to obtain a desired reaction. For example, reaction components include reagents, enzymes, samples, other biomolecules, and buffers. Typically the reaction components are delivered to and/or immobilized at the reaction site in solution. A reaction component may directly or indirectly interact with another species, such as an analyte of interest.

如本文所用,术语“反应位点”是可发生期望反应的局部区域。反应位点可包括其上可固定物质的基板的支撑表面。例如,反应位点可包括在其上具有核酸群体的流通池的通道中的基本上平坦的表面。通常但并不总是,群体中的核酸具有相同的序列,例如为单链或双链模板的克隆拷贝。然而,在一些实施方案中,反应位点可包含仅单个核酸分子,例如单链或双链形式。此外,多个反应位点可沿着支撑表面不均匀地分布或以预先确定的方式布置(例如,在矩阵中并排布置,诸如在微阵列中)。反应位点还可包括反应室(或孔),其至少部分地限定被配置为分隔期望反应的空间区域或体积。As used herein, the term "reaction site" is a localized area where a desired reaction can occur. A reaction site may comprise a support surface of a substrate on which a substance may be immobilized. For example, a reaction site may comprise a substantially planar surface in a channel of a flow cell having a population of nucleic acids thereon. Usually, but not always, the nucleic acids in a population have the same sequence, eg, are clonal copies of a single- or double-stranded template. However, in some embodiments, a reactive site may comprise only a single nucleic acid molecule, eg, in single- or double-stranded form. In addition, the plurality of reaction sites may be unevenly distributed along the support surface or arranged in a predetermined pattern (eg, arranged side-by-side in a matrix, such as in a microarray). A reaction site may also include a reaction chamber (or well) that at least partially defines a spatial region or volume configured to separate desired reactions.

This application uses the terms "reaction chamber" and "well" interchangeably. As used herein, the term "reaction chamber" or "well" includes a spatial region that is in fluid communication with a flow channel. The reaction chamber may be at least partially separated from the surrounding environment or from other spatial regions. For example, a plurality of reaction chambers may be separated from each other by shared walls. As a more specific example, the reaction chamber may include a cavity defined by the interior surfaces of a well and have an opening or aperture so that the cavity may be in fluid communication with the flow channel. Biosensors including such reaction chambers are described in greater detail in International Application No. PCT/US2011/057111, filed October 20, 2011, which is incorporated herein by reference in its entirety.

In some embodiments, the reaction chambers are sized and shaped relative to solids (including semi-solids) so that the solids may be inserted, fully or partially, therein. For example, the reaction chamber may be sized and shaped to accommodate only one capture bead. The capture bead may have clonally amplified DNA or other substances thereon. Alternatively, the reaction chamber may be sized and shaped to receive an approximate number of beads or solid substrates. As another example, the reaction chambers may also be filled with a porous gel or substance that is configured to control diffusion or to filter fluids that may flow into the reaction chamber.

In some embodiments, sensors (e.g., photodetectors, photodiodes) are associated with corresponding pixel regions of a sample surface of a biosensor. As such, a pixel region is a geometrical construct that represents the area on the biosensor's sample surface for one sensor (or pixel). A sensor associated with a pixel region detects light emissions gathered from the associated pixel region when a desired reaction has occurred at a reaction site or a reaction chamber overlying the associated pixel region. In a flat-surface embodiment, the pixel regions may overlap. In some cases, a plurality of sensors may be associated with a single reaction site or a single reaction chamber. In other cases, a single sensor may be associated with a group of reaction sites or a group of reaction chambers.

As used herein, a "biosensor" includes a structure having a plurality of reaction sites and/or reaction chambers (or wells). A biosensor may include a solid-state imaging device (e.g., a CCD or CMOS imager) and, optionally, a flow cell mounted thereto. The flow cell may include at least one flow channel that is in fluid communication with the reaction sites and/or the reaction chambers. As one specific example, the biosensor is configured to be fluidically and electrically coupled to a bioassay system. The bioassay system may deliver reactants to the reaction sites and/or the reaction chambers according to a predetermined protocol (e.g., sequencing-by-synthesis) and perform a plurality of imaging events. For example, the bioassay system may direct reaction solutions to flow along the reaction sites and/or the reaction chambers. At least one of the solutions may include four types of nucleotides having the same or different fluorescent labels. The nucleotides may bind to corresponding oligonucleotides located at the reaction sites and/or the reaction chambers. The bioassay system may then illuminate the reaction sites and/or the reaction chambers using an excitation light source (e.g., a solid-state light source, such as a light-emitting diode (LED)). The excitation light may have one or more predetermined wavelengths, including a range of wavelengths. The excited fluorescent labels provide emission signals that may be captured by the sensors.

In alternative embodiments, the biosensor may include electrodes or other types of sensors configured to detect other identifiable properties. For example, the sensors may be configured to detect a change in ion concentration. In another example, the sensors may be configured to detect the flow of ion current across a membrane.

As used herein, a "cluster" is a population of similar or identical molecules, nucleotide sequences, or DNA strands. For example, a cluster may be an amplified oligonucleotide or any other group of polynucleotides or polypeptides with the same or a similar sequence. In other embodiments, a cluster may be any element or group of elements that occupies a physical area on a sample surface. In embodiments, clusters are immobilized to a reaction site and/or a reaction chamber during a base calling cycle.

As used herein, the term "immobilized," when used with respect to a biomolecule or a biological or chemical substance, includes substantially attaching the biomolecule or biological or chemical substance to a surface at a molecular level. For example, a biomolecule or biological or chemical substance may be immobilized to a surface of a substrate material using adsorption techniques, including non-covalent interactions (e.g., electrostatic forces, van der Waals forces, and dehydration of hydrophobic interfaces), and covalent binding techniques in which functional groups or linkers facilitate attaching the biomolecule to the surface. Immobilizing biomolecules or biological or chemical substances to a surface of a substrate material may be based upon the properties of the substrate surface, the liquid medium carrying the biomolecule or biological or chemical substance, and the properties of the biomolecules or biological or chemical substances themselves. In some cases, a substrate surface may be functionalized (e.g., chemically or physically modified) to facilitate immobilizing the biomolecules (or biological or chemical substances) to the substrate surface. The substrate surface may first be modified to have functional groups bound to the surface. The functional groups may then bind to biomolecules or biological or chemical substances to immobilize them thereon. A substance may be immobilized to a surface via a gel, for example, as described in U.S. Patent Publication No. US 2011/0059865 A1, which is incorporated herein by reference.

In some embodiments, nucleic acids may be attached to a surface and amplified using bridge amplification. Useful bridge amplification methods are described, for example, in U.S. Patent No. 5,641,658; WO 2007/010251; U.S. Patent No. 6,090,592; U.S. Patent Publication No. 2002/0055100 A1; U.S. Patent No. 7,115,400; U.S. Patent Publication No. 2004/0096853 A1; U.S. Patent Publication No. 2004/0002090 A1; U.S. Patent Publication No. 2007/0128624 A1; and U.S. Patent Publication No. 2008/0009420 A1, each of which is incorporated herein in its entirety. Another useful method for amplifying nucleic acids on a surface is rolling circle amplification (RCA), for example, using the methods set forth in further detail below. In some embodiments, the nucleic acids may be attached to a surface and amplified using one or more primer pairs. For example, one of the primers may be in solution and the other primer may be immobilized on the surface (e.g., 5'-attached). By way of example, a nucleic acid molecule may hybridize to one of the primers on the surface, followed by extension of the immobilized primer to produce a first copy of the nucleic acid. The primer in solution then hybridizes to the first copy of the nucleic acid, and that primer may be extended using the first copy of the nucleic acid as a template. Optionally, after the first copy of the nucleic acid is produced, the original nucleic acid molecule may hybridize to a second immobilized primer on the surface and may be extended at the same time as, or after, the primer in solution is extended. In any embodiment, repeated rounds of extension (e.g., amplification) using the immobilized primer and the primer in solution provide multiple copies of the nucleic acid.

In particular embodiments, the assay protocols executed by the systems and methods described herein include the use of natural nucleotides and also enzymes that are configured to interact with the natural nucleotides. Natural nucleotides include, for example, ribonucleotides (RNA) or deoxyribonucleotides (DNA). Natural nucleotides may be in the monophosphate, diphosphate, or triphosphate form and may have a base selected from adenine (A), thymine (T), uracil (U), guanine (G), or cytosine (C). It will be understood, however, that non-natural nucleotides, modified nucleotides, or analogs of the aforementioned nucleotides may be used. Some examples of useful non-natural nucleotides are set forth below in regard to reversible terminator-based sequencing-by-synthesis methods.

In embodiments that include reaction chambers, items or solid substances (including semi-solid substances) may be disposed within the reaction chambers. When disposed, the item or solid may be physically held or immobilized within the reaction chamber through an interference fit, adhesion, or entrapment. Examples of items or solids that may be disposed within the reaction chambers include polymer beads, pellets, agarose gel, powders, quantum dots, or other solids that may be compressed and/or held within the reaction chamber. In particular embodiments, a nucleic acid superstructure, such as a DNA ball, may be disposed in or at a reaction chamber, for example, by attachment to an interior surface of the reaction chamber or by residence in a liquid within the reaction chamber. A DNA ball or other nucleic acid superstructure may be preformed and then disposed in or at the reaction chamber. Alternatively, a DNA ball may be synthesized in situ at the reaction chamber. A DNA ball may be synthesized by rolling circle amplification to produce a concatemer of a particular nucleic acid sequence, and the concatemer may be treated with conditions that form a relatively compact ball. DNA balls and methods for their synthesis are described, for example, in U.S. Patent Publication Nos. 2008/0242560 A1 or 2008/0234136 A1, each of which is incorporated herein in its entirety. A substance that is held or disposed in a reaction chamber may be in a solid, liquid, or gaseous state.

As used herein, "base calling" identifies a nucleotide base in a nucleic acid sequence. Base calling refers to the process of determining a base call (A, C, G, T) for every cluster at a specific cycle. As an example, base calling may be performed using the four-channel, two-channel, or one-channel methods and systems described in the incorporated materials of U.S. Patent Application Publication No. 2013/0079232. In particular embodiments, a base calling cycle is referred to as a "sampling event." In a one-dye, two-channel sequencing protocol, a sampling event comprises two illumination stages in time sequence, such that a pixel signal is generated at each stage. The first illumination stage induces illumination from a given cluster that indicates nucleotide bases A and T in an AT pixel signal, and the second illumination stage induces illumination from the given cluster that indicates nucleotide bases C and T in a CT pixel signal.
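The two-stage sampling event described above can be illustrated with a small decoder. This is only a sketch under stated assumptions: the function name `call_base`, the normalized threshold of 0.5, and the rule that a cluster dark in both stages is read as G are hypothetical and are not stated in the text, which says only that the AT pixel signal indicates A and T and the CT pixel signal indicates C and T.

```python
def call_base(at_signal: float, ct_signal: float, threshold: float = 0.5) -> str:
    """Decode one sampling event of a one-dye, two-channel protocol.

    at_signal: pixel signal from the first illumination stage (A or T emit).
    ct_signal: pixel signal from the second illumination stage (C or T emit).
    threshold: hypothetical on/off cutoff for a normalized signal.
    """
    at_on = at_signal > threshold
    ct_on = ct_signal > threshold
    if at_on and ct_on:
        return "T"  # T is indicated in both the AT and the CT signal
    if at_on:
        return "A"  # A is indicated only in the AT signal
    if ct_on:
        return "C"  # C is indicated only in the CT signal
    return "G"      # assumption: dark in both stages is read as G


# Four sampling events for one cluster, as (AT, CT) signal pairs.
events = [(0.9, 0.1), (0.1, 0.8), (0.9, 0.9), (0.0, 0.1)]
print("".join(call_base(at, ct) for at, ct in events))  # prints ACTG
```

A fixed threshold is used here only to keep the example short; it ignores the case, discussed later for biosensor 100, in which one pixel signal carries contributions from more than one cluster.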

Biosensor

FIG. 1 shows a cross-section of a biosensor 100 that may be used in various embodiments. The biosensor 100 has pixel regions 106', 108', 110', 112', and 114' that may each hold more than one cluster during a base calling cycle (e.g., two clusters per pixel region). As shown, the biosensor 100 may include a flow cell 102 that is mounted onto a sampling device 104. In the illustrated embodiment, the flow cell 102 is affixed directly to the sampling device 104. However, in alternative embodiments, the flow cell 102 may be removably coupled to the sampling device 104. The sampling device 104 has a sample surface 134 that may be functionalized (e.g., chemically or physically modified in a suitable manner for conducting the desired reactions). For example, the sample surface 134 may be functionalized and may include a plurality of pixel regions 106', 108', 110', 112', and 114' that may each hold more than one cluster during a base calling cycle (e.g., each pixel region has a corresponding cluster pair 106A, 106B; 108A, 108B; 110A, 110B; 112A, 112B; and 114A, 114B immobilized thereto). Each pixel region is associated with a corresponding sensor (or pixel or photodiode) 106, 108, 110, 112, and 114, such that light received by the pixel region is captured by the corresponding sensor. A pixel region 106' may also be associated with a corresponding reaction site 106'' on the sample surface 134 that holds a cluster pair, such that light emitted from the reaction site 106'' is received by the pixel region 106' and captured by the corresponding sensor 106. As a result of this sensing structure, when two or more clusters are present in the pixel region of a particular sensor during a base calling cycle (e.g., each pixel region has a corresponding cluster pair), the pixel signal in that base calling cycle carries information based on all of the two or more clusters. Consequently, the signal processing described herein is used to distinguish each cluster where there are more clusters than pixel signals in a given sampling event of a particular base calling cycle.

In the illustrated embodiment, the flow cell 102 includes sidewalls 138, 125 and a flow shield 136 that is supported by the sidewalls 138, 125. The sidewalls 138, 125 are coupled to the sample surface 134 and extend between the flow shield 136 and the sidewalls 138, 125. In some embodiments, the sidewalls 138, 125 are formed from a curable adhesive layer that bonds the flow shield 136 to the sampling device 104.

The sidewalls 138, 125 are sized and shaped so that a flow channel 144 exists between the flow shield 136 and the sampling device 104. The flow shield 136 may include a material that is transparent to excitation light 101 propagating from outside the biosensor 100 into the flow channel 144. In an example, the excitation light 101 approaches the flow shield 136 at a non-orthogonal angle.

Also shown, the flow shield 136 may include inlet and outlet ports 142, 146 that are configured to fluidically engage other ports (not shown). For example, the other ports may be from a cartridge or a workstation. The flow channel 144 is sized and shaped to direct a fluid along the sample surface 134. The height H1 and other dimensions of the flow channel 144 may be configured to maintain a substantially uniform flow of fluid along the sample surface 134. The dimensions of the flow channel 144 may also be configured to control bubble formation.

By way of example, the flow shield 136 (or the flow cell 102) may comprise a transparent material, such as glass or plastic. The flow shield 136 may constitute a substantially rectangular block having a planar exterior surface and a planar inner surface that defines the flow channel 144. The block may be mounted onto the sidewalls 138, 125. Alternatively, the flow cell 102 may be etched to define the flow shield 136 and the sidewalls 138, 125. For example, a recess may be etched into the transparent material. When the etched material is mounted to the sampling device 104, the recess may become the flow channel 144.

The sampling device 104 may be similar to, for example, an integrated circuit comprising a plurality of stacked substrate layers 120 to 126. The substrate layers 120 to 126 may include a base substrate 120, a solid-state imaging device 122 (e.g., a CMOS image sensor), a filter or light management layer 124, and a passivation layer 126. It should be noted that the above is only illustrative and that other embodiments may include fewer or additional layers. Moreover, each of the substrate layers 120 to 126 may include a plurality of sub-layers. The sampling device 104 may be manufactured using processes that are similar to those used in manufacturing integrated circuits, such as CMOS image sensors and CCDs. For example, the substrate layers 120 to 126, or portions thereof, may be grown, deposited, etched, and the like to form the sampling device 104.

The passivation layer 126 is configured to shield the filter layer 124 from the fluidic environment of the flow channel 144. In some cases, the passivation layer 126 is also configured to provide a solid surface (i.e., the sample surface 134) that permits biomolecules or other analytes of interest to be immobilized thereon. For example, each of the reaction sites may include a cluster of biomolecules that are immobilized to the sample surface 134. Thus, the passivation layer 126 may be formed from a material that permits the reaction sites to be immobilized thereto. The passivation layer 126 may also comprise a material that is at least transparent to a desired fluorescent light. By way of example, the passivation layer 126 may include silicon nitride (Si3N4) and/or silicon dioxide (SiO2). However, other suitable materials may be used. In the illustrated embodiment, the passivation layer 126 may be substantially planar. However, in alternative embodiments, the passivation layer 126 may include recesses, such as pits, wells, grooves, and the like. In the illustrated embodiment, the passivation layer 126 has a thickness of about 150 nm to 200 nm and, more particularly, about 170 nm.

The filter layer 124 may include various features that affect the transmission of light. In some embodiments, the filter layer 124 may perform multiple functions. For instance, the filter layer 124 may be configured to (a) filter unwanted light signals, such as light signals from an excitation light source; (b) direct emission signals from the reaction sites toward the corresponding sensors 106, 108, 110, 112, and 114 that are configured to detect the emission signals from the reaction sites; or (c) block or prevent detection of unwanted emission signals from adjacent reaction sites. As such, the filter layer 124 may also be referred to as a light management layer. In the illustrated embodiment, the filter layer 124 has a thickness of about 1 μm to 5 μm and, more particularly, about 2 μm to 4 μm. In alternative embodiments, the filter layer 124 may include an array of microlenses or other optical elements. Each of the microlenses may be configured to direct emission signals from an associated reaction site to a sensor.

In some embodiments, the solid-state imaging device 122 and the base substrate 120 may be provided together as a previously constructed solid-state imaging device (e.g., a CMOS chip). For example, the base substrate 120 may be a silicon wafer, and the solid-state imaging device 122 may be mounted thereon. The solid-state imaging device 122 includes a layer of semiconductor material (e.g., silicon) and the sensors 106, 108, 110, 112, and 114. In the illustrated embodiment, the sensors are photodiodes configured to detect light. In other embodiments, the sensors comprise photodetectors. The solid-state imaging device 122 may be manufactured as a single chip through a CMOS-based fabrication process.

The solid-state imaging device 122 may include a dense array of sensors 106, 108, 110, 112, and 114 that are configured to detect activity indicative of a desired reaction from within or along the flow channel 144. In some embodiments, each sensor has a pixel area (or detection area) of about 1 to 2 square micrometers (μm²). The array may include five hundred thousand sensors, five million sensors, ten million sensors, or even 120 million sensors. The sensors 106, 108, 110, 112, and 114 may be configured to detect a predetermined wavelength of light that is indicative of the desired reactions.
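As a rough worked example of the figures above, the total detection area of an array can be estimated as the sensor count multiplied by the per-sensor pixel area. This is a sketch only: it assumes that pixel areas tile the array without gaps, which the text does not state, and the function name `active_area_mm2` is hypothetical.

```python
UM2_PER_MM2 = 1_000_000  # 1 mm^2 equals 10^6 um^2


def active_area_mm2(num_sensors: int, pixel_area_um2: float) -> float:
    """Total detection area in mm^2, ignoring any gaps between pixels."""
    return num_sensors * pixel_area_um2 / UM2_PER_MM2


# Sensor counts named in the text, evaluated at 1 um^2 and 2 um^2 per sensor.
for n in (500_000, 5_000_000, 10_000_000, 120_000_000):
    low = active_area_mm2(n, 1.0)
    high = active_area_mm2(n, 2.0)
    print(f"{n:>11,d} sensors: {low:g} to {high:g} mm^2")
```

Under this assumption, the largest array named in the text (120 million sensors) would cover roughly 120 to 240 mm² of detection area.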

In some embodiments, the sampling device 104 includes a microcircuit arrangement, such as the microcircuit arrangement described in U.S. Patent No. 7,595,882, which is incorporated herein by reference in its entirety. More specifically, the sampling device 104 may comprise an integrated circuit having a planar array of the sensors 106, 108, 110, 112, and 114. Circuitry formed within the sampling device 104 may be configured for at least one of signal amplification, digitization, storage, and processing. The circuitry may collect and analyze the detected fluorescent light and generate pixel signals (or detection signals) for communicating detection data to a signal processor. The circuitry may also perform additional analog and/or digital signal processing in the sampling device 104. The sampling device 104 may include conductive vias 130 that perform signal routing (e.g., transmit the pixel signals to the signal processor). The pixel signals may also be transmitted through electrical contacts 132 of the sampling device 104.

The sampling device 104 is discussed in further detail in U.S. Nonprovisional Patent Application No. 16/874,599, entitled "Systems and Devices for Characterization and Performance Analysis of Pixel-Based Sequencing," filed May 14, 2020 (Attorney Docket No. ILLM 1011-4/IP-1750-US), which is incorporated herein by reference as if fully set forth herein. The sampling device 104 is not limited to the constructions or uses described above. In alternative embodiments, the sampling device 104 may take other forms. For example, the sampling device 104 may comprise a CCD device, such as a CCD camera, that is coupled to a flow cell or is moved to interface with a flow cell having reaction sites therein.

FIG. 2 shows one implementation of a flow cell 200 that contains clusters in its blocks. The flow cell 200 corresponds to the flow cell 102 of FIG. 1, for example, without the flow shield 136. Furthermore, the depiction of the flow cell 200 is symbolic in nature: it symbolically depicts the various channels and blocks within the flow cell 200 without showing various other components therein. FIG. 2 shows a top view of the flow cell 200.

In one embodiment, the flow cell 200 is divided or partitioned into a plurality of channels, such as channels 202a, 202b, ..., 202P, i.e., P channels. In the example of FIG. 2, the flow cell 200 is shown as including 8 channels, i.e., P = 8 in this example, although the number of channels within a flow cell is implementation specific.

In one embodiment, each channel 202 is further partitioned into non-overlapping regions referred to as "blocks" 212. For example, FIG. 2 shows an enlarged view of a section 208 of an example channel. The section 208 is shown as including a plurality of blocks 212.

In an example, each channel 202 includes one or more columns of blocks. For example, in FIG. 2, each channel 202 includes two corresponding columns of blocks 212, as shown within the enlarged section 208. The number of blocks in each column of blocks within each channel is implementation specific; in one example, there may be 50 blocks, 60 blocks, 100 blocks, or another appropriate number of blocks in each column of blocks within each channel.
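The partitioning described above (channels, columns of blocks within each channel, and blocks within each column) can be sketched as a small data structure. The class name `FlowCellLayout`, its field names, and the zero-based `(channel, column, row)` identifier are hypothetical; the default values follow the FIG. 2 example (8 channels, two columns per channel) and one of the example block counts (50).

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class FlowCellLayout:
    num_channels: int = 8         # P in the text; 8 in the FIG. 2 example
    columns_per_channel: int = 2  # two columns of blocks per channel in FIG. 2
    blocks_per_column: int = 50   # one of the example values given in the text

    def total_blocks(self) -> int:
        """Number of non-overlapping blocks on the whole flow cell."""
        return self.num_channels * self.columns_per_channel * self.blocks_per_column

    def block_ids(self) -> Iterator[Tuple[int, int, int]]:
        """Yield a (channel, column, row) index for every block."""
        for channel in range(self.num_channels):
            for column in range(self.columns_per_channel):
                for row in range(self.blocks_per_column):
                    yield (channel, column, row)


layout = FlowCellLayout()
print(layout.total_blocks())  # 8 * 2 * 50 = 800
```

Because all three counts are implementation specific, they are kept as fields rather than constants.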

Each block includes a corresponding plurality of clusters. During the sequencing process, the clusters on a block and their surrounding background are imaged. For example, FIG. 2 shows example clusters 216 within an example block.

FIG. 3 shows an example Illumina GA-IIx™ flow cell with eight channels, and also shows an enlarged view of one block with its clusters and their surrounding background. For example, there are one hundred blocks (also called tiles) per channel (lane) in the Illumina Genome Analyzer II and sixty-eight blocks per channel in the Illumina HiSeq 2000. A block 212 holds hundreds of thousands to millions of clusters. In FIG. 3, an image generated from a block, with clusters shown as bright spots, is shown at 308 (e.g., 308 is an enlarged image view of the block), with an example cluster 304 labeled. A cluster 304 comprises approximately one thousand identical copies of a template molecule, though clusters vary in size and shape. The clusters are grown from the template molecules, prior to the sequencing run, by bridge amplification of the input library. The purpose of the amplification and cluster growth is to increase the intensity of the emitted signal, since the imaging device cannot reliably sense a single fluorophore. However, the physical distance between the DNA fragments within a cluster 304 is small, so the imaging device perceives the cluster of fragments as a single spot 304.

Clusters and blocks are discussed in further detail in U.S. Nonprovisional Patent Application No. 16/825,987, entitled "TRAINING DATA GENERATION FOR ARTIFICIAL INTELLIGENCE-BASED SEQUENCING," filed March 20, 2020 (Attorney Docket No. ILLM 1008-16/IP-1693-US).

FIG. 4 is a simplified block diagram of a system for analyzing sensor data from a sequencing system, such as base call sensor outputs (e.g., see FIG. 1). In the example of FIG. 4, the system includes a sequencing machine 400 and a configurable processor 450. The configurable processor 450 may execute a neural network-based base caller in coordination with a runtime program executed by a host processor, such as a central processing unit (CPU) 402. The sequencing machine 400 comprises base call sensors and a flow cell 401 (e.g., as discussed with respect to FIGS. 1 to 3). The flow cell may comprise one or more blocks in which clusters of genetic material are exposed to a sequence of analyte flows used to cause reactions in the clusters that identify the bases in the genetic material, as discussed with respect to FIGS. 1 to 3. The sensors sense the reactions for each cycle of the sequence in each block of the flow cell to provide block data. Examples of this technology are described in more detail below. Genetic sequencing is a data-intensive operation that translates base call sensor data into sequences of base calls for each cluster of genetic material sensed during a base calling operation.
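The data flow described above, in which block data sensed at each cycle is translated into a sequence of base calls for each cluster, can be sketched as follows. Everything named here is hypothetical: `BlockData` models one cycle of block data as a mapping from a cluster identifier to four per-base scores, and `toy_caller` is a trivial stand-in for the neural network-based base caller, not the method of this disclosure.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical model of one cycle of block data: cluster id -> per-base scores.
BlockData = Dict[int, Sequence[float]]


def assemble_reads(
    cycles: List[BlockData],
    caller: Callable[[Sequence[float]], str],
) -> Dict[int, str]:
    """Turn per-cycle block data into one base call sequence per cluster."""
    reads: Dict[int, List[str]] = {}
    for block_data in cycles:  # one entry per sequencing cycle, in order
        for cluster_id, features in block_data.items():
            reads.setdefault(cluster_id, []).append(caller(features))
    return {cluster_id: "".join(bases) for cluster_id, bases in reads.items()}


def toy_caller(features: Sequence[float]) -> str:
    """Stand-in caller: pick the base with the largest score (order A, C, G, T)."""
    return "ACGT"[max(range(4), key=lambda i: features[i])]


cycles = [
    {0: [0.9, 0.0, 0.1, 0.0], 1: [0.0, 0.1, 0.0, 0.8]},  # cycle 1
    {0: [0.1, 0.7, 0.1, 0.1], 1: [0.0, 0.0, 0.9, 0.1]},  # cycle 2
]
print(assemble_reads(cycles, toy_caller))  # {0: 'AC', 1: 'TG'}
```

Passing the caller in as a function keeps the read assembly independent of how each individual base call is produced.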

The system in this example includes the CPU 402, which executes a runtime program to coordinate the base calling operation, and a memory 403 that stores sequences of arrays of block data, base call reads produced by the base calling operation, and other information used in the base calling operation. Also, in this illustration, the system includes a memory 404 that stores a configuration file (or files), such as FPGA bit files, and model parameters for the neural network, which are used to configure and reconfigure the configurable processor 450 and to execute the neural network. The sequencing machine 400 can include a program for configuring a configurable processor, and in some embodiments a reconfigurable processor, to execute the neural network.

The sequencing machine 400 is coupled to the configurable processor 450 by a bus 405. The bus 405 can be implemented using a high-throughput technology, such as, in one example, bus technology compatible with the PCIe (Peripheral Component Interconnect Express) standards currently maintained and developed by the PCI-SIG (PCI Special Interest Group). Also, in this example, a memory 460 is coupled to the configurable processor 450 by a bus 461. The memory 460 can be on-board memory, disposed on a circuit board with the configurable processor 450. The memory 460 is used for high-speed access by the configurable processor 450 to working data used in the base calling operation. The bus 461 can also be implemented using a high-throughput technology, such as bus technology compatible with the PCIe standards.

Configurable processors, including field-programmable gate arrays (FPGAs), coarse-grained reconfigurable arrays (CGRAs), and other configurable and reconfigurable devices, can be configured to implement a variety of functions more efficiently or faster than might be achieved using a general-purpose processor executing a computer program. Configuration of a configurable processor involves compiling a functional description to produce a configuration file, sometimes referred to as a bitstream or bit file, and distributing the configuration file to the configurable elements on the processor.

The configuration file defines the logic functions to be executed by the configurable processor by configuring the circuit to set data flow patterns, the use of distributed memory and other on-chip memory resources, lookup table contents, and the operations of configurable logic blocks and configurable execution units such as multiply-and-accumulate units, configurable interconnects, and other elements of the configurable array. A configurable processor is reconfigurable if the configuration file can be changed in the field by changing the loaded configuration file. For example, the configuration file can be stored in volatile SRAM elements, in non-volatile read-write memory elements, and in combinations of them, distributed among the array of configurable elements on the configurable or reconfigurable processor. A variety of commercially available configurable processors are suitable for use in a base calling operation as described herein. Examples include commercially available products such as Xilinx Alveo™ U200, Xilinx Alveo™ U250, Xilinx Alveo™ U280, Intel/Altera Stratix™ GX2800, and Intel Stratix™ GX10M. In some examples, a host CPU can be implemented on the same integrated circuit as the configurable processor.

Embodiments described herein implement the multi-cycle neural network using the configurable processor 450. The configuration file for a configurable processor can be implemented by specifying the logic functions to be executed using a hardware description language (HDL) or a register-transfer level (RTL) language specification. The specification can be compiled using the resources designed for the selected configurable processor to generate the configuration file. The same or a similar specification can be compiled to generate a design for an application-specific integrated circuit, which may not be a configurable processor.

Accordingly, in all embodiments described herein, alternatives to the configurable processor include a configured processor comprising an application-specific integrated circuit (ASIC), a special-purpose integrated circuit or set of integrated circuits, or a system-on-chip (SOC) device, configured to execute a neural network-based base calling operation as described herein.

In general, configurable processors and configured processors described herein, as configured to execute runs of a neural network, are referred to herein as neural network processors.

In this example, the configurable processor 450 is configured by a configuration file loaded using a program executed by the CPU 402, or by other sources, which configures the array of configurable elements on the configurable processor 450 to execute the base calling function. In this example, the configuration includes data flow logic 451, which is coupled to the buses 405 and 461 and executes functions for distributing data and control parameters among the elements used in the base calling operation.

Also, the configurable processor 450 is configured with base calling execution logic 452 to execute a multi-cycle neural network. The logic 452 comprises a plurality of multi-cycle execution clusters (e.g., 453), which in this example include multi-cycle cluster 1 through multi-cycle cluster X. The number of multi-cycle clusters can be selected according to a trade-off between the desired throughput of the operation and the resources available on the configurable processor.

The multi-cycle clusters are coupled to the data flow logic 451 by data flow paths 454 implemented using configurable interconnect and memory resources on the configurable processor. Also, the multi-cycle clusters are coupled to the data flow logic 451 by control paths 455 implemented using, for example, configurable interconnect and memory resources on the configurable processor. The control paths provide control signals indicating available clusters, readiness to provide input units for execution of a run of the neural network to the available clusters, readiness to provide trained parameters for the neural network, and readiness to provide output patches of base call classification data, as well as other control data used for execution of the neural network.

The configurable processor is configured to execute runs of a multi-cycle neural network using trained parameters to produce classification data for sensing cycles of the base calling operation. A run of the neural network is executed to produce classification data for a subject sensing cycle of the base calling operation. A run of the neural network operates on a sequence that includes a number N of arrays of block data from respective sensing cycles of N sensing cycles, where, in the examples described herein, the N sensing cycles provide sensor data for different base call operations for one base position per operation in time sequence. Optionally, some of the N sensing cycles can be out of sequence if needed, according to the particular neural network model being executed. The number N can be any number greater than one. In some examples described herein, the N sensing cycles represent a set of sensing cycles that includes at least one sensing cycle preceding the subject sensing cycle and at least one sensing cycle following the subject cycle in time sequence. Examples are described herein in which the number N is an integer equal to or greater than five.
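As a minimal sketch of how the N sensing cycles for one run might be chosen, the following assumes 1-based cycle numbering and a window centered on the subject cycle. The function name and the centering rule are illustrative assumptions; the description only requires N greater than one and, in some examples, at least one cycle on either side of the subject cycle.

```python
def sensing_cycle_window(subject_cycle, n_cycles, total_cycles):
    """Return the N cycle numbers used for one run, or None if they do not fit.

    Assumption (not from the description): the window is centered on the
    subject cycle, with any extra cycle placed after it when N is even.
    """
    if n_cycles < 2:
        raise ValueError("N must be greater than one")
    before = (n_cycles - 1) // 2          # cycles preceding the subject cycle
    after = n_cycles - 1 - before         # cycles following the subject cycle
    first, last = subject_cycle - before, subject_cycle + after
    if first < 1 or last > total_cycles:
        return None                       # window runs past the start or end
    return list(range(first, last + 1))


print(sensing_cycle_window(10, 5, 100))
```

With N = 5 and subject cycle K = 10, the window covers cycles K-2 through K+2, which matches the five-cycle example of FIG. 7.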

The data flow logic 451 is configured to move block data and at least some trained parameters of the model from the memory 460 to the configurable processor for runs of the neural network, using input units for a given run that include block data for spatially aligned patches of the N arrays. The input units can be moved by direct memory access (DMA) in a single DMA operation, or in smaller units moved during available time slots in coordination with the execution of the deployed neural network.

Block data for a sensing cycle as described herein can comprise an array of sensor data having one or more features. For example, the sensor data can comprise two images that are analyzed to identify one of four bases at a base position in a genetic sequence of DNA, RNA, or other genetic material. The block data can also include metadata about the images and the sensors. For example, in embodiments of the base calling operation, the block data can comprise information about the alignment of the images with the clusters, such as distance-from-center information indicating the distance of each pixel in the array of sensor data from the center of a cluster of genetic material on the block.

During execution of the multi-cycle neural network as described below, block data can also include data produced during that execution, referred to as intermediate data, which can be reused rather than recomputed during a run of the multi-cycle neural network. For example, during execution of the multi-cycle neural network, the data flow logic can write intermediate data to the memory 460 in place of the sensor data for a given patch of an array of block data. Embodiments like this are described in more detail below.

As illustrated, a system is described for analysis of base calling sensor output, comprising memory (e.g., 460) accessible by the runtime program and storing block data that includes sensor data for blocks from sensing cycles of a base calling operation. Also, the system includes a neural network processor, such as the configurable processor 450, having access to the memory. The neural network processor is configured to execute runs of a neural network using trained parameters to produce classification data for sensing cycles. As described herein, a run of the neural network operates on a sequence of N arrays of block data from respective sensing cycles of N sensing cycles, including a subject cycle, to produce the classification data for the subject cycle. The data flow logic 451 is provided to move block data and the trained parameters from the memory to the neural network processor for runs of the neural network, using input units that include data for spatially aligned patches from the N arrays from respective sensing cycles of the N sensing cycles.

Also, a system is described in which the neural network processor has access to the memory and includes a plurality of execution clusters, the execution clusters in the plurality being configured to execute a neural network. The data flow logic has access to the memory and to the execution clusters in the plurality of execution clusters, to provide input units of block data to available execution clusters in the plurality. The input units include a number N of spatially aligned patches of arrays of block data from respective sensing cycles, including a subject sensing cycle. The data flow logic causes the execution clusters to apply the N spatially aligned patches to the neural network to produce output patches of classification data for the spatially aligned patch of the subject sensing cycle, where N is greater than 1.

FIG. 5 is a simplified diagram showing aspects of the base calling operation, including functions of a runtime program executed by a host processor. In this diagram, the output of image sensors from a flow cell (such as those shown in FIGS. 1 to 2) is provided on lines 500 to image processing threads 501, which can perform processes on the images such as resampling, alignment, and arrangement in an array of sensor data for the individual blocks. The output can also be used by processes that calculate a block cluster mask for each block in the flow cell, which identifies the pixels in the array of sensor data that correspond to clusters of genetic material on the corresponding block of the flow cell. To compute a cluster mask, one example algorithm is based on a process that detects clusters that are unreliable in the early sequencing cycles using a metric derived from the softmax output, and then discards the data from those wells/clusters and produces no output data for them. For example, a process can identify clusters with high reliability during the first N1 (e.g., 25) base calls and reject the others. Rejected clusters might be polyclonal, very weak in intensity, or obscured by fiducials. This procedure can be performed on the host CPU. In alternative embodiments, this information would potentially be used to identify the necessary clusters of interest to be passed back to the CPU, thereby limiting the storage required for intermediate data.
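The cluster filtering step can be illustrated with a short sketch. The description does not say which softmax-derived metric is used, so the mean of the highest class probability over the first N1 cycles, the 0.9 threshold, and the function name are all assumptions chosen only for illustration.

```python
def reliable_clusters(softmax_by_cluster, n1=25, min_mean_confidence=0.9):
    """Flag clusters whose early base calls look reliable.

    softmax_by_cluster maps a cluster id to a list of per-cycle softmax
    outputs (one list of four probabilities per cycle).
    Hypothetical metric: mean of the top probability over the first n1 cycles.
    """
    keep = {}
    for cluster_id, per_cycle_probs in softmax_by_cluster.items():
        early = per_cycle_probs[:n1]
        if not early:
            keep[cluster_id] = False      # no data yet, so do not trust it
            continue
        mean_conf = sum(max(p) for p in early) / len(early)
        keep[cluster_id] = mean_conf >= min_mean_confidence
    return keep


example = {
    "clean": [[0.97, 0.01, 0.01, 0.01]] * 3,
    "polyclonal": [[0.40, 0.30, 0.20, 0.10]] * 3,
}
print(reliable_clusters(example))
```

Clusters flagged False would be dropped from later output, which is how the mask can limit the intermediate data that needs to be stored.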

Depending on the state of the base calling operation, the outputs of the image processing threads 501 are provided on lines 502 to dispatch logic 510 in the CPU, which routes the arrays of block data either to a data cache 504 on a high-speed bus 503 or to multi-cluster neural network processor hardware 520, such as the configurable processor of FIG. 4, on a high-speed bus 505. The hardware 520 returns the classification data output by the neural network to the dispatch logic 510, which passes the information to the data cache 504, or on lines 511 to threads 502 that perform base calling and quality score computations using the classification data and can arrange the data in standard formats for base call reads. The outputs of the threads 502 that perform base calling and quality score computations are provided on lines 512 to threads 503, which aggregate the base call reads, perform other operations such as data compression, and write the resulting base call outputs to specified destinations for use by customers.

In some embodiments, the host can include threads (not shown) that perform final processing of the output of the hardware 520 in support of the neural network. For example, the hardware 520 can provide outputs of classification data from the final layer of the multi-cluster neural network. The host processor can execute an output activation function, such as a softmax function, over the classification data to configure the data for use by the base calling and quality scoring threads 502. Also, the host processor can execute input operations (not shown), such as resampling, batch normalization, or other adjustments of the block data prior to input to the hardware 520.

FIG. 6 is a simplified diagram of a configuration of a configurable processor, such as that of FIG. 4. In FIG. 6, the configurable processor comprises an FPGA with a plurality of high-speed PCIe interfaces. The FPGA is configured with a wrapper 600, which comprises the data flow logic described with reference to FIG. 1. The wrapper 600 manages the interface and coordination with a runtime program in the CPU across a CPU communication link 609, and manages communication with the on-board DRAM 602 (e.g., memory 460) via a DRAM communication link 610. The data flow logic in the wrapper 600 provides patch data, retrieved by traversing the arrays of block data on the on-board DRAM 602 for the number N of cycles, to a cluster 601, and retrieves process data 615 from the cluster 601 for delivery back to the on-board DRAM 602. The wrapper 600 also manages the transfer of data between the on-board DRAM 602 and host memory, for both the input arrays of block data and the output patches of classification data. The wrapper transfers patch data on line 613 to the allocated cluster 601. The wrapper provides trained parameters, such as weights and biases, retrieved from the on-board DRAM 602, to the cluster 601 on line 612. The wrapper provides configuration and control data on line 611 to the cluster 601; this data is provided from, or generated in response to, the runtime program on the host via the CPU communication link 609. The cluster can also provide status signals on line 616 to the wrapper 600, which are used in cooperation with control signals from the host to manage the traversal of the arrays of block data to provide spatially aligned patch data, and to execute the multi-cycle neural network over the patch data using the resources of the cluster 601.

As mentioned above, there can be multiple clusters on a single configurable processor managed by the wrapper 600, configured for executing on corresponding ones of multiple patches of the block data. Each cluster can be configured to provide classification data for base calls in a subject sensing cycle using the block data of multiple sensing cycles as described herein.

In examples of the system, model data, including kernel data such as filter weights and biases, can be sent from the host CPU to the configurable processor, so that the model can be updated as a function of cycle number. As one representative example, a base calling operation can comprise on the order of hundreds of sensing cycles. In some embodiments, a base calling operation can include paired-end reads. For example, the trained model parameters may be updated once every 20 cycles (or some other number of cycles), or according to update patterns implemented for particular systems and neural network models. In some embodiments that include paired-end reads, in which the sequence for a given string in a genetic cluster on a block includes a first part extending from a first end down (or up) the string and a second part extending from a second end up (or down) the string, the trained parameters can be updated on the transition from the first part to the second part.
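The update schedule described above can be sketched as a small lookup from cycle number to weight set. The function name, the (read, index) return value, and the restart of the index at the start of the second read are illustrative assumptions; the description only states that parameters may change every 20 cycles or so, and on the transition between the two parts of a paired-end read.

```python
def select_weight_set(cycle, cycles_per_set=20, read1_length=None):
    """Return (read_number, set_index) for a 1-based sensing cycle.

    Assumption: weight sets are numbered from 0 within each read, and
    read1_length (if given) marks the last cycle of the first read.
    """
    if read1_length is not None and cycle > read1_length:
        # Second part of a paired-end read: switch to its own weight sets.
        return 2, (cycle - read1_length - 1) // cycles_per_set
    return 1, (cycle - 1) // cycles_per_set


for c in (1, 20, 21, 150, 151):
    print(c, select_weight_set(c, read1_length=150))
```

A host runtime could use such a lookup to decide when a new set of filter weights and biases needs to be sent to the configurable processor.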

In some examples, image data for multiple cycles of sensing data for a block can be sent from the CPU to the wrapper 600. The wrapper 600 can optionally do some pre-processing and transformation of the sensing data and write the information to the on-board DRAM 602. The input block data for each sensing cycle can include arrays of sensor data comprising on the order of 4000 × 3000 pixels or more per block per sensing cycle, with two features representing the colors of two images of the block, and one or two bytes per feature per pixel. For an embodiment in which the number N is three sensing cycles to be used in each run of the multi-cycle neural network, the array of block data for each run of the multi-cycle neural network can consume on the order of hundreds of megabytes per block. In some embodiments of the system, the block data also includes an array of DFC (distance-from-center) data, stored once per block, or other types of metadata about the sensor data and the blocks.
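The storage estimate follows directly from the stated dimensions. The sketch below works through the arithmetic for the lower-bound array size of 4000 × 3000 pixels; the function name is hypothetical, and the DFC array and any intermediate data are left out of the total.

```python
def block_input_bytes(width=4000, height=3000, features=2,
                      bytes_per_value=2, n_cycles=3):
    """Bytes of sensor data needed for one run over one block."""
    return width * height * features * bytes_per_value * n_cycles


for bytes_per_value in (1, 2):
    total = block_input_bytes(bytes_per_value=bytes_per_value)
    print(f"{bytes_per_value} byte(s) per value: {total / 1e6:.0f} MB")
```

With these figures the total is 72 MB at one byte per value and 144 MB at two bytes per value; larger sensor arrays or a larger N push the total further into the hundreds of megabytes.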

In operation, when a multi-cycle cluster is available, the wrapper allocates a patch to the cluster. The wrapper fetches the next patch of block data in the traversal of the block and sends it to the allocated cluster along with the appropriate control and configuration information. The cluster can be configured with enough memory on the configurable processor to hold both a patch of data that is being worked on in place, which in some systems includes patches from multiple cycles, and a patch of data that is to be worked on when processing of the current patch is finished, using a ping-pong buffer technique or a raster scanning technique in various embodiments.

When an allocated cluster completes its run of the neural network for the current patch and produces an output patch, it signals the wrapper. The wrapper reads the output patch from the allocated cluster, or alternatively the allocated cluster pushes the data out to the wrapper. The wrapper then assembles the output patches for the processed block in the DRAM 602. When the processing of the entire block has been completed and the output patches of data have been transferred to the DRAM, the wrapper sends the processed output array for the block back to the host/CPU in a specified format. In some embodiments, the on-board DRAM 602 is managed by memory management logic in the wrapper 600. The runtime program can control the sequencing operations to complete the analysis of all the arrays of block data for all the cycles in the run in a continuous flow, providing real-time analysis.
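The allocation and assembly loop can be pictured with a sequential simulation. This is only a sketch: real clusters run concurrently and signal completion over control lines, whereas here each "cluster" is a plain function call, and `process_block`, `run_network`, and the round-based scheduling are hypothetical names and simplifications.

```python
from collections import deque


def process_block(patches, n_clusters, run_network):
    """Hand patches to clusters in traversal order and assemble the outputs.

    Returns the output array (one entry per input patch, in order) and the
    list of (cluster_id, patch_index) assignments that were made.
    """
    pending = deque(enumerate(patches))
    output = [None] * len(patches)
    assignments = []
    while pending:
        # One round: every cluster is treated as free and takes the next patch.
        for cluster_id in range(n_clusters):
            if not pending:
                break
            index, patch = pending.popleft()
            assignments.append((cluster_id, index))
            # The cluster finishes its run; the wrapper stores the output
            # patch at the matching position of the block's output array.
            output[index] = run_network(patch)
    return output, assignments


out, asg = process_block([1, 2, 3, 4, 5], n_clusters=2,
                         run_network=lambda p: p * 10)
print(out)
print(asg)
```

Keeping the patch index with each assignment is what lets the output patches be reassembled into a block-sized array regardless of which cluster produced them.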

FIG. 7 is a diagram of a multi-cycle neural network model that can be executed using the system described herein. The example shown in FIG. 7 can be referred to as a five-cycle input, one-cycle output neural network. The inputs to the multi-cycle neural network model include five spatially aligned patches (e.g., 700) from the block data arrays of five sensing cycles of a given block. Spatially aligned patches have the same aligned row and column dimensions (x, y) as the other patches in the set, so that the information relates to the same clusters of genetic material on the block across the sequence of cycles. In this example, the subject patch is a patch from the array of block data for cycle K. The set of five spatially aligned patches includes a patch from cycle K-2, two cycles before the subject patch; a patch from cycle K-1, one cycle before the subject patch; the subject patch from cycle K; a patch from cycle K+1, one cycle after the subject patch; and a patch from cycle K+2, two cycles after the subject patch.

The model includes an isolated stack 701 of layers of the neural network for each of the input patches. Thus, the stack 701 receives as input block data for the patch from cycle K+2, and is isolated from the stacks 702, 703, 704, and 705 so that they do not share input data or intermediate data. In some embodiments, all of the stacks 701 to 705 can have the same model and the same trained parameters. In other embodiments, the models and trained parameters may differ between stacks. The stack 702 receives as input block data for the patch from cycle K+1. The stack 703 receives as input block data for the patch from cycle K. The stack 704 receives as input block data for the patch from cycle K-1. The stack 705 receives as input block data for the patch from cycle K-2. The layers of the isolated stacks each execute a convolution operation of a kernel, comprising a plurality of filters, over the input data for the layer. As in the example above, the patch 700 may include three features. The output of the layer 710 may include many more features, such as 10 to 20 features. Likewise, the output of each of the layers 711 to 716 can include any number of features suitable for a particular implementation. The parameters of the filters are the trained parameters of the neural network, such as weights and biases. The output feature set (intermediate data) from each of the stacks 701 to 705 is provided as input to an inverse hierarchy 720 of temporal combinatorial layers, in which the intermediate data from the multiple cycles is combined. In the example illustrated, the inverse hierarchy 720 includes a first layer comprising three combinatorial layers 721, 722, 723, each receiving intermediate data from three of the isolated stacks, and a final layer comprising one combinatorial layer 730, which receives intermediate data from the three temporal layers 721, 722, 723.
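The wiring of the inverse hierarchy can be sketched independently of the actual layer arithmetic. The description says each first-level layer receives data from three of the five isolated stacks but does not say which three; the sketch assumes overlapping windows of adjacent cycles (K-2..K, K-1..K+1, K..K+2). The `combine` argument stands in for a trained combinatorial layer and is purely a placeholder.

```python
def temporal_hierarchy(stack_outputs, combine):
    """Combine five per-cycle feature sets into one output for cycle K.

    stack_outputs: dict keyed by cycle offset (-2..+2) holding each isolated
    stack's intermediate data. combine: function taking a list of inputs.
    Assumption: the first level uses sliding windows of three adjacent cycles.
    """
    order = [-2, -1, 0, 1, 2]
    first_level = [
        combine([stack_outputs[offset] for offset in order[i:i + 3]])
        for i in range(3)                 # stands in for layers 721, 722, 723
    ]
    return combine(first_level)           # stands in for final layer 730


# Toy check with numbers in place of feature maps and sum in place of a layer.
toy = {-2: 1, -1: 2, 0: 3, 1: 4, 2: 5}
print(temporal_hierarchy(toy, sum))
```

Under this assumed wiring, the subject cycle K feeds all three first-level layers, so it carries the most weight in the final combination.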

The output of the final combinatorial layer 730 is an output patch of classification data for the clusters located in the corresponding patch of the block from cycle K. The output patches can be assembled into an output array of classification data for the block for cycle K. In some embodiments, the output patch may have a size and dimensions different from those of the input patches. In some embodiments, the output patch may include pixel-by-pixel data that can be filtered by the host to select cluster data.

Depending on the particular implementation, the output classification data can then be applied to a softmax function 740 (or other output activation function), optionally executed by the host or on the configurable processor. An output function other than softmax could be used (e.g., making the base call output parameter according to the largest output, then using a learned nonlinear mapping over context/network outputs to give the base quality).

Finally, the output of the softmax function 740 can be provided as base call probabilities for cycle K (750) and stored in host memory for use in subsequent processing. Other systems may use another function for the output probability calculation, e.g., another nonlinear model.
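A softmax output stage of this kind can be written in a few lines. The sketch below is generic rather than specific to this system: the A, C, G, T ordering of the four scores and the function names are assumptions made for illustration.

```python
import math

BASES = "ACGT"  # assumed ordering of the four class scores


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    peak = max(scores)                          # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_base(scores):
    """Return the most probable base and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return BASES[best], probs[best]


base, prob = call_base([0.1, 2.0, 0.3, -1.0])
print(base, round(prob, 3))
```

The winning probability can then feed a quality score calculation, which the description assigns to the host threads.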

The neural network can be implemented using a configurable processor with a plurality of execution clusters so as to complete the evaluation of one block cycle within a duration equal to, or close to, the time interval of one sensing cycle, effectively providing the output data in real time. The data flow logic can be configured to distribute input units of block data and trained parameters to the execution clusters, and to distribute output patches for aggregation in memory.

Input units of data for a five-cycle input, one-cycle output neural network like that of FIG. 7 are described with reference to FIGS. 8A and 8B for a base calling operation using two-channel sensor data. For example, for a given base in a genetic sequence, the base calling operation can execute two flows of analyte and two reactions that generate two channels of signals, such as images, which can be processed to identify which one of four bases is located at the current position in the genetic sequence for each cluster of genetic material. In other systems, a different number of channels of sensing data may be utilized. For example, base calling can be performed using one-channel methods and systems. The incorporated materials of U.S. Patent Application Publication No. 2013/0079232 discuss base calling using various numbers of channels, such as one channel, two channels, or four channels.

FIG. 8A shows arrays of block data for five cycles for a given block (block M), used for the purpose of executing a five-cycle input, one-cycle output neural network. The five-cycle input block data in this example can be written to on-board DRAM, or to other memory in the system that the data flow logic can access, and includes: for cycle K-2, array 801 for channel 1 and array 811 for channel 2; for cycle K-1, array 802 for channel 1 and array 812 for channel 2; for cycle K, array 803 for channel 1 and array 813 for channel 2; for cycle K+1, array 804 for channel 1 and array 814 for channel 2; and for cycle K+2, array 805 for channel 1 and array 815 for channel 2. Additionally, an array 820 of metadata for the block may be written once in memory, in this case including a DFC file, to be used as input to the neural network along with each cycle.

Although FIG. 8A discusses a two-channel base calling operation, the use of two channels is merely an example, and base calling may be performed using any other suitable number of channels. For example, the incorporated materials of U.S. Patent Application Publication No. 2013/0079232 discuss base calling using various numbers of channels, such as one channel, two channels, four channels, or another suitable number of channels.

The data flow logic composes input units of block data, which can be understood with reference to FIG. 8B. Each input unit includes spatially aligned patches of the block data arrays for one execution cluster, and each execution cluster is configured to execute a run of the neural network over an input patch. The input unit for an assigned execution cluster is composed by the data flow logic by reading spatially aligned patches (e.g., 851, 852, 861, 862, 870) from each of the block data arrays 801 to 805, 811 to 815, and 820 of the five input cycles, and delivering them via a data path (schematically, 850) to memory on the configurable processor that is configured for use by the assigned execution cluster. The assigned execution cluster executes a run of the five-cycle input, one-cycle output neural network and delivers an output patch for subject cycle K containing classification data for the same patch of the block in subject cycle K.
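The composition of an input unit can be sketched as follows. This is an illustration only: the function names, the dictionary layout, and the toy array sizes are assumptions, and plain nested lists stand in for the block data arrays held in memory.

```python
def extract_patch(array, row, col, size):
    """Return the size x size sub-array whose top-left corner is (row, col)."""
    return [line[col:col + size] for line in array[row:row + size]]


def build_input_unit(cycle_arrays, metadata, row, col, size):
    """Collect the same patch location from every cycle/channel array.

    cycle_arrays[k][c] is the 2D array for cycle index k and channel index c;
    metadata is the single per-block metadata array (820 in FIG. 8A).
    """
    return {
        "patches": [
            [extract_patch(channel, row, col, size) for channel in channels]
            for channels in cycle_arrays
        ],
        "metadata": extract_patch(metadata, row, col, size),
    }


# Toy data: 5 cycles x 2 channels of 6x6 arrays; each value encodes its origin.
SIZE = 6
cycle_arrays = [
    [[[k * 1000 + c * 100 + r * 10 + x for x in range(SIZE)] for r in range(SIZE)]
     for c in range(2)]
    for k in range(5)
]
metadata = [[0] * SIZE for _ in range(SIZE)]
unit = build_input_unit(cycle_arrays, metadata, row=2, col=3, size=2)
```

Because every patch is cut at the same (row, col) location, the patches in one unit are spatially aligned across cycles and channels, which is what the execution cluster expects.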

FIG. 9 is a simplified representation of a stack of a neural network usable in a system like that of FIG. 7 (e.g., 701 and 720). In this example, some functions of the neural network (e.g., 900, 902) are executed on the host, and other portions of the neural network (e.g., 901) are executed on the configurable processor.

For example, the first function may be batch normalization (layer 910) performed on the CPU. In another example, however, batch normalization as a function may be fused into one or more layers, so that no separate batch normalization layer is present.

As discussed above with respect to the configurable processor, a number of spatially isolated convolutional layers are executed as the first set of convolutional layers of the neural network. In this example, the first set of convolutional layers applies 2D convolutions spatially.

As shown in FIG. 9, a first spatial convolution 921 is executed, followed by a second spatial convolution 922, followed by a third spatial convolution 923, and so on, for a number L/2 of spatially isolated neural network layers in each stack (L is described with reference to FIG. 7). As indicated at 923A, the number of spatial layers can be any practical number, which for context may range from a few to more than 20 in different embodiments.

For SP_CONV_0, the kernel weights are stored, for example, in a (1,6,6,3,L) structure, because this layer has 3 input channels. In this example, the "6" in the structure results from storing the coefficients in the transformed Winograd domain (the kernel size is 3×3 in the spatial domain but expands in the transform domain).
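To illustrate why a 3×3 kernel can occupy a 6×6 footprint in the transform domain, the sketch below applies the kernel transform of the Winograd F(4×4, 3×3) algorithm, in which the transformed tile has side 4 + 3 - 1 = 6. That this particular Winograd variant is the one used here is an assumption; the text above only states that the stored coefficients are 6×6.

```python
from fractions import Fraction as Fr

# Kernel transform matrix G for Winograd F(4x4, 3x3) (assumed variant).
G = [
    [Fr(1, 4), Fr(0), Fr(0)],
    [Fr(-1, 6), Fr(-1, 6), Fr(-1, 6)],
    [Fr(-1, 6), Fr(1, 6), Fr(-1, 6)],
    [Fr(1, 24), Fr(1, 12), Fr(1, 6)],
    [Fr(1, 24), Fr(-1, 12), Fr(1, 6)],
    [Fr(0), Fr(0), Fr(1)],
]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def transform_kernel(g):
    """Map a 3x3 spatial kernel g to its 6x6 transform-domain form G g G^T."""
    return matmul(matmul(G, g), transpose(G))


kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # toy 3x3 kernel
u = transform_kernel(kernel)                # 6 rows x 6 columns
```

Storing the transformed coefficients trades extra weight memory for fewer multiplications per output tile at run time.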

For this example, the kernel weights for the other SP_CONV layers are stored in a (1,6,6L) structure, because each of these layers has K (=L) inputs and outputs.

The outputs of the stack of spatial layers are provided to the temporal layers, including convolutional layers 924, 925 executed on the FPGA. Layers 924 and 925 can be convolutional layers that apply 1D convolutions across cycles. As indicated at 924A, the number of temporal layers can be any practical number, which for context may range from a few to more than 20 in different embodiments.

The first temporal layer, TEMP_CONV_0 (layer 924), reduces the number of cycle channels from 5 to 3, as shown in FIG. 7. The second temporal layer (layer 925) reduces the number of cycle channels from 3 to 1, as shown in FIG. 7, and reduces the number of feature maps to four outputs for each pixel, representing the confidence in each base call.

The outputs of the temporal layers are accumulated in output patches and delivered to the host CPU, which applies, for example, a softmax function 930 or another function to normalize the base call probabilities.

FIG. 10 shows an alternative implementation: a 10-input, six-output neural network that can be executed for a base calling operation. In this example, block data for spatially aligned input patches from cycles 0 to 9 are applied to isolated stacks of spatial layers, such as stack 1001 for cycle 9. The outputs of the isolated stacks are applied to an inverse hierarchical arrangement of temporal stacks 1020 having outputs 1035(2) through 1035(7), providing base call classification data for subject cycles 2 through 7.
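The cycle counts in both networks are consistent with simple sliding-window arithmetic, sketched below. The window of three cycles and the two temporal layers are assumptions chosen to match the 5 to 3 to 1 reduction described for FIG. 9; the internal structure of the temporal stacks 1020 is not specified here.

```python
def cycles_after_temporal_layers(cycles, window=3, layers=2):
    """Return the cycle indices that remain after successive sliding-window layers.

    Each layer with a window of `window` consecutive cycles, applied without
    padding, trims (window - 1) // 2 cycles from each end (odd window assumed).
    """
    trim = (window - 1) // 2
    for _ in range(layers):
        cycles = cycles[trim:len(cycles) - trim]
    return cycles


# Five-cycle input, expressed relative to the subject cycle K (K-2 .. K+2).
five_in = cycles_after_temporal_layers(list(range(-2, 3)))
# Ten-cycle input, cycles 0 .. 9.
ten_in = cycles_after_temporal_layers(list(range(10)))
```

Under these assumptions the five-cycle network yields one subject cycle (K itself) and the ten-cycle network yields cycles 2 through 7, matching the six outputs 1035(2) through 1035(7).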

FIG. 11 shows one implementation of the specialized architecture of the neural network-based base caller (e.g., FIG. 7), which is used to isolate the processing of data for different sequencing cycles. The motivation for using the specialized architecture is described first.

The neural network-based base caller processes data for a current sequencing cycle, one or more preceding sequencing cycles, and one or more subsequent sequencing cycles. Data for the additional sequencing cycles provides sequence-specific context. The neural network-based base caller learns the sequence-specific context during training and base calls it. Furthermore, data for the preceding and subsequent sequencing cycles provides the second-order contribution of prephasing and phasing signals to the current sequencing cycle.

Images captured at different sequencing cycles and in different image channels are misaligned with respect to each other and have residual registration error. To account for this misalignment, the specialized architecture includes spatial convolutional layers that do not mix information between sequencing cycles and mix information only within a sequencing cycle.

The spatial convolutional layers use so-called "isolated convolutions", which achieve the isolation by independently processing the data for each of a plurality of sequencing cycles through a "dedicated, non-shared" sequence of convolutions. The isolated convolutions convolve over the data and resulting feature maps of only a given sequencing cycle (i.e., intra-cycle), and not over the data and resulting feature maps of any other sequencing cycle.

For example, consider input data that comprises (i) current data for the current (time t) sequencing cycle to be base called, (ii) previous data for the previous (time t-1) sequencing cycle, and (iii) subsequent data for the subsequent (time t+1) sequencing cycle. The specialized architecture then initiates three separate data processing pipelines (or convolution pipelines), namely a current data processing pipeline, a previous data processing pipeline, and a subsequent data processing pipeline. The current data processing pipeline receives as input the current data for the current (time t) sequencing cycle and independently processes it through a plurality of spatial convolutional layers to produce a so-called "current spatial convolutional representation" as the output of the final spatial convolutional layer. The previous data processing pipeline receives as input the previous data for the previous (time t-1) sequencing cycle and independently processes it through the plurality of spatial convolutional layers to produce a so-called "previous spatial convolutional representation" as the output of the final spatial convolutional layer. The subsequent data processing pipeline receives as input the subsequent data for the subsequent (time t+1) sequencing cycle and independently processes it through the plurality of spatial convolutional layers to produce a so-called "subsequent spatial convolutional representation" as the output of the final spatial convolutional layer.

In some implementations, the current pipeline, the one or more previous pipelines, and the one or more subsequent processing pipelines are executed in parallel.

In some implementations, the spatial convolutional layers are part of a spatial convolutional network (or subnetwork) within the specialized architecture.

The neural network-based base caller further includes temporal convolutional layers that mix information between sequencing cycles (i.e., inter-cycle). The temporal convolutional layers receive their inputs from the spatial convolutional network and operate on the spatial convolutional representations produced by the final spatial convolutional layers of the respective data processing pipelines.

The inter-cycle operability freedom of the temporal convolutional layers stems from the fact that the misalignment property, which is present in the image data fed as input to the spatial convolutional network, is purged out of the spatial convolutional representations by the stack, or cascade, of isolated convolutions performed by the sequence of spatial convolutional layers.

The temporal convolutional layers use so-called "combined convolutions", which convolve, group by group and on a sliding window basis, over input channels in successive inputs. In one implementation, the successive inputs are successive outputs produced by a preceding spatial convolutional layer or a preceding temporal convolutional layer.

In some implementations, the temporal convolutional layers are part of a temporal convolutional network (or subnetwork) within the specialized architecture. The temporal convolutional network receives its inputs from the spatial convolutional network. In one implementation, the first temporal convolutional layer of the temporal convolutional network combines, group by group, the spatial convolutional representations between the sequencing cycles. In another implementation, subsequent temporal convolutional layers of the temporal convolutional network combine successive outputs of preceding temporal convolutional layers.

The output of the final temporal convolutional layer is fed to an output layer that produces an output. The output is used to base call one or more clusters at one or more sequencing cycles.

During forward propagation, the specialized architecture processes information from a plurality of inputs in two stages. In the first stage, isolated convolutions are used to prevent mixing of information between the inputs. In the second stage, combined convolutions are used to mix information between the inputs. The results of the second stage are used to make a single inference for the plurality of inputs.

This differs from batch mode techniques, in which a convolutional layer processes multiple inputs in a batch at the same time and makes a corresponding inference for each input in the batch. In contrast, the specialized architecture maps the plurality of inputs to the single inference. The single inference can comprise more than one prediction, such as a classification score for each of the four bases (A, C, T, and G).

In one implementation, the inputs have a temporal ordering such that each input is generated at a different time step and has a plurality of input channels. For example, the plurality of inputs can include the following three inputs: a current input generated by the current sequencing cycle at time step (t), a previous input generated by the previous sequencing cycle at time step (t-1), and a subsequent input generated by the subsequent sequencing cycle at time step (t+1). In another implementation, each input is respectively derived from the current, previous, and subsequent outputs produced by one or more preceding convolutional layers and includes k feature maps.

In one implementation, each input can include the following five input channels: a red image channel, a red distance channel, a green image channel, a green distance channel, and a scaling channel. In another implementation, each input can include k feature maps produced by a preceding convolutional layer, with each feature map treated as an input channel. In yet another example, each input can have only one channel, two channels, or another different number of channels. The incorporated materials of U.S. Patent Application Publication No. 2013/0079232 discuss base calling using various numbers of channels, such as one channel, two channels, or four channels.

FIG. 12 shows one implementation of isolation layers, each of which can include convolutions. Isolated convolutions process the plurality of inputs at once by applying a convolution filter to each input in parallel. With isolated convolutions, the convolution filter combines input channels within the same input and does not combine input channels of different inputs. In one implementation, the same convolution filter is applied to each input in parallel. In another implementation, a different convolution filter is applied to each input in parallel. In some implementations, each spatial convolutional layer comprises a bank of k convolution filters, each of which is applied to each input in parallel.
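The isolation property can be illustrated with a small sketch. For brevity it uses a 1D convolution over a row of pixels rather than the 2D spatial convolution described above, and it shares one filter across all inputs (one of the options mentioned); the function names and toy values are assumptions.

```python
def conv_one_input(channels, kernel):
    """Valid 1D convolution over a single input, summing across its channels."""
    width, taps = len(channels[0]), len(kernel[0])
    return [
        sum(kernel[c][j] * channels[c][i + j]
            for c in range(len(channels)) for j in range(taps))
        for i in range(width - taps + 1)
    ]


def isolated_conv(inputs, kernel):
    """Apply the same filter to every input separately; inputs are never mixed."""
    return [conv_one_input(channels, kernel) for channels in inputs]


kernel = [[1, 0, -1], [0, 1, 0]]          # one filter: 2 channels x 3 taps
inputs = [
    [[1, 2, 3, 4], [0, 1, 0, 1]],         # cycle t-1: two channels, four pixels
    [[5, 5, 5, 5], [1, 1, 1, 1]],         # cycle t
    [[0, 0, 0, 0], [2, 2, 2, 2]],         # cycle t+1
]
out = isolated_conv(inputs, kernel)

# Changing the data of cycle t+1 leaves the result for cycle t-1 untouched.
changed = [inputs[0], inputs[1], [[9, 9, 9, 9], [9, 9, 9, 9]]]
out_changed = isolated_conv(changed, kernel)
```

Each output depends only on the channels of its own input, so misalignment between cycles cannot leak into the per-cycle feature maps at this stage.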

FIG. 13A shows one implementation of combining layers, each of which can include convolutions. FIG. 13B shows another implementation of combining layers, each of which can include convolutions. Combined convolutions mix information between different inputs by grouping corresponding input channels of the different inputs and applying a convolution filter to each group. The grouping of the corresponding input channels and the application of the convolution filter occur on a sliding window basis. In this context, a window spans two or more successive input channels representing, for instance, the outputs for two successive sequencing cycles. Because the window is a sliding window, most input channels are used in two or more windows.

In some implementations, the different inputs originate from an output sequence produced by a preceding spatial convolutional layer or a preceding temporal convolutional layer. In the output sequence, the different inputs are arranged as successive outputs and are therefore viewed by a subsequent temporal convolutional layer as successive inputs. Then, in the subsequent temporal convolutional layer, the combined convolutions apply the convolution filter to groups of corresponding input channels in the successive inputs.

In one implementation, the successive inputs have a temporal ordering such that a current input is generated by the current sequencing cycle at time step (t), a previous input is generated by the previous sequencing cycle at time step (t-1), and a subsequent input is generated by the subsequent sequencing cycle at time step (t+1). In another implementation, each successive input is respectively derived from the current, previous, and subsequent outputs produced by one or more preceding convolutional layers and includes k feature maps.

In one implementation, each input can include the following five input channels: a red image channel, a red distance channel, a green image channel, a green distance channel, and a scaling channel. In another implementation, each input can include k feature maps produced by a preceding convolutional layer, with each feature map treated as an input channel.

The depth B of a convolution filter depends on the number of successive inputs whose corresponding input channels are convolved, group by group, by the convolution filter on a sliding window basis. In other words, the depth B equals the number of successive inputs in each sliding window, that is, the group size.

In FIG. 13A, corresponding input channels from two successive inputs are combined in each sliding window, and therefore B=2. In FIG. 13B, corresponding input channels from three successive inputs are combined in each sliding window, and therefore B=3.

In one implementation, the sliding windows share the same convolution filter. In another implementation, a different convolution filter is used for each sliding window. In some implementations, each temporal convolutional layer comprises a bank of k convolution filters, each of which is applied to the successive inputs on a sliding window basis.
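A combined convolution with filter depth B can be sketched as below. To keep the example short, each input carries a single value per channel and one filter is shared by all sliding windows (one of the options described above); the names and numbers are illustrative assumptions.

```python
def combined_conv(inputs, filt):
    """Slide a window of B = len(filt) successive inputs and mix them.

    inputs[t][c] is channel c of input t (one value per channel for brevity);
    filt[b][c] weights channel c of the b-th input inside the window.
    """
    depth = len(filt)
    return [
        sum(filt[b][c] * inputs[t + b][c]
            for b in range(depth) for c in range(len(filt[b])))
        for t in range(len(inputs) - depth + 1)
    ]


# Five successive inputs with two channels each.
inputs = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
out_b2 = combined_conv(inputs, [[1, 0], [0, 1]])          # B = 2, as in FIG. 13A
out_b3 = combined_conv(inputs, [[1, 1], [1, 1], [1, 1]])  # B = 3, as in FIG. 13B
```

With N successive inputs and depth B, the layer produces N - B + 1 outputs, so five inputs give four outputs for B=2 and three outputs for B=3, and every interior input contributes to more than one window.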

Further details of FIGS. 4 to 10, and variations thereof, can be found in co-pending U.S. Nonprovisional Patent Application No. 17/176,147, entitled "HARDWARE EXECUTION AND ACCELERATION OF ARTIFICIAL INTELLIGENCE-BASED BASE CALLER," filed February 15, 2021 (Attorney Docket No. ILLM 1020-2/IP-1866-US), which is incorporated by reference as if fully set forth herein.

FIG. 14 shows an exemplary block position-based weight selection scheme for base calling. For example, FIG. 14 shows an exemplary flow cell 1400 including a plurality of lanes 1450, each lane including a corresponding plurality of blocks (e.g., as discussed with respect to FIGS. 1 and 2). The depiction of the flow cell 1400 is symbolic in nature: it shows the various lanes and blocks within the flow cell without showing its various other components. FIG. 14 shows a top view of the flow cell 1400 (e.g., without showing the flow cover 136 of FIG. 1).

In one embodiment, and as also discussed with respect to FIG. 2, the flow cell 1400 is divided or partitioned into a plurality of lanes, such as lanes 1450a, 1450b, 1450c, …, 1450(P-2), 1450(P-1), and 1450P, i.e., P lanes, where P is a positive integer. As also discussed with respect to FIG. 2, in one embodiment each lane 1450 is further partitioned into non-overlapping regions called blocks. In an example, each lane 1450 includes one or more columns of blocks. For example, in FIG. 14 each lane 1450 includes two corresponding columns of blocks, with each individual block shown as a rectangular box. The number of blocks within each column of each lane is implementation specific. Each block includes a corresponding plurality of clusters. During the sequencing process, the clusters on a block and their surrounding background are imaged. For example, FIGS. 2 and 3 show examples of clusters within a block.

In one embodiment, the blocks of the flow cell 1400 are classified into various types, e.g., based on the positions of the blocks. In the exemplary implementation of FIG. 14, each block of the flow cell 1400 is classified as an edge block 1408, a near-edge block 1410, or a non-edge (or central) block 1412.

For example, blocks on the vertical edges (e.g., along the Y axis) and/or the horizontal edges (e.g., along the X axis) of the flow cell 1400 are classified as edge blocks 1408, as shown in FIG. 14. The edge blocks 1408 are thus immediately adjacent to the corresponding edges of the flow cell 1400.

Blocks that are close to (e.g., immediately adjacent to) the edge blocks are classified as near-edge blocks 1410. For example, a near-edge block 1410 is a block that is spaced apart from the edge of the flow cell 1400: an edge block 1408 separates the corresponding near-edge block 1410 from the corresponding edge of the flow cell 1400.

Blocks that are neither edge blocks nor near-edge blocks are non-edge blocks 1412, also referred to as central blocks 1412. The central blocks 1412 are thus relatively closer to the center of the flow cell 1400 than, for example, the edge blocks 1408 or the near-edge blocks 1410. For example, a central block 1412 is separated from the edge of the flow cell 1400 by an edge block 1408 and a near-edge block 1410.

Although the blocks of the flow cell 1400 are classified into three categories in FIG. 14 (edge, near-edge, and central or non-edge), this classification is merely an example, and different block position-based classifications can also be used. For example, in another implementation, the blocks can be classified as (i) edge or near-edge blocks and (ii) central blocks (i.e., the edge and near-edge categories can be merged into a single category), resulting in two block categories.
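One way to express this classification in code is sketched below. It assumes, purely for illustration, that the blocks of the whole flow cell are indexed on a rectangular grid by (row, col); the function name, the category labels, and the `merge_near_edge` switch for the two-category variant are hypothetical.

```python
def classify_block(row, col, n_rows, n_cols, merge_near_edge=False):
    """Classify a block by its distance (in blocks) from the nearest flow cell edge."""
    distance = min(row, col, n_rows - 1 - row, n_cols - 1 - col)
    if distance == 0:
        category = "edge"        # touches an edge of the flow cell
    elif distance == 1:
        category = "near_edge"   # separated from the edge by one edge block
    else:
        category = "central"     # everything else
    if merge_near_edge and category == "near_edge":
        category = "edge"        # two-category variant: edge-or-near-edge vs. central
    return category


corner = classify_block(0, 3, n_rows=10, n_cols=8)
inner_ring = classify_block(1, 1, n_rows=10, n_cols=8)
middle = classify_block(4, 4, n_rows=10, n_cols=8)
merged = classify_block(1, 1, n_rows=10, n_cols=8, merge_near_edge=True)
```

Other position-based schemes (for example, more rings, or categories keyed to distance from a reagent inlet) would fit the same interface.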

As previously discussed, FIGS. 7 and 10 are exemplary multi-cycle neural network models that can be used for base calling, and FIG. 9 is a simplified representation of a stack of a neural network usable in systems like those of FIGS. 7 and 9. Various functions within the neural network model for base calling use biases and weights. For example, during a convolution operation, a filter comprising one or more kernels (e.g., as shown in FIG. 12) has a corresponding plurality of weights, which are trained during the training phase of the neural network model. For example, the weights are tuned using training data generated from one or more blocks, and the weights are then used for base calling in, e.g., the flow cell of FIG. 14.

Base calling cycles are performed for the clusters in the various blocks of the flow cell 1400. In an example, parameters associated with the base calling operation for a block can depend on the relative position of the block. For example, the excitation light 101 discussed with respect to FIG. 1 is directed at the blocks of the flow cell, and different blocks may receive different amounts of the excitation light 101, e.g., based on the positions of the individual blocks and/or the positions of the one or more light sources emitting the excitation light 101. For example, if the light source emitting the excitation light 101 is positioned vertically above the flow cell 1400, the central blocks 1412 may receive a different amount of light than the edge blocks 1408 and/or the near-edge blocks 1410.

In another example, peripheral or external light around the flow cell 1400 (e.g., ambient light from outside the biosensor 100) can affect the amount and/or characteristics of the excitation light 101 received by individual blocks of the flow cell 1400. Merely as an example, the edge blocks 1408 may receive the excitation light 101 along with some amount of peripheral light from outside the flow cell 1400, whereas the central blocks 1412 may receive mainly the excitation light 101.

In yet another example, individual sensors (or pixels, or photodiodes) included in the flow cell 1400 (e.g., sensors 106, 108, 110, 112, and 114 shown in FIG. 1) may sense light depending on the positions of the corresponding sensors, which in turn depend on the positions of the corresponding blocks. For example, sensing operations performed by one or more sensors associated with the edge blocks 1408 may be affected relatively more by peripheral light (in addition to the excitation light 101) than sensing operations of one or more other sensors associated with the central blocks 1412.

In another example, the flow of reactants to the various blocks can also be affected by block position. Reactants here include any substance that can be used to obtain a desired reaction during base calling, such as reagents, enzymes, samples, other biomolecules, and buffer solutions. For example, blocks near a source of the reactants may receive a larger amount of the reactants than blocks farther from the source.

In other words, the parameters associated with base calling can differ slightly between different categories of blocks. Accordingly, in one embodiment, different weight sets are used for different categories of blocks, to compensate for the exemplary block position dependence of the base calling process discussed above.

For example, the implementation of FIG. 14 uses three candidate weight sets: (i) an edge weight set WeT 1418 for the edge blocks, (ii) a near-edge weight set WnT 1420 for the near-edge blocks, and (iii) a central weight set WcT 1422 for the central (or non-edge) blocks.

In an example, when training a neural network model for base calling (such as the neural network models discussed with respect to FIGS. 7, 9, and 10), the model is initially trained on image data generated only from the edge blocks 1408 (e.g., without using training data generated from the near-edge or central blocks). The resulting weights are included in the edge weight set WeT 1418.

Subsequently, the neural network model is trained on image data generated only from the near-edge blocks 1410 (e.g., without using training data generated from the edge or central blocks), and the resulting weights are included in the near-edge weight set WnT 1420. Finally, the neural network model is trained on image data generated only from the central blocks 1412 (e.g., without using training data generated from the edge or near-edge blocks), and the resulting weights are included in the central weight set WcT 1422.
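The per-category training described above amounts to partitioning the training data by block category and training on each partition separately. The sketch below shows only that bookkeeping; `train_fn` is a hypothetical stand-in for an actual training routine, and the averaging lambda exists solely to make the example runnable.

```python
from collections import defaultdict


def train_weight_sets(samples, train_fn):
    """Train one weight set per block category, each on its own data only.

    samples: iterable of (category, example) pairs.
    train_fn: hypothetical callable that maps a list of examples to weights.
    """
    by_category = defaultdict(list)
    for category, example in samples:
        by_category[category].append(example)
    return {category: train_fn(examples)
            for category, examples in by_category.items()}


# Stand-in "training" that just averages the examples of each category.
samples = [("edge", 1.0), ("edge", 3.0), ("near_edge", 4.0), ("central", 10.0)]
weight_sets = train_weight_sets(samples, lambda xs: sum(xs) / len(xs))
```

Because each call to `train_fn` sees only the examples of one category, no data from the other categories influences that category's weights.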

Accordingly, each weight set includes a corresponding plurality of weights for configuring the neural network model, where the configured neural network is used to process sensor data from the corresponding category of tiles. For example, as discussed with respect to FIGS. 7, 9, 10, and 11, the topology of the neural network model includes (i) one or more spatial layers that do not combine sensor data and resulting feature maps between successive sensing cycles, and (ii) temporal layers that combine the resulting feature maps between successive sensing cycles. Thus, each weight set includes corresponding spatial weights for the spatial layers and corresponding temporal weights for the temporal layers. For example, the edge weight set WeT 1418 for the edge tiles includes a corresponding first one or more spatial weights for the spatial layers and a corresponding first one or more temporal weights for the temporal layers. Similarly, the central weight set WcT 1422 for the central tiles includes a corresponding second one or more spatial weights for the spatial layers and a corresponding second one or more temporal weights for the temporal layers.

During the inference phase, when base calling cycles are performed, the neural network model is configured with the edge weight set WeT 1418 when bases within clusters of an edge tile are to be called, and the sensor data from the edge tile is used for the base calling operation. Similarly, when bases within clusters of a near-edge tile are to be called, the neural network model is configured with the near-edge weight set WnT 1420, and the sensor data from the near-edge tile is used for the base calling operation. Finally, when bases within clusters of a central tile are to be called, the neural network model is configured with the central weight set WcT 1422, and the sensor data from the central tile is used for the base calling operation.
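A minimal sketch of this inference-time selection is given below, under stated assumptions. The `WeightSet` and `BaseCaller` classes, the `configure_for_tile` helper, and all numeric weight values are hypothetical placeholders; only the mapping of tile category to weight sets WeT 1418, WnT 1420, and WcT 1422 follows the description.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class WeightSet:
    """A weight set holding spatial-layer and temporal-layer weights."""
    spatial: Tuple[float, ...]
    temporal: Tuple[float, ...]


# Placeholder values; real weights would come from per-category training.
WEIGHT_SETS: Dict[str, WeightSet] = {
    "edge": WeightSet(spatial=(0.11, 0.12), temporal=(0.13,)),       # WeT 1418
    "near_edge": WeightSet(spatial=(0.21, 0.22), temporal=(0.23,)),  # WnT 1420
    "central": WeightSet(spatial=(0.31, 0.32), temporal=(0.33,)),    # WcT 1422
}


class BaseCaller:
    """Stand-in for the neural network model; it only records its weights."""

    def __init__(self) -> None:
        self.weights: Optional[WeightSet] = None

    def configure(self, weight_set: WeightSet) -> None:
        self.weights = weight_set


def configure_for_tile(model: BaseCaller, tile_class: str) -> WeightSet:
    """Load the weight set matching the tile category before base calling."""
    weight_set = WEIGHT_SETS[tile_class]  # raises KeyError for unknown categories
    model.configure(weight_set)
    return weight_set


model = BaseCaller()
configure_for_tile(model, "central")
```

After the call, `model.weights` holds the central weight set, and sensor data from a central tile would then be processed with that configuration.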

FIG. 15 illustrates another exemplary tile-location-based weight selection scheme for base calling. For example, FIG. 15 shows the flow cell 1400, which includes the plurality of lanes 1450a, 1450b, 1450c, …, 1450(P-2), 1450(P-1), and 1450P, where each lane includes a corresponding plurality of tiles.

In the example of FIG. 15, each tile of the flow cell 1400 is classified based on the position of the lane to which the tile belongs. For example, the top one or more lanes of the flow cell 1400 (such as lanes 1450P and 1450(P-1)) are classified as top peripheral lanes, the bottom one or more lanes of the flow cell 1400 (such as lanes 1450a and 1450b) are classified as bottom peripheral lanes, and the central one or more lanes of the flow cell 1400 (such as lanes 1450c and 1450(P-2)) are classified as central lanes. Note that the number of lanes in each category is merely an example, and variations are possible. For example, instead of two lanes, each peripheral lane category may include one lane, three lanes, and so on.

Tiles within the top peripheral lanes are classified as top peripheral lane tiles 1508a, tiles within the bottom peripheral lanes are classified as bottom peripheral lane tiles 1508b, and tiles within the central lanes are classified as central lane tiles 1510.

For the reasons discussed with respect to FIG. 14, in one embodiment, different weight sets may be assigned to the tiles within the various categories of lanes in the flow cell of FIG. 15. For example, in the implementation of FIG. 15, two candidate weight sets are used: (i) a peripheral weight set WpL 1504 for the peripheral lane tiles 1508a, 1508b (e.g., tiles belonging to the top and bottom peripheral lanes), and (ii) a central weight set WcL 1506 for the central lane tiles 1510.

For example, when training a neural network model for base calling (such as the neural network models discussed with respect to FIGS. 7, 9, and 10), the neural network model is initially trained on image data generated only by the peripheral lane tiles 1508a, 1508b (e.g., without using training data generated from the central lane tiles 1510). The resulting weights are included in the peripheral weight set WpL 1504.

Subsequently, the neural network model is trained on image data generated only by the central lane tiles 1510 (e.g., without using training data generated from the peripheral lane tiles 1508a, 1508b), and the resulting weights are included in the central weight set WcL 1506.

During the inference phase, when base calling cycles are performed, the neural network model is configured with the weights from the peripheral weight set WpL 1504 when bases within clusters of the peripheral lane tiles 1508a, 1508b are to be called, and the sensor data from the peripheral lane tiles 1508a, 1508b is used for the base calling operation. Similarly, when bases within clusters of the central lane tiles 1510 are to be called, the neural network model is configured with the weights from the central weight set WcL 1506, and the sensor data from the central lane tiles 1510 is used for the base calling operation.
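The lane-based classification of FIG. 15 can be sketched as below. This is an illustrative sketch only: the zero-based lane indexing, the function names, and the default of two peripheral lanes per side are assumptions, the last mirroring the example in which lanes 1450a, 1450b and 1450(P-1), 1450P are peripheral.

```python
def classify_lane(lane_index: int, num_lanes: int, n_peripheral: int = 2) -> str:
    """Return 'peripheral' for the outermost lanes on either side, else 'central'.

    lane_index is zero-based: 0 corresponds to lane 1450a and
    num_lanes - 1 corresponds to lane 1450P.
    """
    if not 0 <= lane_index < num_lanes:
        raise ValueError("lane_index out of range")
    if n_peripheral < 1 or 2 * n_peripheral >= num_lanes:
        raise ValueError("n_peripheral must leave at least one central lane")
    if lane_index < n_peripheral or lane_index >= num_lanes - n_peripheral:
        return "peripheral"
    return "central"


def weight_set_for_lane(lane_index: int, num_lanes: int, n_peripheral: int = 2) -> str:
    """Map a lane to the label of its candidate weight set (WpL 1504 or WcL 1506)."""
    category = classify_lane(lane_index, num_lanes, n_peripheral)
    return "WpL" if category == "peripheral" else "WcL"


labels = [weight_set_for_lane(i, num_lanes=8) for i in range(8)]
```

Changing `n_peripheral` to 1 or 3 reproduces the variations mentioned above, in which each peripheral category contains one or three lanes.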

FIG. 16 illustrates yet another exemplary tile-location-based weight selection scheme for base calling. For example, FIG. 16 shows the flow cell 1400, which includes the plurality of lanes 1450a, 1450b, 1450c, …, 1450(P-2), 1450(P-1), and 1450P, where each lane includes a corresponding plurality of tiles.

In the example of FIG. 16, the flow cell 1400 is divided into a plurality of segments or sections based on the dashed lines 1603 (the dashed lines 1603 are used only for classification and are not physically present in the flow cell). For example, the flow cell 1400 is divided into a top left section 1610TL (weight set WTL), a top center section 1610TC (weight set WTC), a top right section 1610TR (weight set WTR), a middle left section 1610ML (weight set WML), a central section 1610C (weight set WC), a middle right section 1610MR (weight set WMR), a bottom left section 1610BL (weight set WBL), a bottom center section 1610BC (weight set WBC), and a bottom right section 1610BR (weight set WBR). Each tile of the flow cell 1400 is classified based on the section to which the tile belongs.

For reasons similar to those discussed with respect to FIG. 14, in one embodiment, a corresponding weight set is assigned to the tiles within each of the sections of FIG. 16. For example, in the implementation of FIG. 16, the top left weight set WTL is assigned to tiles in the top left section 1610TL, the top center weight set WTC is assigned to tiles in the top center section 1610TC, the top right weight set WTR is assigned to tiles in the top right section 1610TR, the middle left weight set WML is assigned to tiles in the middle left section 1610ML, the central weight set WC is assigned to tiles in the central section 1610C, the middle right weight set WMR is assigned to tiles in the middle right section 1610MR, the bottom left weight set WBL is assigned to tiles in the bottom left section 1610BL, the bottom center weight set WBC is assigned to tiles in the bottom center section 1610BC, and the bottom right weight set WBR is assigned to tiles in the bottom right section 1610BR.

For example, when training a neural network model for base calling (such as the neural network models discussed with respect to FIGS. 7, 9, and 10), the neural network model is initially trained on sensor data generated only by tiles in the top left section 1610TL (e.g., without using sensor data from tiles of other categories), and the resulting weights are included in the top left weight set WTL. This process is repeated for the tiles of each of the other sections to generate the other candidate weight sets, namely the top center weight set WTC, the top right weight set WTR, the middle left weight set WML, the central weight set WC, the middle right weight set WMR, the bottom left weight set WBL, the bottom center weight set WBC, and the bottom right weight set WBR.

During the inference phase, when base calling cycles are performed, the neural network model is configured with the weights in the corresponding top left weight set WTL when bases within clusters of a tile in the top left section 1610TL are to be called, and the sensor data from the tiles of the top left section 1610TL is used for the base calling operation. The process is repeated in the same way for the tiles of each of the other sections.

In FIG. 16, the flow cell 1400 is partitioned into nine different sections. However, the flow cell 1400 may be partitioned into a different number of sections, such as four sections comprising a top left quadrant, a top right quadrant, a bottom left quadrant, and a bottom right quadrant.
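One way to picture the nine-section partition is the sketch below. It assumes, purely for illustration, that tile positions are normalized to the unit square and that the dashed lines 1603 split each axis into equal thirds; the actual section boundaries are not specified here. The function name and the label WBR for the bottom right section are likewise assumptions that follow the naming pattern of the other sections.

```python
ROWS = ("T", "M", "B")  # top, middle, bottom
COLS = ("L", "C", "R")  # left, center, right


def weight_set_for_position(x: float, y: float) -> str:
    """Map a normalized tile position to the label of its section's weight set.

    x runs left to right and y runs top to bottom, both in [0, 1).
    """
    if not (0.0 <= x < 1.0 and 0.0 <= y < 1.0):
        raise ValueError("x and y must lie in [0, 1)")
    row = ROWS[int(y * 3)]
    col = COLS[int(x * 3)]
    if row == "M" and col == "C":
        return "WC"  # the central section uses the single-letter label
    return "W" + row + col


corner = weight_set_for_position(0.1, 0.1)
center = weight_set_for_position(0.5, 0.5)
```

A four-quadrant partition would follow the same pattern with two rows and two columns.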

FIG. 17A illustrates an example of fading, in which signal intensity decreases as a function of the cycle number of a sequencing run serving as the base calling operation. Fading is an exponential decay of fluorescent signal intensity as a function of cycle number. As the sequencing run progresses, the analyte strands are washed excessively, are exposed to laser radiation that creates reactive species, and are subjected to harsh environmental conditions. All of these lead to a gradual loss of fragments in each analyte, decreasing its fluorescent signal intensity. Fading is also called dimming or signal decay. FIG. 17A illustrates one example of fading 1700. In FIG. 17A, the intensity values of analyte fragments with AC microsatellites show exponential decay.
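The exponential decay described above can be written compactly as follows. The symbols are introduced here only for illustration and do not appear in the figures: I(c) is the fluorescent signal intensity at cycle c, I_0 is the initial intensity, and λ > 0 is an assumed per-cycle decay rate.

```latex
I(c) = I_0 \, e^{-\lambda c}, \qquad \lambda > 0, \quad c = 1, 2, \ldots, N
```

Under this model, the ratio of intensities at two cycles depends only on their separation, I(c_2)/I(c_1) = e^{-\lambda (c_2 - c_1)}, which is one way to see why a model fitted to early cycles may perform poorly on late cycles.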

FIG. 17B conceptually illustrates a decreasing signal-to-noise ratio as sequencing cycles progress. For example, as sequencing proceeds, accurate base calling becomes increasingly difficult, because signal strength decreases and noise increases, resulting in a substantially lower signal-to-noise ratio. Physically, it is observed that later synthesis steps attach tags at a different position relative to the sensor than earlier synthesis steps. When the sensor is below the sequence being synthesized, the signal decays because tags are attached to the strand farther from the sensor in later sequencing steps than in earlier steps. This causes signal decay as sequencing cycles progress. In some designs, where the sensor is above the substrate that holds the clusters, the signal may increase rather than decay as sequencing proceeds.

In the flow cell designs investigated, noise grows as the signal decays. Physically, phasing and prephasing increase noise as sequencing proceeds. Phasing refers to steps in sequencing in which a tag fails to advance along the sequence. Prephasing refers to sequencing steps in which a tag jumps two positions forward instead of one during a sequencing cycle. Phasing and prephasing are both relatively infrequent, occurring on the order of once in 500 to 1000 cycles. Phasing is slightly more frequent than prephasing. Phasing and prephasing affect individual strands in the cluster that produces the intensity data, so the intensity noise distribution from a cluster accumulates in a binomial, trinomial, quadrinomial, etc. expansion as sequencing proceeds.

Further details of fading, signal decay, and decrease in signal-to-noise ratio, and of FIGS. 17A and 17B, can be found in U.S. Nonprovisional Patent Application No. 16/874,599, titled "Systems and Devices for Characterization and Performance Analysis of Pixel-Based Sequencing," filed May 14, 2020 (Attorney Docket No. ILLM 1011-4/IP-1750-US), which is incorporated by reference as if fully set forth herein.

Therefore, during base calling, the reliability or quality of a base call (e.g., the probability that the called base is correct) may depend on the number of the base calling cycle for which a base is currently being called. Accordingly, in addition to or instead of depending on tile location (e.g., as discussed with respect to FIGS. 14, 15, and 16), the weight set may be selected based on the current cycle number for which the base calling operation is being performed. FIG. 18 illustrates an exemplary weight selection scheme for base calling that is based on the base calling cycle number.

For example, FIG. 18 is directed to a base calling run for an example tile M. Assume there are N base calling cycles, during which the strands in the various clusters of the example tile M are to be identified. As discussed, owing to the factors discussed with respect to FIGS. 17A and 17B and/or various other factors, the signal intensity detected by the biosensor (e.g., sensors 106, 108, 110, 112, and 114 of FIG. 1) varies (e.g., decays) with the base calling cycle number. For example, assume that the N base calling sensing cycles are divided into three sub-series of cycles, such as (a) initial sensing cycles 1 to N1, (b) intermediate sensing cycles (N1+1) to N2, and (c) final sensing cycles (N2+1) to N, as shown in FIG. 18, where N>N2>N1, and N, N1, and N2 are positive integers. Thus, the N sensing cycles are divided into three sub-series of cycles, although the N sensing cycles may also be divided into a different number of sub-series (such as 2, 4, or more).

Note that the number of sensing cycles in each of the above three sub-series may or may not be equal, and is implementation specific. Merely as an example and without limiting the scope of this disclosure, if N is 100, the 100 cycles may be divided into sub-series comprising 30 initial cycles, 30 intermediate cycles, and 40 final cycles. That is, in this simple example, N1=30 and N2=60.

As discussed with respect to FIGS. 17A and 17B, the average level of signal intensity received by the base caller from the biosensor in, for example, cycle number N1 may differ from the average level of signal intensity received by the base caller from the biosensor in cycle number N. Therefore, a neural network model trained for, e.g., cycle number N1 may not provide satisfactory results for cycle number N.

Accordingly, the neural network models for base calling (such as those discussed with respect to FIGS. 7, 9, and 10) can be trained for specific sub-series of cycles. For example, the neural network model is initially trained on sensor data generated only during sensing cycles 1 to N1, and the resulting weights are included in a first cycle sub-series weight set W(1-N1) 1810a. Subsequently, the neural network model is trained on sensor data generated only during sensing cycles (N1+1) to N2, and the resulting weights are included in a second cycle sub-series weight set W(N1-N2) 1810b. Finally, the neural network model is trained on sensor data generated only during sensing cycles (N2+1) to N, and the resulting weights are included in a third cycle sub-series weight set W(N2-N) 1810c. Note that, for example, in the first cycle sub-series weight set W(1-N1) 1810a, the phrase (1-N1) is a cycle index, implying that the weight set is associated with sensing cycles 1 to N1. It may be noted that, in the example of FIG. 18, the base calling operation is performed using sensor data from one or more channels (such as one channel, two channels, three channels, four channels, or a higher number of channels), and for a given cycle, the weights are applicable to the sensor data from all such channels.

During the inference phase, when bases are to be called for cycles 1 to N1, the neural network model is configured with the first cycle sub-series weight set W(1-N1) 1810a. Similarly, when bases are to be called for cycles (N1+1) to N2, the neural network model is configured with the second cycle sub-series weight set W(N1-N2) 1810b. Finally, when bases are to be called for cycles (N2+1) to N, the neural network model is configured with the third cycle sub-series weight set W(N2-N) 1810c.
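The cycle-based selection of FIG. 18 amounts to locating the current cycle number among the sub-series boundaries. The following is a minimal sketch; the function name is hypothetical, and the returned strings simply reuse the labels W(1-N1), W(N1-N2), and W(N2-N).

```python
from bisect import bisect_left


def weight_set_for_cycle(cycle: int, n1: int, n2: int, n: int) -> str:
    """Return the label of the weight set covering the given sensing cycle.

    Sub-series: cycles 1..n1, (n1+1)..n2, and (n2+1)..n, with n > n2 > n1 > 0.
    """
    if not 0 < n1 < n2 < n:
        raise ValueError("require n > n2 > n1 > 0")
    if not 1 <= cycle <= n:
        raise ValueError("cycle must be in 1..n")
    labels = ("W(1-N1)", "W(N1-N2)", "W(N2-N)")
    # bisect_left counts how many upper bounds lie strictly below the cycle,
    # so a cycle equal to n1 (or n2) stays in the earlier sub-series.
    return labels[bisect_left((n1, n2), cycle)]


# The simple example from the text: N = 100, N1 = 30, N2 = 60.
selected = [weight_set_for_cycle(c, 30, 60, 100) for c in (1, 30, 31, 60, 61, 100)]
```

Adding further boundaries to the tuple passed to `bisect_left` extends the same lookup to four or more sub-series.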

FIGS. 14, 15, and 16 illustrate various examples of weight set selection based on tile location. Thus, these figures illustrate various examples of weight set selection based on the spatial progression of the base calling operation through the tile locations on the biosensor. FIG. 18, on the other hand, illustrates an example of weight set selection based on the temporal progression of the base calling operation through the sub-series of sensing cycles within the series of sensing cycles 1 to N. FIG. 19 combines the concept of weight set selection based on spatial tile location (e.g., as discussed with respect to FIGS. 14 to 16) with the concept of weight set selection based on the temporal progression of the base calling cycles (e.g., as discussed with respect to FIG. 18). Thus, FIG. 19 illustrates an exemplary weight selection scheme based on (i) the temporal progression of base calling cycle numbers and (ii) the spatial locations of tiles.

For example, FIG. 19 illustrates a first tile M1 and a second tile M2. Assume that tile M1 is a tile of a first category and tile M2 is a tile of a second category. Merely as an example, tile M1 may be an edge tile 1408 of FIG. 14, and tile M2 may be a central tile 1412 of FIG. 14. Thus, the weight set used for base calling the strands within the clusters of tile M1 will differ from the weight set used for base calling the strands within the clusters of tile M2, e.g., as discussed with respect to FIGS. 14, 15, and 16.

Similar to FIG. 18, it is assumed in FIG. 19 that there are N base calling cycles, during which the strands in the various clusters of tiles M1 and M2 are to be identified. Also similar to FIG. 18, it is assumed in FIG. 19 that the N base calling sensing cycles are divided into three sub-series of cycles, such as (a) initial sensing cycles 1 to N1, (b) intermediate sensing cycles (N1+1) to N2, and (c) final sensing cycles (N2+1) to N, where N>N2>N1, and N, N1, and N2 are positive integers, although in other examples the N sensing cycles may be divided into a different number of sub-series (such as 2, 4, or more).

In an example, the neural network models for base calling (such as those discussed with respect to FIGS. 7, 9, and 10) can be trained for a specific sub-series of cycles and for a specific tile category. For example, the neural network model is initially trained on sensor data generated only during sensing cycles 1 to N1 and only for the edge tiles 1408, and the resulting weight set is labelled "weight set (eT,(1-N1))". Note that, in this weight set, the phrase "eT" is a tile category or tile location index, implying that this weight set is specific to the edge tiles 1408. Also, in this weight set, the phrase "(1-N1)" is a cycle index, implying that this weight set is specific to sensing cycles 1 to N1.

Similarly, the neural network model is then trained on sensor data generated only during sensing cycles (N1+1) to N2 and only for the edge tiles 1408, and the resulting weight set is labelled "weight set (eT,(N1-N2))". Here, the phrase "eT" is a tile location or tile category index, implying that this weight set is specific to the edge tiles 1408. Similarly, in this weight set, the phrase "(N1-N2)" is a cycle index, implying that this weight set is specific to sensing cycles (N1+1) to N2.

Similarly, the neural network model is then trained on sensor data generated only during sensing cycles (N2+1) to N and only for the edge tiles 1408, and the resulting weight set is labelled "weight set (eT,(N2-N))". Here, the phrase "eT" is a tile location index, implying that this weight set is specific to the edge tiles 1408. Similarly, in this weight set, the phrase "(N2-N)" is a cycle index, implying that this weight set is specific to sensing cycles (N2+1) to N.

Furthermore, the neural network model is trained on sensor data generated only during sensing cycles 1 to N1 and only for the central tiles 1412, and the resulting weight set is labelled "weight set (cT,(1-N1))". Note that, in this weight set, the phrase "cT" is a tile location index, implying that this weight set is specific to the central tiles 1412. Also, in this weight set, the phrase "(1-N1)" is a cycle index, implying that this weight set is specific to sensing cycles 1 to N1.

Similarly, the neural network model is then trained on sensor data generated only during sensing cycles (N1+1) to N2 and only for the central tiles 1412, and the resulting weight set is labelled "weight set (cT,(N1-N2))". Here, the phrase "cT" is a tile location index, implying that this weight set is specific to the central tiles 1412. Similarly, in this weight set, the phrase "(N1-N2)" is a cycle index, implying that this weight set is specific to sensing cycles (N1+1) to N2.

Similarly, the neural network model is then trained on sensor data generated only during sensing cycles (N2+1) to N and only for the central tiles 1412, and the resulting weight set is labelled "weight set (cT,(N2-N))". Here, the phrase "cT" is a tile location index, implying that this weight set is specific to the central tiles 1412. Similarly, in this weight set, the phrase "(N2-N)" is a cycle index, implying that this weight set is specific to sensing cycles (N2+1) to N.

During the inference phase, when bases are to be called for cycles 1 to N1 and for tile M1 (which, in the example of FIG. 19, is an edge tile 1408), the neural network model is configured with weight set (eT,(1-N1)). Similarly, when bases are to be called for cycles (N1+1) to N2 and for tile M1, the neural network model is configured with weight set (eT,(N1-N2)). Likewise, when bases are to be called for cycles (N2+1) to N and for tile M1, the neural network model is configured with weight set (eT,(N2-N)).

Similarly, when bases are to be called for cycles 1 to N1 and for tile M2 (which, in the example of FIG. 19, is a central tile 1412), the neural network model is configured with weight set (cT,(1-N1)). Similarly, when bases are to be called for cycles (N1+1) to N2 and for tile M2, the neural network model is configured with weight set (cT,(N1-N2)). Likewise, when bases are to be called for cycles (N2+1) to N and for tile M2, the neural network model is configured with weight set (cT,(N2-N)).
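The combined spatial and temporal selection of FIG. 19 can be sketched as a lookup keyed on both the tile category index and the cycle sub-series. The sketch below is illustrative only: the function name is hypothetical, and the category indices "eT" and "cT" and the returned labels reuse the notation above.

```python
from bisect import bisect_left

CYCLE_INDICES = ("(1-N1)", "(N1-N2)", "(N2-N)")
TILE_INDICES = ("eT", "cT")  # edge tiles 1408 and central tiles 1412


def select_weight_set(tile_index: str, cycle: int, n1: int, n2: int, n: int) -> str:
    """Return the label of the weight set for a tile category and a sensing cycle."""
    if tile_index not in TILE_INDICES:
        raise ValueError(f"unknown tile index: {tile_index!r}")
    if not 0 < n1 < n2 < n:
        raise ValueError("require n > n2 > n1 > 0")
    if not 1 <= cycle <= n:
        raise ValueError("cycle must be in 1..n")
    # Cycles 1..n1 map to position 0, (n1+1)..n2 to 1, and (n2+1)..n to 2.
    cycle_index = CYCLE_INDICES[bisect_left((n1, n2), cycle)]
    return f"weight set ({tile_index},{cycle_index})"


# Tile M1 (edge) at cycle 45 and tile M2 (central) at cycle 90, with N1=30, N2=60, N=100.
m1_choice = select_weight_set("eT", 45, 30, 60, 100)
m2_choice = select_weight_set("cT", 90, 30, 60, 100)
```

With two tile categories and three sub-series, the lookup distinguishes six weight sets in total.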

FIG. 20 illustrates another exemplary weight selection scheme based on (i) the temporal progression of base calling cycle numbers and (ii) the spatial locations of tiles. The tile classification shown in FIG. 20 is similar to that shown in FIG. 14. For example, referring to FIGS. 14 and 20, the edge tiles 1408 are shown with diagonal lines, the near-edge tiles 1410 are shown with cross-hatching, and the central tiles 1412 are shown with dots or gray shading.

Also illustrated in FIG. 20 are three boxes 1908, 1910, and 1912. Box 1908 shows the weight sets that are specific to the edge tiles 1408 and to the various sub-series of sensing cycles. For example, weight set (eT,(1-N1)) is specific to the edge tiles 1408 and sensing cycles 1 to N1. Weight set (eT,(N1-N2)) is specific to the edge tiles 1408 and sensing cycles (N1+1) to N2. Weight set (eT,(N2-N)) is specific to the edge tiles 1408 and sensing cycles (N2+1) to N.

Similarly, box 1910 shows the weight sets that are specific to the near-edge tiles 1410 and to the various sub-series of sensing cycles. For example, weight set (nT,(1-N1)) is specific to the near-edge tiles 1410 and sensing cycles 1 to N1. Weight set (nT,(N1-N2)) is specific to the near-edge tiles 1410 and sensing cycles (N1+1) to N2. Weight set (nT,(N2-N)) is specific to the near-edge tiles 1410 and sensing cycles (N2+1) to N.

Similarly, box 1912 shows the weight sets that are specific to the central tiles 1412 and to the various sub-series of sensing cycles. For example, weight set (cT,(1-N1)) is specific to the central tiles 1412 and sensing cycles 1 to N1. Weight set (cT,(N1-N2)) is specific to the central tiles 1412 and sensing cycles (N1+1) to N2. Weight set (cT,(N2-N)) is specific to the central tiles 1412 and sensing cycles (N2+1) to N.
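The table of weight sets shown in boxes 1908, 1910, and 1912 is the Cartesian product of the tile categories and the cycle sub-series. The sketch below enumerates it for an arbitrary number of sub-series; the function names and the use of (first cycle, last cycle) tuples as keys are assumptions made for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple


def cycle_ranges(boundaries: List[int]) -> List[Tuple[int, int]]:
    """Turn upper bounds [N1, N2, ..., N] into ranges (1, N1), (N1+1, N2), ..."""
    if not boundaries:
        raise ValueError("at least one boundary is required")
    if any(b <= a for a, b in zip([0] + boundaries[:-1], boundaries)):
        raise ValueError("boundaries must be positive and strictly increasing")
    starts = [1] + [b + 1 for b in boundaries[:-1]]
    return list(zip(starts, boundaries))


def weight_set_keys(
    tile_indices: Sequence[str], boundaries: List[int]
) -> List[Tuple[str, Tuple[int, int]]]:
    """Enumerate one key per (tile category, cycle sub-series) pair."""
    return list(product(tile_indices, cycle_ranges(boundaries)))


# Three tile categories and three sub-series with N1=30, N2=60, N=100.
keys = weight_set_keys(["eT", "nT", "cT"], [30, 60, 100])
```

Here `keys` has nine entries, one for each weight set listed in the three boxes; passing two or four boundaries yields the other partitionings mentioned earlier.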

FIG. 21A illustrates another exemplary weight selection scheme based on (i) the temporal progression of base calling cycle numbers and (ii) the spatial locations of tiles. The tile classification shown in FIG. 21A is similar to that shown in FIG. 15. For example, referring to FIGS. 15 and 21A, the peripheral lane tiles 1508 (which are a combination of the top peripheral lane tiles 1508a and the bottom peripheral lane tiles 1508b of FIG. 15) are shown with diagonal lines, and the central lane tiles 1510 are shown with dotted lines or gray shading.

FIG. 21A also shows two boxes 2110 and 2112. Box 2110 shows the weight sets specific to the peripheral channel blocks 1508 and to the various sub-series of sensing cycles. For example, weight set (pl,(1-N1)) is specific to the peripheral channel blocks 1508 and sensing cycles 1 to N1, weight set (pl,(N1-N2)) is specific to the peripheral channel blocks 1508 and sensing cycles (N1+1) to N2, and weight set (pl,(N2-N)) is specific to the peripheral channel blocks 1508 and sensing cycles (N2+1) to N.

Similarly, box 2112 shows the weight sets specific to the central channel blocks 1510 and to the various sub-series of sensing cycles. For example, weight set (cl,(1-N1)) is specific to the central channel blocks 1510 and sensing cycles 1 to N1, weight set (cl,(N1-N2)) is specific to the central channel blocks 1510 and sensing cycles (N1+1) to N2, and weight set (cl,(N2-N)) is specific to the central channel blocks 1510 and sensing cycles (N2+1) to N.

In one embodiment, and as discussed above, each of weight set (pl,(1-N1)), weight set (pl,(N1-N2)), weight set (pl,(N2-N)), weight set (cl,(1-N1)), weight set (cl,(N1-N2)), and weight set (cl,(N2-N)) includes corresponding weights. For example, weight set (pl,(1-N1)) includes a first plurality of weights for configuring a corresponding plurality of spatial and temporal layers (see FIGS. 7 and 9 for examples of such layers), weight set (pl,(N1-N2)) includes a second plurality of weights for configuring the corresponding plurality of spatial and temporal layers, weight set (pl,(N2-N)) includes a third plurality of weights for configuring the corresponding plurality of spatial and temporal layers, weight set (cl,(1-N1)) includes a fourth plurality of weights for configuring the corresponding plurality of spatial and temporal layers, weight set (cl,(N1-N2)) includes a fifth plurality of weights for configuring the corresponding plurality of spatial and temporal layers, and weight set (cl,(N2-N)) includes a sixth plurality of weights for configuring the corresponding plurality of spatial and temporal layers.

At least one weight in the first plurality of weights differs from the corresponding weight in the second plurality of weights (in some examples, two weight sets may nevertheless have one or more common or identical weights). At least one weight in the second plurality of weights differs from the corresponding weight in the third plurality of weights, at least one weight in the third plurality of weights differs from the corresponding weight in the fourth plurality of weights, and so on. In one embodiment, one or more weights in the various weight sets are quantized using different scaling factors.

Because the various weight sets are associated with corresponding sequencing cycles, in an example the weights in the various weight sets correspond to respective sequencing chemistries, sequencing configurations, and/or sequencing assays. For example, weight set (pl,(1-N1)), weight set (pl,(N1-N2)), and weight set (pl,(N2-N)) correspond to a first sequencing chemistry, a second sequencing chemistry, and a third sequencing chemistry, respectively (e.g., chemistries used during sequencing cycles 1 to N1, (N1+1) to N2, and (N2+1) to N, respectively). Weight set (pl,(1-N1)), weight set (pl,(N1-N2)), and weight set (pl,(N2-N)) correspond to a first sequencing assay, a second sequencing assay, and a third sequencing assay, respectively. Weight set (pl,(1-N1)), weight set (pl,(N1-N2)), and weight set (pl,(N2-N)) correspond to a first sequencing configuration, a second sequencing configuration, and a third sequencing configuration, respectively.

FIG. 21B shows yet another example weight selection scheme based on (i) the temporal progression of base calling cycle numbers and (ii) the spatial location of the blocks. The block classification shown in FIG. 21B is similar to that shown in FIG. 16. For example, referring to FIGS. 16 and 21B, the flow cell 1400 is divided into a top left section 1610TL, a top central section 1610TC, a top right section 1610TR, a middle left section 1610ML, a central section 1610C, a middle right section 1610MR, a bottom left section 1610BL, a bottom central section 1610BC, and a bottom right section 1610BR. Each block of the flow cell 1400 is classified based on the section to which it belongs.

FIG. 21B also shows a table 2150, which includes the weight sets for the blocks of the various sections and for the various sub-series of sensing cycles 1 to N. For example, referring to the first row of table 2150, weight set (TL,(1-N1)) is specific to the blocks of the top left section 1610TL and sensing cycles 1 to N1, weight set (TL,(N1-N2)) is specific to the blocks of the top left section 1610TL and sensing cycles (N1+1) to N2, and weight set (TL,(N2-N)) is specific to the blocks of the top left section 1610TL and sensing cycles (N2+1) to N.

Similarly, referring to the second row of table 2150, weight set (TC,(1-N1)) is specific to the blocks of the top central section 1610TC and sensing cycles 1 to N1, weight set (TC,(N1-N2)) is specific to the blocks of the top central section 1610TC and sensing cycles (N1+1) to N2, and weight set (TC,(N2-N)) is specific to the blocks of the top central section 1610TC and sensing cycles (N2+1) to N. The other rows of table 2150 likewise include weight sets for the blocks of the other sections and for the various sub-series of sensing cycles, as will be apparent to those skilled in the art from the above discussion.
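A section-based table of this kind can be sketched as a dictionary keyed by section and cycle sub-series. The sketch below rests on assumptions not stated in the disclosure: blocks are addressed by a zero-based (row, column) position, the flow cell is split into equal thirds in each direction, the ninth section is labeled BR for bottom right, and the helper names `section_of` and `build_table` are hypothetical.

```python
ROW_LABELS = ("T", "M", "B")   # top, middle, bottom third of the flow cell
COL_LABELS = ("L", "C", "R")   # left, central, right third of the flow cell
SUBSERIES = ("1-N1", "N1-N2", "N2-N")


def section_of(row: int, col: int, n_rows: int, n_cols: int) -> str:
    """Map a zero-based block position to one of nine section labels."""
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("block position is outside the flow cell")
    r = ROW_LABELS[row * 3 // n_rows]
    c = COL_LABELS[col * 3 // n_cols]
    # The middle-central section is labeled simply "C" in the description.
    return "C" if (r, c) == ("M", "C") else r + c


def build_table() -> dict:
    """Build a table with one named weight set per (section, sub-series)."""
    sections = ("TL", "TC", "TR", "ML", "C", "MR", "BL", "BC", "BR")
    return {(s, sub): f"({s},({sub}))" for s in sections for sub in SUBSERIES}


table = build_table()
```

A lookup such as `table[(section_of(0, 0, 6, 6), "1-N1")]` then yields the name of the weight set for a top left block in the first sub-series.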

FIG. 22 shows one implementation of a base calling operation 2200, in which the weight set used for base calling is selected based on spatial block information and temporal sensing cycle sub-series information.

For the base calling operation 2200 of FIG. 22, assume that the blocks of the flow cell 1400 are classified according to the example of FIGS. 15 and 21A. This block classification is not intended to limit the scope of the present disclosure, and the base calling operation 2200 may also be applied to any other type of block classification, such as any of the block classifications discussed with respect to FIGS. 14, 16, 20, and 21B and/or any other block classification contemplated by those skilled in the art based on the teachings of this disclosure.

In addition, for the base calling operation 2200 of FIG. 22, assume that the N sensing cycles are divided into three cycle sub-series, namely (a) cycles 1 to N1, (b) cycles (N1+1) to N2, and (c) cycles (N2+1) to N, as discussed with respect to FIGS. 18 to 21B. Again, this division of sensing cycles is not intended to limit the scope of the present disclosure, and the base calling operation 2200 may also be applied to any other type of sensing cycle subdivision that those skilled in the art may contemplate based on the teachings of this disclosure.

In FIG. 22, base calling operations 1a to 6a are specific to the peripheral channel blocks and cycles 1 to N1. Similarly, base calling operations 1b to 6b are specific to the central channel blocks and cycles 1 to N1. Operations 1a to 6a and 1b to 6b may be repeated for cycles (N1+1) to N2, and repeated again for cycles (N2+1) to N, although these repetitions are not shown in detail in FIG. 22. Those skilled in the art will understand these repetitions from the discussion of operations 1a to 6a and 1b to 6b for cycles 1 to N1.

At action 1a, the data flow logic 451 (see, e.g., FIG. 4) receives the cluster sensor data and the weight set (pl,(1-N1)) for the peripheral channel blocks 1508 and for cycles 1 to N1 (see FIG. 21A). The cluster data includes sequencing images that depict the intensity emissions of the clusters within the peripheral channel blocks 1508 at sequencing cycles 1 to N1 of the sequencing run, as described above. At action 2a, the data flow logic 451 forwards the cluster data and the weight set (pl,(1-N1)) for the peripheral channel blocks 1508 and for cycles 1 to N1 to the neural network-based base caller 2308 (examples of which are shown in FIGS. 7, 9, and 10) executed by the configurable processor 450 (see, e.g., FIG. 4). The cluster data and the weight set (pl,(1-N1)) for the peripheral channel blocks 1508 and for cycles 1 to N1 are loaded into the neural network-based base caller 2308. Although not shown in FIG. 22, the topology of the neural network model is also loaded from memory into the configurable processor 450 via the data flow logic 451.

At action 3a, the configurable processor 450 configures the topology of the neural network running on the configurable processor 450 with the loaded weight set (pl,(1-N1)). The neural network-based base caller 2308, so configured, generates representations (e.g., feature maps) from the cluster data based on the loaded weight set (pl,(1-N1)), for example by processing the cluster data through its configured spatial and temporal convolution layers. Based on these representations, it produces base call classification data (e.g., base call classification scores) for the multiple clusters within the peripheral channel blocks 1508 and for sequencing cycles 1 to N1. In other words, the neural network-based base caller 2308 applies the loaded weight set (pl,(1-N1)) to the cluster data to generate the base call classification data. In one implementation, the base call classification scores are unnormalized, i.e., they have not been subjected to exponential normalization by a softmax function.

At action 4a, the configurable processor 450 sends the base call classification data for the clusters within the peripheral channel blocks 1508 and for cycles 1 to N1 to the data flow logic 451. At action 5a, the data flow logic 451 provides the base call classification scores for the clusters within the peripheral channel blocks 1508 and for cycles 1 to N1 to the host processor 2304.

At action 6a, the host processor 2304 normalizes the unnormalized base call classification scores (e.g., by applying a softmax function, box 740 of FIG. 7 or box 930 of FIG. 9) and generates normalized base call classification scores, i.e., base calls, for the strands within the clusters of the peripheral channel blocks 1508 and for cycles 1 to N1.
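The host-side normalization step can be sketched as a softmax over the unnormalized scores followed by selection of the highest-probability base. This is a minimal sketch under assumptions: the disclosure does not specify the score layout, so the four-score vector in A, C, G, T order and the function names `softmax` and `call_base` are hypothetical.

```python
import math

# Assumed score order; the disclosure does not fix a particular layout.
BASES = ("A", "C", "G", "T")


def softmax(scores):
    """Exponentially normalize scores so that they sum to 1."""
    peak = max(scores)
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_base(unnormalized_scores):
    """Return the called base and the normalized scores for one cluster and cycle."""
    if len(unnormalized_scores) != len(BASES):
        raise ValueError("expected one score per base")
    probs = softmax(unnormalized_scores)
    best = max(range(len(BASES)), key=probs.__getitem__)
    return BASES[best], probs
```

Keeping the normalization on the host, as the description suggests, means the configurable processor only needs to emit raw scores.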

Thus, in operations 1a to 6a, the system performs base calling for the strands within the clusters of the peripheral channel blocks 1508 and for cycles 1 to N1, using the weight set (pl,(1-N1)) trained specifically for the peripheral channel blocks 1508 and for cycles 1 to N1. Note that operations 1a to 6a depict a high-level, simplified version of the base calling operation and may omit one or more other operations that can be performed for base calling. Further details of base calling operations can be found in U.S. Provisional Patent Application No. 63/072,032, entitled "DETECTING AND FILTERING CLUSTERS BASED ON ARTIFICIAL INTELLIGENCE-PREDICTED BASE CALLS," filed August 28, 2020 (Attorney Docket No. ILLM 1018-1/IP-1860-PRV), which is incorporated herein by reference as if fully set forth herein.

Operations 1a to 6a are specific to base calling for the strands within the clusters of the peripheral channel blocks 1508 and for cycles 1 to N1. These operations are repeated as operations 1b to 6b, but for the clusters within the central channel blocks 1510 and for cycles 1 to N1. For example, at action 1b, the data flow logic 451 receives the cluster data and the weight set (cl,(1-N1)) for the central channel blocks 1510 and for cycles 1 to N1 (see FIG. 21A). The cluster data includes sequencing images that depict the intensity emissions of the clusters within the central channel blocks 1510 at sequencing cycles 1 to N1 of the sequencing run, as described above. At action 2b, the data flow logic 451 forwards the cluster data and the weight set (cl,(1-N1)) for the central channel blocks 1510 and for cycles 1 to N1 to the neural network-based base caller 2308 executed by the configurable processor 450. The cluster data and the weight set (cl,(1-N1)) for the central channel blocks 1510 and for cycles 1 to N1 are used to reconfigure the neural network-based base caller 2308.

At action 3b, the reconfigured neural network-based base caller 2308 running on the configurable processor 450 generates initial intermediate representations (e.g., feature maps) from the cluster data, for example by processing the cluster data through its spatial and temporal convolution layers, and based on these representations produces base call classification scores for the multiple clusters within the central channel blocks 1510 and for sequencing cycles 1 to N1. In one implementation, the initial base call classification scores are unnormalized, i.e., they have not been subjected to exponential normalization by a softmax function.

At action 4b, the configurable processor 450 sends the base call classification scores for the clusters within the central channel blocks 1510 and for cycles 1 to N1 to the data flow logic 451. At action 5b, the data flow logic 451 provides the base call classification scores for the clusters within the central channel blocks 1510 and for cycles 1 to N1 to the host processor 2304.

At action 6b, the host processor 2304 normalizes the unnormalized base call classification scores (e.g., by applying a softmax function) and generates normalized base call classification scores, i.e., base calls, for the strands within the clusters of the central channel blocks 1510 and for cycles 1 to N1.

Thus, base calling operations 1a to 6a are specific to the peripheral channel blocks 1508 and cycles 1 to N1. Similarly, base calling operations 1b to 6b are specific to the central channel blocks 1510 and cycles 1 to N1. Operations 1a to 6a and 1b to 6b are repeated for cycles (N1+1) to N2, and repeated again for cycles (N2+1) to N, as shown symbolically in FIG. 22.

Referring back to FIG. 7, the illustrated model includes isolated stacks 701, 702, 703, 704, and 705. Stack 701 receives as input the block data of the patch from cycle K+2. Stack 702 receives as input the block data of the patch from cycle K+1. Stack 703 receives as input the block data of the patch from cycle K. Stack 704 receives as input the block data of the patch from cycle K-1. Stack 705 receives as input the block data of the patch from cycle K-2. Each layer of the isolated stacks performs a convolution operation with a kernel that includes multiple filters over the layer's input data. The output feature sets (intermediate data) from each of the stacks 701 to 705 are provided as input to an inverse hierarchy 720 of temporal combination layers, in which the intermediate data from the multiple cycles is combined.

Thus, as discussed with respect to FIGS. 7, 9, and 11, the stacks 701, ..., 705 perform isolated spatial convolutions. Within the stacks 701, ..., 705 there is no temporal mixing or interaction between the inputs from the various cycles. Only after the data has been processed in the stacks 701, ..., 705 is the data from the various sequencing cycles processed together, in section 720. The layers within the stacks 701, ..., 705 are also referred to herein as spatial layers, and the weights of the kernels of the filters within the stacks 701, ..., 705 are referred to herein as spatial weights. Similarly, the layers within section 720 are also referred to herein as temporal layers, and the weights of the kernels of the filters within section 720 are also referred to herein as temporal weights. For example, the weights applied during the spatial convolutions 921, 922, and 923 in FIG. 9 are spatial weights, while the weights applied during the temporal convolutions 924 and 925 in FIG. 9 are temporal weights.

FIG. 23A shows various weight sets for various classes of blocks and for various sensing cycles, each weight set including corresponding spatial weights and corresponding temporal weights. The block classification shown in FIG. 23A is similar to that discussed with respect to FIGS. 15 and 21A. As discussed with respect to FIG. 21A, the peripheral channel blocks 1508 for cycles 1 to N1 are associated with the corresponding weight set (pl,(1-N1)). As shown in FIG. 23A, the weight set (pl,(1-N1)) includes corresponding spatial weights (s-pl,(1-N1)) and corresponding temporal weights (t-pl,(1-N1)). When the neural network model is used to process the cluster sensor data for the peripheral channel blocks 1508 for cycles 1 to N1, the spatial weights (s-pl,(1-N1)) are used to configure the spatial layers of the neural network model, and the temporal weights (t-pl,(1-N1)) are used to configure the temporal layers of the neural network model.

Similarly, as also discussed with respect to FIG. 21A, the peripheral channel blocks 1508 for cycles (N1+1) to N2 are associated with the corresponding weight set (pl,(N1-N2)). As shown in FIG. 23A, the weight set (pl,(N1-N2)) includes corresponding spatial weights (s-pl,(N1-N2)) and corresponding temporal weights (t-pl,(N1-N2)). The other weight sets of FIG. 23A similarly have corresponding spatial weights and temporal weights.

FIG. 23B shows various weight sets for various classes of blocks and for various cycles, in which the different weight sets for a particular class of blocks include common spatial weights and different temporal weights. The block classification shown in FIG. 23B is similar to that discussed with respect to FIGS. 15, 21A, and 23A. Unlike FIG. 23A, however, in FIG. 23B the weight sets (pl,(1-N1)), (pl,(N1-N2)), and (pl,(N2-N)) for the peripheral channel blocks 1508 have common spatial weights (s-pl). Thus, the same or common spatial weights (s-pl) are used for the peripheral channel blocks 1508 for each of the cycle sub-series 1 to N1, (N1+1) to N2, and (N2+1) to N.

The weight sets (pl,(1-N1)), (pl,(N1-N2)), and (pl,(N2-N)) have different temporal weights, namely temporal weights (t-pl,(1-N1)), temporal weights (t-pl,(N1-N2)), and temporal weights (t-pl,(N2-N)), respectively.

Similarly, the weight sets (cl,(1-N1)), (cl,(N1-N2)), and (cl,(N2-N)) for the central channel blocks 1510 have common spatial weights (s-cl). Thus, the same or common spatial weights (s-cl) are used for the central channel blocks 1510 for each of the cycle sub-series 1 to N1, (N1+1) to N2, and (N2+1) to N.

The weight sets (cl,(1-N1)), (cl,(N1-N2)), and (cl,(N2-N)) have different temporal weights, namely temporal weights (t-cl,(1-N1)), temporal weights (t-cl,(N1-N2)), and temporal weights (t-cl,(N2-N)), respectively.

In one embodiment, and as discussed with respect to FIGS. 17A and 17B, fading, phasing, and/or prephasing degrade the sensor data as the sequencing cycles progress. Such degradation is addressed by the temporal layers of the neural network model (such as the layers within box 720 of FIG. 7 or layers 924 and 925 of FIG. 9). Accordingly, in FIG. 23B the temporal weights are trained differently for the various sequencing cycle sub-series. For example, the temporal weights for cycles 1 to N1 for a given block class differ from the temporal weights for cycles (N1+1) to N2 for the same block class. In contrast, because the spatial layers (such as the layers within stacks 701, ..., 705 of FIG. 7 or layers 921, 922, and 923 of FIG. 9) may not significantly address the degradation of signal quality, all cycles share common spatial weights for a given block class, as shown in FIG. 23B.

Thus, when processing the sensor data of a particular block class (say, the peripheral channel blocks 1508), the common spatial weights (s-pl) and the temporal weights (t-pl,(1-N1)) of the weight set (pl,(1-N1)) for cycles 1 to N1 are initially loaded into the configurable processor, and the neural network-based base caller 2308 is configured with these spatial and temporal weights. Specifically, the spatial layers of the neural network-based base caller 2308 are configured with the common spatial weights (s-pl), and its temporal layers are configured with the temporal weights (t-pl,(1-N1)). The configured neural network-based base caller 2308 applies the configured spatial and temporal layers to the sensor data of the peripheral channel blocks 1508 for cycles 1 to N1 to produce the base call classification data of the peripheral channel blocks 1508 for cycles 1 to N1.

Subsequently, before the sensor data for cycle (N1+1) is processed, the temporal weights (t-pl,(N1-N2)) of the weight set (pl,(N1-N2)) are loaded, without loading any corresponding spatial weights of that weight set. The temporal layers of the neural network-based base caller 2308 are reconfigured with the temporal weights (t-pl,(N1-N2)). The neural network-based base caller 2308 then applies the previously configured spatial layers (configured earlier with the common spatial weights (s-pl)) and the reconfigured temporal layers (reconfigured with the temporal weights (t-pl,(N1-N2))) to the sensor data of the peripheral channel blocks 1508 for cycles (N1+1) to N2 to produce the base call classification data of the peripheral channel blocks 1508 for cycles (N1+1) to N2.

Subsequently, before the sensor data for cycle (N2+1) is processed, the temporal weights (t-pl,(N2-N)) of the weight set (pl,(N2-N)) are loaded, without loading any corresponding spatial weights of that weight set. The temporal layers of the neural network-based base caller 2308 are reconfigured with the temporal weights (t-pl,(N2-N)). The neural network-based base caller 2308 then applies the previously configured spatial layers (configured earlier with the common spatial weights (s-pl)) and the reconfigured temporal layers (reconfigured with the temporal weights (t-pl,(N2-N))) to the sensor data of the peripheral channel blocks 1508 for cycles (N2+1) to N to produce the base call classification data of the peripheral channel blocks 1508 for cycles (N2+1) to N.

Base call classification data for the other block classes (such as the central channel blocks 1510) is produced in a correspondingly similar manner, as those skilled in the art will understand from the above discussion and the illustration of FIG. 23B.
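The loading sequence described for FIG. 23B, in which the spatial weights are loaded once per block class and only the temporal weights are swapped at each sub-series boundary, can be sketched as follows. The `WeightLoader` class, its attribute names, and the string names of the weights are hypothetical and serve only to make the order of loads visible; an actual implementation would transfer weight tensors to the configurable processor.

```python
class WeightLoader:
    """Tracks which weights are resident and records every load in order."""

    def __init__(self):
        self.spatial = None    # name of the spatial weights currently loaded
        self.temporal = None   # name of the temporal weights currently loaded
        self.loads = []        # ordered log of load operations

    def prepare(self, block_class: str, subseries: str) -> None:
        """Load only the weights that differ from those already resident."""
        spatial = f"s-{block_class}"
        temporal = f"t-{block_class},({subseries})"
        if self.spatial != spatial:
            self.loads.append(spatial)
            self.spatial = spatial
        if self.temporal != temporal:
            self.loads.append(temporal)
            self.temporal = temporal


loader = WeightLoader()
for subseries in ("1-N1", "N1-N2", "N2-N"):
    loader.prepare("pl", subseries)
```

After the loop, `loader.loads` contains a single spatial load followed by three temporal loads, which mirrors the reduced reloading that shared spatial weights allow.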

FIG. 23C shows a system 2300 that selects a weight set based on one or more sequencing run parameters 2382. For example, weight set selection logic 2386 is shown, which may execute on the configurable processor 450 and/or the host processor 2304. The weight set selection logic 2386 receives the one or more sequencing run parameters 2382 along with one or more of the other weight set selection criteria discussed with respect to FIGS. 14 to 23B. Based on the one or more sequencing run parameters 2382 and/or the one or more other weight set selection criteria, the weight set selection logic 2386 selects a weight set from among a plurality of candidate weight sets 2384a, ..., 2384N. In the example of FIG. 23C, the weight set selection logic 2386 selects weight set 2384b. The selected weight set is then loaded into the configurable processor 450 and used to configure the neural network topology for base calling, as discussed herein.

The one or more sequencing run parameters 2382 may include one or more appropriate parameters associated with the current sequencing run. For example, the reaction components used in the sequencing run (such as reagents, enzymes, samples, other biomolecules, and buffer solutions) can affect the sensor data, and the weight set can be selected based on the type, parameters, or batch of the reaction components used. For example, the phasing characteristics (see FIG. 17B) can depend on the reagent kit used for the sequencing run and can vary with the type, age, and/or batch of the reagent kit. Accordingly, various candidate weight sets can be generated for various types and batches of reaction components, and the weight set selection logic 2386 can select a weight set based on the reaction components used for the current sequencing cycle.

In another example, the weight set selection logic 2386 can estimate the phasing characteristics and select a weight set based on them. For example, different weight sets can be generated for different phasing characteristics. Then, early in the sequencing run, the phasing parameters can be estimated and used to select a weight set. In yet another example, multiple candidate weight sets can be tried, and the weight set with the lowest error rate (or the highest signal-to-noise ratio) can be selected for the entire sequencing run.
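The two selection strategies just described can be sketched in a few lines. The sketch is illustrative only: the functions `select_by_error_rate` and `select_by_phasing`, the candidate names, and all numeric values are hypothetical, and the disclosure does not specify how error rates or phasing parameters are measured.

```python
def select_by_error_rate(error_rates: dict) -> str:
    """Pick the candidate weight set with the lowest measured error rate."""
    if not error_rates:
        raise ValueError("no candidate weight sets were evaluated")
    return min(error_rates, key=error_rates.get)


def select_by_phasing(estimated_phasing: float, trained_phasing: dict) -> str:
    """Pick the candidate trained for the phasing rate nearest the estimate."""
    if not trained_phasing:
        raise ValueError("no candidate weight sets are available")
    return min(
        trained_phasing,
        key=lambda name: abs(trained_phasing[name] - estimated_phasing),
    )


# Hypothetical error rates measured on early cycles for three candidates.
chosen = select_by_error_rate({"2384a": 0.012, "2384b": 0.007, "2384c": 0.015})
```

With these made-up numbers the candidate named "2384b" is chosen, and it would then be used for the remainder of the run.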

FIG. 24 is a block diagram of a base calling system 2400 according to one implementation. The base calling system 2400 is operable to obtain any information or data related to at least one of a biological substance or a chemical substance. In some implementations, the base calling system 2400 is a workstation that can be similar to a benchtop device or a desktop computer. For example, most (or all) of the systems and components used to conduct the desired reactions can be located within a common housing 2416.

In particular implementations, the base calling system 2400 is a nucleic acid sequencing system (or sequencer) configured for various applications, including but not limited to de novo sequencing, resequencing of whole genomes or target genomic regions, and metagenomics. The sequencer can also be used for DNA or RNA analysis. In some implementations, the base calling system 2400 can also be configured to generate reaction sites in a biosensor. For example, the base calling system 2400 can be configured to receive a sample and generate surface-attached clusters of clonally amplified nucleic acids derived from the sample. Each cluster can constitute, or be part of, a reaction site in the biosensor.

The exemplary base calling system 2400 can include a system receptacle or interface 2412 that is configured to interact with a biosensor 2402 to perform desired reactions within the biosensor 2402. In the following description with respect to FIG. 24, the biosensor 2402 is loaded into the system receptacle 2412. However, it should be understood that a cartridge that includes the biosensor 2402 can be inserted into the system receptacle 2412, and in some states the cartridge can be removed temporarily or permanently. As described above, the cartridge can include, among other things, fluid control components and fluid storage components.

In particular implementations, the base calling system 2400 is configured to perform a large number of parallel reactions within the biosensor 2402. The biosensor 2402 includes one or more reaction sites where desired reactions can occur. The reaction sites can be, for example, immobilized to a solid surface of the biosensor or immobilized to beads (or other movable substrates) that are located within corresponding reaction chambers of the biosensor. The reaction sites can include, for example, clusters of clonally amplified nucleic acids. The biosensor 2402 can include a solid-state imaging device (e.g., a CCD or CMOS imager) and a flow cell mounted to it. The flow cell can include one or more flow channels that receive a solution from the base calling system 2400 and direct the solution toward the reaction sites. Optionally, the biosensor 2402 can be configured to engage a thermal element for transferring thermal energy into or out of the flow channels.

The base calling system 2400 can include various components, assemblies, and systems (or subsystems) that interact with each other to perform a predetermined method or assay protocol for biological or chemical analysis. For example, the base calling system 2400 includes a system controller 2404 that can communicate with the various components, assemblies, and subsystems of the base calling system 2400 and also with the biosensor 2402. In addition to the system receptacle 2412, the base calling system 2400 can also include: a fluid control system 2406 that controls the flow of fluid throughout the fluid network of the base calling system 2400 and the biosensor 2402; a fluid storage system 2408 that is configured to hold all fluids (e.g., gases or liquids) that can be used by the bioassay system; a temperature control system 2410 that can regulate the temperature of the fluid in the fluid network, the fluid storage system 2408, and/or the biosensor 2402; and an illumination system 2409 that is configured to illuminate the biosensor 2402. As described above, if a cartridge having the biosensor 2402 is loaded into the system receptacle 2412, the cartridge can also include fluid control components and fluid storage components.

As also shown, the base calling system 2400 can include a user interface 2414 that interacts with the user. For example, the user interface 2414 can include a display 2413 to display or request information from a user and a user input device 2415 to receive user inputs. In some implementations, the display 2413 and the user input device 2415 are the same device. For example, the user interface 2414 can include a touch-sensitive display configured to detect the presence of an individual's touch and also identify the location of the touch on the display. However, other user input devices 2415 can be used, such as a mouse, touchpad, keyboard, keypad, handheld scanner, voice recognition system, motion recognition system, and the like. As discussed in greater detail below, the base calling system 2400 can communicate with various components, including the biosensor 2402 (e.g., in the form of a cartridge), to perform the desired reactions. The base calling system 2400 can also be configured to analyze data obtained from the biosensor to provide a user with desired information.

The system controller 2404 can include any processor-based or microprocessor-based system, including systems using microcontrollers, reduced instruction set computers (RISC), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), logic circuits, and any other circuit or processor capable of executing the functions described herein. The above examples are exemplary only and are thus not intended to limit in any way the definition and/or meaning of the term system controller. In the exemplary implementation, the system controller 2404 executes a set of instructions that are stored in one or more storage elements, memories, or modules in order to perform at least one of obtaining and analyzing detection data. The detection data can include a plurality of sequences of pixel signals, such that a sequence of pixel signals from each of the millions of sensors (or pixels) can be detected over many base calling cycles. The storage elements can be in the form of information sources or physical memory elements within the base calling system 2400.

The set of instructions can include various commands that instruct the base calling system 2400 or the biosensor 2402 to perform specific operations, such as the methods and processes of the various implementations described herein. The set of instructions can be in the form of a software program, which can form part of one or more tangible, non-transitory computer-readable media. As used herein, the terms "software" and "firmware" are interchangeable and include any computer program stored in memory for execution by a computer, including RAM memory, ROM memory, EPROM memory, EEPROM memory, and non-volatile RAM (NVRAM) memory. The above memory types are exemplary only and are thus not limiting as to the types of memory usable for storage of a computer program.

The software can be in various forms, such as system software or application software. Further, the software can be in the form of a collection of separate programs, or a program module within a larger program, or a portion of a program module. The software can also include modular programming in the form of object-oriented programming. After the detection data is obtained, it can be processed automatically by the base calling system 2400, processed in response to user inputs, or processed in response to a request made by another processing machine (e.g., a remote request through a communication link). In the illustrated implementation, the system controller 2404 includes an analysis module 2538 (shown in FIG. 25). In other implementations, the system controller 2404 does not include the analysis module 2538 and instead has access to the analysis module 2538 (e.g., the analysis module 2538 can be separately hosted on the cloud).

The system controller 2404 can be connected to the biosensor 2402 and the other components of the base calling system 2400 via communication links. The system controller 2404 can also be communicatively connected to off-site systems or servers. The communication links can be hardwired, corded, or wireless. The system controller 2404 can receive user inputs or commands from the user interface 2414 and the user input device 2415.

The fluid control system 2406 includes a fluid network and is configured to direct and regulate the flow of one or more fluids through the fluid network. The fluid network can be in fluid communication with the biosensor 2402 and the fluid storage system 2408. For example, select fluids can be drawn from the fluid storage system 2408 and directed to the biosensor 2402 in a controlled manner, or fluids can be drawn from the biosensor 2402 and directed toward, for example, a waste reservoir in the fluid storage system 2408. Although not shown, the fluid control system 2406 can include flow sensors that detect a flow rate or pressure of the fluids within the fluid network. The sensors can communicate with the system controller 2404.

The temperature control system 2410 is configured to regulate the temperature of fluids at different regions of the fluid network, the fluid storage system 2408, and/or the biosensor 2402. For example, the temperature control system 2410 can include a thermocycler that interfaces with the biosensor 2402 and controls the temperature of the fluid that flows along the reaction sites in the biosensor 2402. The temperature control system 2410 can also regulate the temperature of solid elements or components of the base calling system 2400 or the biosensor 2402. Although not shown, the temperature control system 2410 can include sensors to detect the temperature of the fluid or other components. The sensors can communicate with the system controller 2404.

The fluid storage system 2408 is in fluid communication with the biosensor 2402 and can store various reaction components or reactants that are used to conduct the desired reactions therein. The fluid storage system 2408 can also store fluids for washing or cleaning the fluid network and the biosensor 2402 and for diluting the reactants. For example, the fluid storage system 2408 can include various reservoirs to store samples, reagents, enzymes, other biomolecules, buffer solutions, aqueous solutions, non-polar solutions, and the like. Furthermore, the fluid storage system 2408 can also include waste reservoirs for receiving waste products from the biosensor 2402. In implementations that include a cartridge, the cartridge can include one or more of a fluid storage system, a fluid control system, or a temperature control system. Accordingly, one or more of the components set forth herein as relating to those systems can be contained within a cartridge housing. For example, a cartridge can have various reservoirs to store samples, reagents, enzymes, other biomolecules, buffer solutions, aqueous solutions, non-polar solutions, waste, and the like. As such, one or more of a fluid storage system, a fluid control system, or a temperature control system can be removably engaged with the bioassay system via a cartridge or other biosensor.

The illumination system 2409 can include a light source (e.g., one or more LEDs) and a plurality of optical components to illuminate the biosensor. Examples of light sources include lasers, arc lamps, LEDs, and laser diodes. The optical components can be, for example, reflectors, dichroic mirrors, beam splitters, collimators, lenses, filters, wedges, prisms, mirrors, detectors, and the like. In implementations that use an illumination system, the illumination system 2409 can be configured to direct excitation light to the reaction sites. As one example, fluorophores can be excited by green wavelengths of light, so the wavelength of the excitation light can be approximately 532 nm. In one implementation, the illumination system 2409 is configured to produce illumination that is parallel to a surface normal of a surface of the biosensor 2402. In another implementation, the illumination system 2409 is configured to produce illumination that is off-angle relative to the surface normal of the surface of the biosensor 2402. In yet another implementation, the illumination system 2409 is configured to produce illumination at multiple angles, including some parallel illumination and some off-angle illumination.

The system receptacle or interface 2412 is configured to engage the biosensor 2402 in at least one of a mechanical, electrical, and fluidic manner. The system receptacle 2412 can hold the biosensor 2402 in a desired orientation to facilitate the flow of fluid through the biosensor 2402. The system receptacle 2412 can also include electrical contacts that are configured to engage the biosensor 2402 so that the base calling system 2400 can communicate with the biosensor 2402 and/or provide power to the biosensor 2402. Furthermore, the system receptacle 2412 can include fluidic ports (e.g., nozzles) that are configured to engage the biosensor 2402. In some implementations, the biosensor 2402 is removably coupled to the system receptacle 2412 in a mechanical manner, in an electrical manner, and also in a fluidic manner.

In addition, the base calling system 2400 can communicate remotely with other systems or networks or with other bioassay systems 2400. Detection data obtained by the bioassay system(s) 2400 can be stored in a remote database.

FIG. 25 is a block diagram of a system controller 2404 that can be used in the system of FIG. 24. In one implementation, the system controller 2404 includes one or more processors or modules that can communicate with one another. Each of the processors or modules can include an algorithm (e.g., instructions stored on a tangible and/or non-transitory computer-readable storage medium) or sub-algorithms to perform particular processes. The system controller 2404 is illustrated conceptually as a collection of modules, but can be implemented using any combination of dedicated hardware boards, DSPs, processors, and the like. Alternatively, the system controller 2404 can be implemented using an off-the-shelf PC with a single processor or multiple processors, with the functional operations distributed between the processors. As a further option, the modules described below can be implemented using a hybrid configuration in which certain modular functions are performed using dedicated hardware, while the remaining modular functions are performed using an off-the-shelf PC or the like. The modules can also be implemented as software modules within a processing unit.

During operation, a communication port 2520 can transmit information (e.g., commands) to, or receive information (e.g., data) from, the biosensor 2402 (FIG. 24) and/or the subsystems 2406, 2408, 2410 (FIG. 24). In implementations, the communication port 2520 can output a plurality of sequences of pixel signals. The communication port 2520 can receive user input from the user interface 2414 (FIG. 24) and transmit data or information to the user interface 2414. Data from the biosensor 2402 or the subsystems 2406, 2408, 2410 can be processed by the system controller 2404 in real time during a bioassay session. Additionally or alternatively, data can be stored temporarily in a system memory during a bioassay session and processed in slower-than-real-time or offline operation.

As shown in FIG. 25, the system controller 2404 can include a plurality of modules 2531 to 2539 that communicate with a main control module 2530. The main control module 2530 can communicate with the user interface 2414 (FIG. 24). Although the modules 2531 to 2539 are shown as communicating directly with the main control module 2530, the modules 2531 to 2539 can also communicate directly with each other, with the user interface 2414, and with the biosensor 2402. Also, the modules 2531 to 2539 can communicate with the main control module 2530 through the other modules.

The plurality of modules 2531 to 2539 includes system modules 2531 to 2533 and 2539 that communicate with the subsystems 2406, 2408, 2410, and 2409, respectively. The fluid control module 2531 can communicate with the fluid control system 2406 to control the valves and flow sensors of the fluid network and thereby control the flow of one or more fluids through the fluid network. The fluid storage module 2532 can notify the user when fluids are low or when the waste reservoir is at or near capacity. The fluid storage module 2532 can also communicate with the temperature control module 2533 so that the fluids can be stored at a desired temperature. The illumination module 2539 can communicate with the illumination system 2409 to illuminate the reaction sites at designated times during a protocol, such as after the desired reactions (e.g., binding events) have occurred. In some implementations, the illumination module 2539 can communicate with the illumination system 2409 to illuminate the reaction sites at designated angles.

The plurality of modules 2531 to 2539 can also include a device module 2534 that communicates with the biosensor 2402 and an identification module 2535 that determines identification information relating to the biosensor 2402. The device module 2534 can, for example, communicate with the system receptacle 2412 to confirm that the biosensor has established an electrical and fluidic connection with the base calling system 2400. The identification module 2535 can receive signals that identify the biosensor 2402. The identification module 2535 can use the identity of the biosensor 2402 to provide other information to the user. For example, the identification module 2535 can determine and then display a lot number, a date of manufacture, or a protocol that is recommended to be run with the biosensor 2402.

The plurality of modules 2531 to 2539 also includes an analysis module 2538 (also called a signal processing module or signal processor) that receives and analyzes the signal data (e.g., image data) from the biosensor 2402. The analysis module 2538 includes memory (e.g., RAM or flash) to store the detection data. The detection data can include a plurality of sequences of pixel signals, such that a sequence of pixel signals from each of the millions of sensors (or pixels) can be detected over many base calling cycles. The signal data can be stored for subsequent analysis or can be transmitted to the user interface 2414 to display desired information to the user. In some implementations, the signal data can be processed by the solid-state imager (e.g., a CMOS image sensor) before the analysis module 2538 receives the signal data.

The analysis module 2538 is configured to obtain image data from the light detectors at each of a plurality of sequencing cycles. The image data is derived from the emission signals detected by the light detectors. The analysis module 2538 processes the image data for each of the plurality of sequencing cycles through a neural network (e.g., the neural network-based template generator 2548, the neural network-based base caller 2558 (see, e.g., FIGS. 7, 9, and 10), and/or the neural network-based quality scorer 2568) and produces a base call for at least some of the analytes at each of the plurality of sequencing cycles.

The protocol modules 2536 and 2537 communicate with the main control module 2530 to control the operation of the subsystems 2406, 2408, and 2410 when conducting predetermined assay protocols. The protocol modules 2536 and 2537 can include sets of instructions for instructing the base calling system 2400 to perform specific operations pursuant to predetermined protocols. As shown, a protocol module can be a sequencing-by-synthesis (SBS) module 2536 that is configured to issue various commands for performing sequencing-by-synthesis processes. In SBS, extension of a nucleic acid primer along a nucleic acid template is monitored to determine the sequence of nucleotides in the template. The underlying chemical process can be polymerization (e.g., as catalyzed by a polymerase enzyme) or ligation (e.g., catalyzed by a ligase enzyme). In a particular polymerase-based SBS implementation, fluorescently labeled nucleotides are added to a primer (thereby extending the primer) in a template-dependent fashion, such that detection of the order and type of nucleotides added to the primer can be used to determine the sequence of the template. For example, to initiate a first SBS cycle, commands can be issued to deliver one or more labeled nucleotides, DNA polymerase, etc., into or through a flow cell that houses an array of nucleic acid templates. The nucleic acid templates can be located at corresponding reaction sites. Those reaction sites where primer extension causes a labeled nucleotide to be incorporated can be detected through an imaging event. During an imaging event, the illumination system 2409 can provide excitation light to the reaction sites. Optionally, the nucleotides can further include a reversible termination property that terminates further primer extension once a nucleotide has been added to a primer. For example, a nucleotide analog having a reversible terminator moiety can be added to a primer such that subsequent extension cannot occur until a deblocking agent is delivered to remove the moiety. Thus, for implementations that use reversible termination, a command can be issued to deliver a deblocking agent to the flow cell (before or after detection occurs). One or more commands can be issued to effect washes between the various delivery steps. The cycle can then be repeated n times to extend the primer by n nucleotides, thereby detecting a sequence of length n. Exemplary sequencing techniques are described, for example, in Bentley et al., Nature 456:53-59 (2008), WO 04/018497, US 7,057,026, WO 91/06678, WO 07/123744, US 7,329,492, US 7,211,414, US 7,315,019, and US 7,405,281, each of which is incorporated herein by reference.
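
The command sequence for n SBS cycles can be sketched as a simple loop. The Python example below is an illustrative assumption, not the SBS module's actual command set: the command strings and the function name `sbs_commands` are hypothetical, and the deblocking step is placed after imaging even though the text allows it before or after detection.

```python
from typing import List


def sbs_commands(n_cycles: int, reversible_termination: bool = True) -> List[str]:
    """Build an ordered list of hypothetical commands for n SBS cycles.

    Each cycle extends the primer by one nucleotide, so n cycles
    read a sequence of length n.
    """
    commands: List[str] = []
    for cycle in range(1, n_cycles + 1):
        commands.append(f"cycle {cycle}: deliver labeled nucleotides and polymerase")
        commands.append(f"cycle {cycle}: wash")
        commands.append(f"cycle {cycle}: illuminate and image reaction sites")
        if reversible_termination:
            # Assumed ordering: deblock after detection, then wash again.
            commands.append(f"cycle {cycle}: deliver deblocking agent")
            commands.append(f"cycle {cycle}: wash")
    return commands
```

With reversible termination each cycle in this sketch contributes five commands; without it, three.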

For the nucleotide delivery step of an SBS cycle, either a single type of nucleotide can be delivered at a time, or multiple different nucleotide types (e.g., A, C, T, and G together) can be delivered. For a nucleotide delivery configuration where only a single type of nucleotide is present at a time, the different nucleotides need not have distinct labels, since they can be distinguished based on the temporal separation inherent in the individualized delivery. Accordingly, a sequencing method or apparatus can use single-color detection. For example, an excitation source need only provide excitation at a single wavelength or in a single range of wavelengths. For a nucleotide delivery configuration where delivery results in multiple different nucleotides being present in the flow cell at one time, sites that incorporate different nucleotide types can be distinguished based on different fluorescent labels that are attached to the respective nucleotide types in the mixture. For example, four different nucleotides can be used, each having one of four different fluorophores. In one implementation, the four different fluorophores can be distinguished using excitation in four different regions of the spectrum. For example, four different excitation radiation sources can be used. Alternatively, fewer than four different excitation sources can be used, with optical filtration of the excitation radiation from a single source used to produce different ranges of excitation radiation at the flow cell.

In some implementations, fewer than four different colors can be detected in a mixture having four different nucleotides. For example, a pair of nucleotides can be detected at the same wavelength but distinguished based on a difference in intensity for one member of the pair compared to the other, or based on a change to one member of the pair (e.g., via chemical, photochemical, or physical modification) that causes an apparent signal to appear or disappear compared to the signal detected for the other member of the pair. Exemplary apparatus and methods for distinguishing four different nucleotides using detection of fewer than four colors are described, for example, in U.S. Patent Application Serial Nos. 61/538,294 and 61/619,878, which are incorporated herein by reference in their entirety. U.S. Application No. 13/624,200, filed September 21, 2012, is also incorporated by reference in its entirety.
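
One way to picture the signal appearance and disappearance approach is a single detection wavelength imaged twice, before and after a modification step. The Python sketch below is purely illustrative: the base-to-pattern mapping, the threshold, and the function name `call_base` are assumptions made for this example and are not taken from the referenced applications.

```python
def call_base(intensity_before: float, intensity_after: float,
              threshold: float = 0.5) -> str:
    """Decode one base from two images at a single wavelength.

    Assumed (hypothetical) encoding of the on/off pattern:
      on, on   -> A   signal present in both images
      on, off  -> C   signal disappears after modification
      off, on  -> T   signal appears after modification
      off, off -> G   no signal in either image
    """
    on_before = intensity_before >= threshold
    on_after = intensity_after >= threshold
    mapping = {
        (True, True): "A",
        (True, False): "C",
        (False, True): "T",
        (False, False): "G",
    }
    return mapping[(on_before, on_after)]
```

Two binary observations give four distinguishable patterns, which is why one color can suffice for four nucleotide types under this assumed scheme.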

The plurality of protocol modules can also include a sample preparation (or generation) module 2537 that is configured to issue commands to the fluid control system 2406 and the temperature control system 2410 for amplifying a product within the biosensor 2402. For example, the biosensor 2402 can be engaged to the base calling system 2400. The amplification module 2537 can issue instructions to the fluid control system 2406 to deliver the necessary amplification components to reaction chambers within the biosensor 2402. In other implementations, the reaction sites may already contain some components for amplification, such as the template DNA and/or primers. After delivering the amplification components to the reaction chambers, the amplification module 2537 can instruct the temperature control system 2410 to cycle through different temperature stages according to known amplification protocols. In some implementations, the amplification and/or nucleotide incorporation is performed isothermally.

The SBS module 2536 may issue commands to perform bridge PCR, in which clusters of clonal amplicons are formed on localized areas within a channel of a flow cell. After the amplicons are generated through bridge PCR, they may be "linearized" to make single-stranded template DNA, or sstDNA, and a sequencing primer may be hybridized to a universal sequence that flanks a region of interest. For example, a reversible terminator-based sequencing-by-synthesis method can be used as set forth above or as follows.

Each base calling or sequencing cycle can extend the sstDNA by a single base, which can be accomplished, for example, by using a modified DNA polymerase and a mixture of four types of nucleotides. The different types of nucleotides can have unique fluorescent labels, and each nucleotide can further have a reversible terminator that allows only a single-base incorporation to occur in each cycle. After a single base is added to the sstDNA, excitation light can be incident upon the reaction sites and fluorescent emissions can be detected. After detection, the fluorescent label and the terminator can be chemically cleaved from the sstDNA. Another similar base calling or sequencing cycle can follow. In such a sequencing protocol, the SBS module 2536 can instruct the fluid control system 2406 to direct a flow of reagent and enzyme solutions through the biosensor 2402. Exemplary reversible terminator-based SBS methods that can be used with the apparatus and methods set forth herein are described in U.S. Patent Application Publication 2007/0166705 A1, U.S. Patent Application Publication 2006/0188901 A1, U.S. Patent 7,057,026, U.S. Patent Application Publication 2006/0240439 A1, U.S. Patent Application Publication 2006/02814714709 A1, PCT Publication WO 05/065814, and PCT Publication WO 06/064199, each of which is incorporated herein by reference in its entirety. Exemplary reagents for reversible terminator-based SBS are described in US 7,541,444, US 7,057,026, US 7,427,673, US 7,566,537, and US 7,592,435, each of which is incorporated herein by reference in its entirety.

In some implementations, the amplification module and the SBS module can operate in a single assay protocol in which, for example, a template nucleic acid is amplified and subsequently sequenced within the same cartridge.

The base calling system 2400 can also allow the user to reconfigure an assay protocol. For example, the base calling system 2400 can offer the user, through the user interface 2414, options for modifying the determined protocol. For example, if it is determined that the biosensor 2402 is to be used for amplification, the base calling system 2400 can request a temperature for the annealing cycle. Furthermore, the base calling system 2400 can issue a warning to the user if the user has provided inputs that are generally not acceptable for the selected assay protocol.

In implementations, the biosensor 2402 includes millions of sensors (or pixels), each of which generates a plurality of sequences of pixel signals over successive base calling cycles. The analysis module 2538 detects the plurality of sequences of pixel signals and attributes them to the corresponding sensors (or pixels) in accordance with the row-wise and/or column-wise location of the sensors on the sensor array.
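As a rough illustration of attributing pixel signal sequences to sensors by row and column, the following Python sketch groups per-cycle readouts into one sequence per sensor. It is not the analysis module 2538 itself: the names `Frame` and `signals_by_sensor` and the nested-list layout of a readout are assumptions made only for this example.

```python
from typing import Dict, List, Tuple

# Assumed layout: one readout of the sensor array per cycle, frame[row][col].
Frame = List[List[float]]


def signals_by_sensor(frames: List[Frame]) -> Dict[Tuple[int, int], List[float]]:
    """Group per-cycle readouts into one signal sequence per (row, col) sensor."""
    sequences: Dict[Tuple[int, int], List[float]] = {}
    for frame in frames:  # frames are assumed to be ordered by base calling cycle
        for r, row in enumerate(frame):
            for c, value in enumerate(row):
                sequences.setdefault((r, c), []).append(value)
    return sequences


# Three hypothetical cycles on a 2x2 sensor array.
frames = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.5, 0.6], [0.7, 0.8]],
    [[0.9, 1.0], [1.1, 1.2]],
]
per_sensor = signals_by_sensor(frames)
print(per_sensor[(1, 0)])  # [0.3, 0.7, 1.1]
```

Keying the result by `(row, col)` keeps each sequence tied to the physical position of its sensor, which is the property the paragraph above relies on.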

Each sensor in the sensor array can produce sensor data for a tile of the flow cell, where a tile is an area on the flow cell in which clusters of genetic material are disposed during the base calling operation. The sensor data can comprise image data in an array of pixels. For a given cycle, the sensor data can include more than one image, producing multiple features per pixel as the tile data.
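The "multiple features per pixel" idea can be sketched as follows, assuming each image of a tile is a same-sized nested list of intensities. The helper name `stack_tile_images` and the two example channels are hypothetical and are not taken from the disclosure.

```python
from typing import List

# Assumed layout: one intensity image of a tile, image[row][col].
Image = List[List[float]]


def stack_tile_images(images: List[Image]) -> List[List[List[float]]]:
    """Combine several same-sized images of one tile into per-pixel feature vectors."""
    if not images:
        raise ValueError("at least one image is required")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images of a tile must share the same shape")
    # Result is indexed as tile_data[row][col][image_index].
    return [
        [[img[r][c] for img in images] for c in range(width)]
        for r in range(height)
    ]


# Two hypothetical 2x2 images captured in the same sensing cycle.
channel_a = [[0.1, 0.9], [0.4, 0.0]]
channel_b = [[0.8, 0.2], [0.3, 0.7]]
tile_data = stack_tile_images([channel_a, channel_b])
print(tile_data[0][1])  # [0.9, 0.2]
```

With two images per cycle, every pixel carries a two-element feature vector; more images per cycle simply lengthen that vector.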

FIG. 26 is a simplified block diagram of a computer system 2600 that can be used to implement the technology disclosed. The computer system 2600 includes at least one central processing unit (CPU) 2672 that communicates with a number of peripheral devices via a bus subsystem 2655. These peripheral devices can include a storage subsystem 2610 including, for example, memory devices and a file storage subsystem 2636, user interface input devices 2638, user interface output devices 2676, and a network interface subsystem 2674. The input and output devices allow user interaction with the computer system 2600. The network interface subsystem 2674 provides an interface to outside networks, including an interface to corresponding interface devices in other computer systems.

The user interface input devices 2638 can include a keyboard; pointing devices such as a mouse, trackball, touchpad, or graphics tablet; a scanner; a touch screen incorporated into the display; audio input devices such as voice recognition systems and microphones; and other types of input devices. In general, use of the term "input device" is intended to include all possible types of devices and ways to input information into the computer system 2600.

The user interface output devices 2676 can include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem can include an LED display, a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem can also provide a non-visual display, such as audio output devices. In general, use of the term "output device" is intended to include all possible types of devices and ways to output information from the computer system 2600 to the user or to another machine or computer system.

The storage subsystem 2610 stores programming and data constructs that provide the functionality of some or all of the modules and methods described herein. These software modules are generally executed by the deep learning processors 2678.

In one implementation, the neural networks are implemented using deep learning processors 2678, which can be configurable and reconfigurable processors, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and/or coarse-grained reconfigurable architectures (CGRAs) and graphics processing units (GPUs), or other configured devices. The deep learning processors 2678 can be hosted by a deep learning cloud platform such as Google Cloud Platform™, Xilinx™, and Cirrascale™. Examples of the deep learning processors 2678 include Google's Tensor Processing Unit (TPU)™, rackmount solutions such as GX4 Rackmount Series™ and GX149 Rackmount Series™, NVIDIA DGX-1™, Microsoft's Stratix V FPGA™, Graphcore's Intelligent Processor Unit (IPU)™, Qualcomm's Zeroth Platform™ with Snapdragon processors™, NVIDIA's Volta™, NVIDIA's DRIVE PX™, NVIDIA's JETSON TX1/TX2 MODULE™, Intel's Nirvana™, Movidius VPU™, Fujitsu DPI™, ARM's DynamicIQ™, IBM TrueNorth™, and others.

The memory subsystem 2622 used in the storage subsystem 2610 can include a number of memories, including a main random access memory (RAM) 2634 for storage of instructions and data during program execution and a read-only memory (ROM) 2632 in which fixed instructions are stored. The file storage subsystem 2636 can provide persistent storage for program and data files, and can include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations can be stored by the file storage subsystem 2636 in the storage subsystem 2610, or in other machines accessible by the processor.

The bus subsystem 2655 provides a mechanism for letting the various components and subsystems of the computer system 2600 communicate with each other as intended. Although the bus subsystem 2655 is shown schematically as a single bus, alternative implementations of the bus subsystem can use multiple buses.

The computer system 2600 itself can be of varying types, including a personal computer, a portable computer, a workstation, a computer terminal, a network computer, a television, a mainframe, a server farm, a widely distributed set of loosely networked computers, or any other data processing system or user device. Due to the ever-changing nature of computers and networks, the description of the computer system 2600 depicted in FIG. 26 is intended only as a specific example for purposes of illustrating the preferred implementations of the present invention. Many other configurations of the computer system 2600 are possible, having more or fewer components than the computer system depicted in FIG. 26.

Claims (29)

1.一种系统,所述系统包括:1. A system comprising: 主机处理器;host processor; 存储器,所述存储器能够由所述主机处理器访问,所述存储器存储:memory, the memory being accessible by the host processor, the memory storing: 神经网络的拓扑结构,The topology of the neural network, 用于配置所述拓扑结构以执行碱基检出操作的多个权重集,所述多个权重集中的权重集在多个训练数据集中的相应训练数据集上进行训练,所述训练数据集对应于所述碱基检出操作的多个测序事件中的相应测序事件,所述测序事件跨越所述碱基检出操作经过一系列感测循环中的感测循环子系列的时间进展和所述碱基检出操作经过生物传感器上的位置的空间进展,和A plurality of weight sets for configuring the topology to perform base calling operations, the weight sets in the plurality of weight sets are trained on corresponding training data sets in the plurality of training data sets, the training data sets correspond to A corresponding sequencing event in a plurality of sequencing events of the base calling operation spanning the time progression of the base calling operation through a subseries of sensing cycles in a series of sensing cycles and the the spatial progression of base calling operations through positions on the biosensor, and 用于所述一系列感测循环中的感测循环的传感器数据;和sensor data for a sensing cycle in the series of sensing cycles; and 可配置处理器,所述可配置处理器能够访问所述存储器并且配置有数据流逻辑以:a configurable processor capable of accessing the memory and configured with dataflow logic to: 在所述可配置处理器的处理元件上加载所述拓扑结构,loading said topology on a processing element of said configurable processor, 至少部分地基于感测循环的受试者子系列和/或所述生物传感器上的受试者位置从所述多个权重集中选择权重集,selecting a weight set from the plurality of weight sets based at least in part on a subject subseries of sensing cycles and/or a subject position on the biosensor, 在所述处理元件上加载用于所述感测循环的受试者子系列和所述生物传感器上的所述受试者位置的受试者传感器数据,以及loading on the processing element subject sensor data for the subject sub-series of the sensing cycle and the subject's location on the biosensor, and 在所述处理元件上加载所选择的权重集中的权重以用所述权重配置所述拓扑结构,并且使所述神经网络对所述受试者传感器数据应用所选择的权重集中的所述权重以产生碱基检出分类数据。loading weights in the selected set of weights on the processing element to configure the topology with the weights, and causing the neural network to apply the weights in the selected set of weights 
to the subject sensor data to Generate base call classification data. 2.根据权利要求1所述的系统,其中所述感测循环子系列包括初始感测循环子系列、中间感测循环子系列和最终感测循环子系列,并且其中所述训练数据集和所述权重集分别对应于所述初始感测循环子系列、所述中间感测循环子系列和所述最终感测循环子系列。2. The system of claim 1 , wherein the sensing cycle sub-series comprises an initial sensing cycle sub-series, an intermediate sensing cycle sub-series and a final sensing cycle sub-series, and wherein the training data set and the The weight sets respectively correspond to the initial sensing cycle sub-series, the intermediate sensing cycle sub-series and the final sensing cycle sub-series. 3.根据权利要求1或2所述的系统,其中所述生物传感器上的所述位置包括边缘位置和非边缘位置,并且其中所述训练数据集和所述权重集分别对应于所述边缘位置和所述非边缘位置。3. The system according to claim 1 or 2, wherein the locations on the biosensor include edge locations and non-edge locations, and wherein the training data set and the weight set respectively correspond to the edge locations and the non-edge locations. 4.根据权利要求1至3中任一项所述的系统,其中所述生物传感器上的所述位置包括第一象限位置、第二象限位置、第三象限位置和第四象限位置,并且其中所述训练数据集和所述权重集分别对应于所述第一象限位置、所述第二象限位置、所述第三象限位置和所述第四象限位置。4. The system of any one of claims 1 to 3, wherein the locations on the biosensor include a first quadrant location, a second quadrant location, a third quadrant location, and a fourth quadrant location, and wherein The training data set and the weight set respectively correspond to the first quadrant position, the second quadrant position, the third quadrant position and the fourth quadrant position. 5.根据权利要求1至4中任一项所述的系统,其中所述生物传感器被分区为多个区块,并且所述边缘位置、所述非边缘位置、所述第一象限位置、所述第二象限位置、所述第三象限位置和所述第四象限位置中的各个位置包括所述多个区块中对应的一个或多个区块。5. The system according to any one of claims 1 to 4, wherein the biosensor is partitioned into a plurality of blocks, and the edge location, the non-edge location, the first quadrant location, the Each of the second quadrant location, the third quadrant location, and the fourth quadrant location includes a corresponding one or more of the plurality of blocks. 
6.根据权利要求1至5中任一项所述的系统,其中所述测序事件跨越所述碱基检出操作经过碱基检出双端读段的时间进展,并且其中所述训练数据集和所述权重集分别对应于所述双端读段中的读段。6. The system of any one of claims 1 to 5, wherein the sequencing events span the time progression of the base calling operation through base called paired-end reads, and wherein the training dataset and the weight sets respectively correspond to reads in the paired-end reads. 7.根据权利要求1至6中任一项所述的系统,其中:7. The system according to any one of claims 1 to 6, wherein: 所述感测循环子系列包括初始感测循环子系列、中间感测循环子系列和最终感测循环子系列;The sensing cycle sub-series includes an initial sensing cycle sub-series, an intermediate sensing cycle sub-series and a final sensing cycle sub-series; 所述生物传感器上的所述位置包括边缘位置和非边缘位置;并且said locations on said biosensor include edge locations and non-edge locations; and 所述训练数据集并且因此所述权重集分别对应于(i)所述初始感测循环子系列和所述边缘位置,(ii)所述中间感测循环子系列和所述边缘位置,(iii)所述最终感测循环子系列和所述边缘位置,(iv)所述初始感测循环子系列和所述非边缘位置,(v)所述中间感测循环子系列和所述非边缘位置,以及(vi)所述最终感测循环子系列和所述非边缘位置。The training data set and thus the weight set correspond to (i) the initial sensing cycle sub-series and the edge positions, (ii) the intermediate sensing cycle sub-series and the edge positions, respectively, (iii) ) said final sensing cycle sub-series and said edge positions, (iv) said initial sensing cycle sub-series and said non-edge positions, (v) said intermediate sensing cycle sub-series and said non-edge positions , and (vi) said final sensing cycle subseries and said non-edge locations. 8.根据权利要求1至7中任一项所述的系统,其中:8. 
The system according to any one of claims 1 to 7, wherein: 所述感测循环子系列包括初始感测循环子系列、中间感测循环子系列和最终感测循环子系列;The sensing cycle sub-series includes an initial sensing cycle sub-series, an intermediate sensing cycle sub-series and a final sensing cycle sub-series; 所述生物传感器上的所述位置包括第一类别的位置和第二类别的位置;并且said locations on said biosensor include locations of a first category and locations of a second category; and 所述训练数据集并且因此所述权重集分别对应于(i)所述初始感测循环子系列和所述第一类别的位置,(ii)所述中间感测循环子系列和所述第一类别的位置,(iii)所述最终感测循环子系列和所述第一类别的位置,(iv)所述初始感测循环子系列和所述第二类别的非边缘位置,(v)所述中间感测循环子系列和所述第二类别的位置,以及(vi)所述最终感测循环子系列和所述第二类别的位置。The training data set and thus the weight set correspond to (i) the initial sensing cycle sub-series and the position of the first category, (ii) the intermediate sensing cycle sub-series and the first class respectively. The location of the category, (iii) the location of the final sensing cycle subseries and the first category, (iv) the initial sensing cycle subseries and the non-edge location of the second category, (v) the (vi) the position of the intermediate subseries of sensing cycles and the second category, and (vi) the position of the final subseries of sensing cycles and the second category. 9.根据权利要求1至8中任一项所述的系统,其中所述可配置处理器进一步:9. The system of any one of claims 1 to 8, wherein the configurable processor is further: 确定当前测序运行的一个或多个参数;以及determining one or more parameters of the current sequencing run; and 进一步基于所述当前测序运行的所确定的所述一个或多个参数从所述多个权重集中选择所述权重集。The set of weights is selected from the plurality of sets of weights further based on the determined one or more parameters of the current sequencing run. 10.根据权利要求9所述的系统,其中所述当前测序运行的所确定的所述一个或多个参数包括以下中的一者或多者:所述生物传感器中使用的反应组分的特征或与所述传感器数据相关联的定相特征。10. The system of claim 9, wherein the determined one or more parameters of the current sequencing run include one or more of: characteristics of reaction components used in the biosensor Or phased features associated with said sensor data. 11.一种系统,所述系统包括:11. 
A system comprising: 主机处理器;host processor; 存储器,所述存储器能够由所述主机处理器访问,所述存储器存储:memory, the memory being accessible by the host processor, the memory storing: 神经网络的拓扑结构,The topology of the neural network, 用于配置所述拓扑结构以执行碱基检出操作的第一权重集、第二权重集和第三权重集,所述第一权重集、所述第二权重集和所述第三权重集分别对应于一系列感测循环中的第一感测循环子系列、第二感测循环子系列和第三感测循环子系列,和a first set of weights, a second set of weights and a third set of weights for configuring the topology to perform a base calling operation, the first set of weights, the second set of weights and the third set of weights respectively corresponding to a first sensing cycle sub-series, a second sensing cycle sub-series and a third sensing cycle sub-series in a series of sensing cycles, and 分别对应于所述第一感测循环子系列、所述第二感测循环子系列和所述第三感测循环子系列的第一传感器数据、第二传感器数据和第三传感器数据;和first sensor data, second sensor data, and third sensor data corresponding to the first, second, and third sensing cycle sub-series, respectively; and 可配置处理器,所述可配置处理器能够访问所述存储器并且配置有数据流逻辑以:a configurable processor capable of accessing the memory and configured with dataflow logic to: 在所述可配置处理器的处理元件上加载所述拓扑结构,loading said topology on a processing element of said configurable processor, 在所述处理元件上加载所述第一传感器数据,在所述处理元件上加载所述第一权重集以用所述第一权重集中的权重配置所述拓扑结构,并且使所述神经网络对所述第一传感器数据应用所述第一权重集中的所述权重以产生用于所述第一感测循环子系列中的感测循环的第一碱基检出分类数据,loading the first sensor data on the processing element, loading the first set of weights on the processing element to configure the topology with weights in the first set of weights, and causing the neural network to applying the weights in the first set of weights to the first sensor data to generate first base call classification data for a sensing cycle in the first sub-series of sensing cycles, 在所述处理元件上加载所述第二传感器数据,在所述处理元件上加载所述第二权重集以用所述第二权重集中的权重配置所述拓扑结构,并且使所述神经网络对所述第二传感器数据应用所述第二权重集中的所述权重以产生用于所述第二感测循环子系列中的感测循环的第二碱基检出分类数据,以及loading the second sensor data on the processing element, loading the second set of weights on the processing element to configure the topology with weights in the second set of weights, and 
causing the neural network to applying the weights in the second set of weights to the second sensor data to generate second base call classification data for sensing cycles in the second sub-series of sensing cycles, and 在所述处理元件上加载所述第三传感器数据,在所述处理元件上加载所述第三权重集以用所述第三权重集中的权重配置所述拓扑结构,并且使所述神经网络对所述第三传感器数据应用所述第三权重集中的所述权重以产生用于所述第三感测循环子系列中的感测循环的第三碱基检出分类数据。loading the third sensor data on the processing element, loading the third set of weights on the processing element to configure the topology with weights in the third set of weights, and causing the neural network to The third sensor data applies the weights of the third set of weights to generate third base call classification data for a sensing cycle in the third sub-series of sensing cycles. 12.根据权利要求11所述的系统,其中所述存储器进一步存储:12. The system of claim 11, wherein the memory further stores: 用于配置所述拓扑结构以执行碱基检出操作的第四权重集、第五权重集和后续权重集,所述第四权重集、所述第五权重集和所述后续权重集分别对应于所述一系列感测循环中的第四感测循环子系列、第五感测循环子系列和后续感测循环子系列;和A fourth weight set, a fifth weight set, and a subsequent weight set for configuring the topology to perform a base call operation, the fourth weight set, the fifth weight set, and the subsequent weight set respectively correspond to a fourth sub-series of sensing cycles, a fifth sub-series of sensing cycles, and a subsequent sub-series of sensing cycles in the series of sensing cycles; and 用于所述第四感测循环子系列、所述第五感测循环子系列和所述后续感测循环子系列的第四传感器数据、第五传感器数据和后续传感器数据。Fourth sensor data, fifth sensor data and subsequent sensor data for said fourth sub-series of sensing cycles, said fifth sub-series of sensing cycles and said subsequent sub-series of sensing cycles. 13.根据权利要求11或12所述的系统,其中所述可配置处理器配置有数据流逻辑以:13. 
The system of claim 11 or 12, wherein the configurable processor is configured with data flow logic to: 在所述处理元件上加载所述第四传感器数据,在所述处理元件上加载所述第四权重集以用所述第四权重集中的权重配置所述拓扑结构,并且使所述神经网络对所述第四传感器数据应用所述第四权重集中的所述权重以产生用于所述第四感测循环子系列中的感测循环的第四碱基检出分类数据;loading the fourth sensor data on the processing element, loading the fourth set of weights on the processing element to configure the topology with weights in the fourth set of weights, and causing the neural network to applying the weights in the fourth set of weights to the fourth sensor data to generate fourth base call classification data for sensing cycles in the fourth sub-series of sensing cycles; 在所述处理元件上加载所述第五传感器数据,在所述处理元件上加载所述第五权重集以用所述第五权重集中的权重配置所述拓扑结构,并且使所述神经网络对所述第五传感器数据应用所述第五权重集中的所述权重以产生用于所述第五感测循环子系列中的感测循环的第五碱基检出分类数据;以及loading the fifth sensor data on the processing element, loading the fifth set of weights on the processing element to configure the topology with weights in the fifth set of weights, and causing the neural network to applying the weights in the fifth set of weights to the fifth sensor data to generate fifth base call classification data for a sensing cycle in the fifth subseries of sensing cycles; and 在所述处理元件上加载所述后续传感器数据和所述后续权重集以用所述后续权重集中的权重配置所述拓扑结构,并且使所述神经网络对所述后续传感器数据应用所述后续权重集中的所述权重以产生用于所述后续感测循环子系列中的感测循环的后续碱基检出分类数据。loading the subsequent sensor data and the subsequent set of weights on the processing element to configure the topology with weights in the subsequent set of weights, and causing the neural network to apply the subsequent weights to the subsequent sensor data The weights are aggregated to generate subsequent base calling classification data for sensing cycles in the subseries of subsequent sensing cycles. 14.根据权利要求11至13中任一项所述的系统,其中所述拓扑结构采用来自连续感测循环的传感器数据作为输入,并且所述拓扑结构包括空间层和时间层,所述空间层不组合所述传感器数据和所述连续感测循环之间的所得特征映射图,所述时间层组合所述连续感测循环之间的所得特征映射图。14. 
The system according to any one of claims 11 to 13, wherein the topology employs sensor data from successive sensing cycles as input, and the topology comprises a spatial layer and a temporal layer, the spatial layer Instead of combining the sensor data and the resulting feature maps between the successive sensing cycles, the temporal layers combine the resulting feature maps between the successive sensing cycles. 15.根据权利要求11至14中任一项所述的系统,其中所述第一权重集包括针对所述空间层的第一空间权重和针对所述时间层的第一时间权重,所述第二权重集包括针对所述空间层的第二空间权重和针对所述时间层的第二时间权重,并且所述第三权重集包括针对所述空间层的第三空间权重和针对所述时间层的第三时间权重。15. The system according to any one of claims 11 to 14, wherein the first set of weights comprises first spatial weights for the spatial layer and first temporal weights for the temporal layer, the first The second set of weights includes a second spatial weight for the spatial layer and a second temporal weight for the temporal layer, and the third set of weights includes a third spatial weight for the spatial layer and a second temporal weight for the temporal layer The third time weight of . 16.根据权利要求11至15中任一项所述的系统,其中所述第一权重集包括针对所述空间层的空间权重和针对所述时间层的第一时间权重,所述第二权重集包括针对所述时间层的第二时间权重,并且所述第三权重集包括针对所述时间层的第三时间权重,并且其中所述可配置处理器配置有数据流逻辑以:16. 
The system according to any one of claims 11 to 15, wherein said first set of weights comprises spatial weights for said spatial layer and first temporal weights for said temporal layer, said second weight set includes a second temporal weight for the temporal stratum, and the third set of weights includes a third temporal weight for the temporal stratum, and wherein the configurable processor is configured with dataflow logic to: 在所述处理元件上加载所述第一传感器数据,在所述处理元件上加载所述空间权重和所述第一时间权重以用所述空间权重配置所述空间层并用所述第一时间权重配置所述时间层,并且使所述神经网络对所述第一传感器数据应用经配置的空间层和经配置的时间层以产生用于所述第一感测循环子系列中的感测循环的第一碱基检出分类数据;loading the first sensor data on the processing element, loading the spatial weights and the first temporal weights on the processing element to configure the spatial layer with the spatial weights and with the first temporal weights configuring the temporal layer, and causing the neural network to apply the configured spatial layer and the configured temporal layer to the first sensor data to generate a metric for a sensing cycle in the first sensing cycle sub-series First base call classification data; 在所述处理元件上加载所述第二传感器数据,在所述处理元件上加载所述第二时间权重以用所述第二时间权重中的权重重新配置所述时间层而不重新配置所述空间层,并且使所述神经网络对所述第二传感器数据应用重新配置的时间层和先前配置的空间层以产生用于所述第二感测循环子系列中的感测循环的第二碱基检出分类数据;以及loading the second sensor data on the processing element, loading the second temporal weights on the processing element to reconfigure the temporal layer with weights in the second temporal weights without reconfiguring the a spatial layer, and causing the neural network to apply a reconfigured temporal layer and a previously configured spatial layer to the second sensor data to generate a second base for a sensing cycle in the second subseries of sensing cycles base call classification data; and 在所述处理元件上加载所述第三传感器数据,在所述处理元件上加载所述第三时间权重以用所述第三时间权重中的权重重新配置所述时间层而不重新配置所述空间层,并且使所述神经网络对所述第三传感器数据应用重新配置的时间层和先前配置的空间层以产生用于所述第三感测循环子系列中的感测循环的第三碱基检出分类数据。The third sensor data is loaded on the processing element, the third temporal weights are loaded on the 
processing element to reconfigure the temporal layer with weights in the third temporal weights without reconfiguring the a spatial layer, and causing the neural network to apply a reconfigured temporal layer and a previously configured spatial layer to the third sensor data to produce a third base for a sensing cycle in the third sensing cycle subseries Base checkout categorical data. 17.根据权利要求11至16中任一项所述的系统,其中所述第一权重集、所述第二权重集和所述第三权重集中的权重使用不同缩放系数来量化。17. The system of any one of claims 11 to 16, wherein weights in the first set of weights, the second set of weights and the third set of weights are quantized using different scaling factors. 18.根据权利要求11至17中任一项所述的系统,其中所述第一权重集、所述第二权重集和所述第三权重集中的权重分别对应于第一测序化学、第二测序化学和第三测序化学。18. The system of any one of claims 11 to 17, wherein weights in the first set of weights, the second set of weights, and the third set of weights correspond to a first sequencing chemistry, a second sequencing chemistry, respectively. Sequencing chemistry and third sequencing chemistry. 19.根据权利要求11至18中任一项所述的系统,其中所述第一权重集、所述第二权重集和所述第三权重集中的权重分别对应于第一测序测定、第二测序测定和第三测序测定。19. The system of any one of claims 11 to 18, wherein weights in the first set of weights, the second set of weights, and the third set of weights correspond to the first sequencing assay, the second set of weights, respectively. Sequencing Assay and Third Sequencing Assay. 20.根据权利要求11至19中任一项所述的系统,其中所述第一权重集、所述第二权重集和所述第三权重集中的权重分别对应于第一测序配置、第二测序配置和第三测序配置。20. The system of any one of claims 11 to 19, wherein weights in the first set of weights, the second set of weights, and the third set of weights correspond to the first sequencing configuration, the second A sequencing configuration and a third sequencing configuration. 21.一种用于生成碱基检出分类数据的计算机实现的方法,所述方法包括:21. 
A computer-implemented method for generating base call classification data, the method comprising:
loading a topology of a neural network on a processing element of a processor, the processor being used to perform a base calling operation;
storing (i) first sensor data from clusters within a first one or more tiles of a flow cell, (ii) second sensor data from clusters within a second one or more tiles of the flow cell, (iii) a first weight set comprising a first one or more weights, and (iv) a second weight set comprising a second one or more weights, wherein the first sensor data and the second sensor data are generated during a subset of sensing cycles in a series of sensing cycles;
configuring the topology of the neural network with the first weight set, and using the neural network configured with the first weight set to process the first sensor data and generate first base call classification data for the first one or more tiles and for the subset of sensing cycles; and
configuring the topology of the neural network with the second weight set, and using the neural network configured with the second weight set to process the second sensor data and generate second base call classification data for the second one or more tiles and for the subset of sensing cycles.
22. The method of claim 21, wherein the subset of sensing cycles is a first subset of sensing cycles, and wherein the method further comprises:
storing (i) third sensor data from clusters within the first one or more tiles, (ii) fourth sensor data from clusters within the second one or more tiles, (iii) a third weight set, and (iv) a fourth weight set, wherein the third sensor data and the fourth sensor data are generated during a second subset of sensing cycles in the series of sensing cycles, the second subset of sensing cycles following the first subset of sensing cycles in the series of sensing cycles;
configuring the topology of the neural network with the third weight set, and using the neural network configured with the third weight set to process the third sensor data and generate third base call classification data for the first one or more tiles and for the second subset of sensing cycles; and
configuring the topology of the neural network with the fourth weight set, and using the neural network configured with the fourth weight set to process the fourth sensor data and generate fourth base call classification data for the second one or more tiles and for the second subset of sensing cycles.
23. The method of claim 21 or 22, wherein:
the first one or more tiles are within a first region of the flow cell; and
the second one or more tiles are within a second region of the flow cell.
24. The method of any one of claims 21 to 23, wherein:
the first one or more tiles are edge tiles of the flow cell; and
the second one or more tiles are non-edge tiles of the flow cell.
25. The method of any one of claims 21 to 24, further comprising:
generating the first weight set by training the neural network on sensor data generated only from edge tiles; and
generating the second weight set by training the neural network on sensor data generated only from non-edge tiles.
26. A system comprising:
a host processor;
a memory accessible by the host processor, the memory storing (i) a topology of a neural network, and (ii) a plurality of weights for configuring the topology to perform a base calling operation, wherein the plurality of weights is based on a tile location, a series of sensing cycles, and/or sensor data; and
a configurable processor that has access to the memory and is configured with data flow logic to:
load the topology on a processing element of the configurable processor, and
load the plurality of weights on the processing element to configure the topology with the plurality of weights, so that the neural network generates base call classification data.
27. The system of claim 26, wherein the plurality of weights is a first plurality of weights, the tile location is a first tile location, the series of sensing cycles is a first series of sensing cycles, and the sensor data is first sensor data, and wherein:
the memory further stores a second plurality of weights for configuring the topology to perform a base calling operation, wherein the second plurality of weights is based on a second tile location, a second series of sensing cycles, and/or second sensor data; and
the configurable processor is configured with data flow logic to:
load the second plurality of weights on the processing element to configure the topology with the second plurality of weights, so that the neural network generates additional base call classification data.
28. The system of claim 27, wherein:
the first tile location is in a first region within a flow cell; and
the second tile location is in a second region within the flow cell.
29. The system of claim 27 or 28, wherein:
the second series of sensing cycles occurs after the first series of sensing cycles.
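The weight set selection recited in claims 21 to 29 can be illustrated with a minimal Python sketch. It is not the patented implementation: the single linear layer standing in for the neural network topology, the names `run_topology`, `select_key`, and `base_call`, the edge-tile set, the cycle split at 25, and all numeric values are assumptions made up for illustration. The sketch only shows the idea that one fixed topology is reconfigured with a different weight set depending on tile location (edge or non-edge) and on the subset of sensing cycles.

```python
from typing import Dict, List, Sequence, Set, Tuple

BASES = "ACGT"
WeightSet = List[List[float]]  # one row of weights per base (hypothetical layout)


def run_topology(weights: WeightSet, features: Sequence[float]) -> str:
    """Toy stand-in for the fixed topology: one linear layer plus argmax."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return BASES[scores.index(max(scores))]


def select_key(tile: int, cycle: int, edge_tiles: Set[int],
               cycle_split: int) -> Tuple[str, int]:
    """Map a (tile, cycle) pair to the key of the weight set to load."""
    region = "edge" if tile in edge_tiles else "non_edge"
    subset = 0 if cycle <= cycle_split else 1  # first or second cycle subset
    return region, subset


def base_call(sensor_data: Dict[Tuple[int, int], Sequence[float]],
              weight_sets: Dict[Tuple[str, int], WeightSet],
              edge_tiles: Set[int],
              cycle_split: int) -> Dict[Tuple[int, int], str]:
    """Configure the same topology with the selected weight set per input."""
    calls = {}
    for (tile, cycle), features in sensor_data.items():
        weights = weight_sets[select_key(tile, cycle, edge_tiles, cycle_split)]
        calls[(tile, cycle)] = run_topology(weights, features)
    return calls


# Two made-up weight sets; in the claims these would come from training on
# edge-only or non-edge-only sensor data.
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
flipped = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]

weight_sets = {
    ("edge", 0): identity,      # first weight set
    ("non_edge", 0): flipped,   # second weight set
    ("edge", 1): flipped,       # third weight set
    ("non_edge", 1): identity,  # fourth weight set
}

signal = [0.9, 0.1, 0.0, 0.0]   # same made-up intensity vector everywhere
sensor_data = {(0, 1): signal, (5, 1): signal, (0, 30): signal, (5, 30): signal}

calls = base_call(sensor_data, weight_sets, edge_tiles={0}, cycle_split=25)
print(calls)
```

In this toy setup the same intensity vector is called differently depending on which weight set is selected, which is the behavior the claims rely on: the topology stays loaded while only the weights change per tile group and per cycle subset.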
CN202280005111.4A 2021-03-16 2022-03-15 Block position and/or rotation based weight set selection for base detection Pending CN115803815A (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US202163161880P 2021-03-16 2021-03-16
US202163161896P 2021-03-16 2021-03-16
US63/161896 2021-03-16
US63/161880 2021-03-16
US17/687,583 US12525320B2 (en) 2021-03-16 2022-03-04 Neural network parameter quantization for base calling
US17/687551 2022-03-04
US17/687,551 US20220301657A1 (en) 2021-03-16 2022-03-04 Tile location and/or cycle based weight set selection for base calling
US17/687583 2022-03-04
PCT/US2022/020460 WO2022197752A1 (en) 2021-03-16 2022-03-15 Tile location and/or cycle based weight set selection for base calling

Publications (1)

Publication Number Publication Date
CN115803815A true CN115803815A (en) 2023-03-14

Family

ID=85057463

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202280005057.3A Pending CN115699019A (en) 2021-03-16 2022-03-15 Neural Network Parameter Quantification for Base Calling
CN202280005111.4A Pending CN115803815A (en) 2021-03-16 2022-03-15 Block position and/or rotation based weight set selection for base detection

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202280005057.3A Pending CN115699019A (en) 2021-03-16 2022-03-15 Neural Network Parameter Quantification for Base Calling

Country Status (7)

Country Link
EP (2) EP4309179A1 (en)
JP (2) JP7726929B2 (en)
KR (1) KR20230157230A (en)
CN (2) CN115699019A (en)
AU (2) AU2022238841A1 (en)
CA (2) CA3183581A1 (en)
IL (1) IL299077B2 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108573468A (en) * 2017-03-07 2018-09-25 伊鲁米那股份有限公司 Optical distortion correction for imaged samples
US20200302224A1 (en) * 2019-03-21 2020-09-24 Illumina, Inc. Artificial Intelligence-Based Sequencing
CA3104851A1 (en) * 2019-05-16 2020-11-19 Illumina, Inc. Base calling using convolutions
CN112313666A (en) * 2019-03-21 2021-02-02 因美纳有限公司 Training data generation for artificial intelligence based sequencing
CN113166804A (en) * 2018-11-28 2021-07-23 牛津纳米孔科技公司 Analyzing nanopore signals using machine learning techniques

Also Published As

Publication number Publication date
CN115699019A (en) 2023-02-03
EP4309080A1 (en) 2024-01-24
AU2022237501A1 (en) 2023-02-02
CA3183567A1 (en) 2022-09-22
IL299077B2 (en) 2025-11-01
EP4309179A1 (en) 2024-01-24
IL299077B1 (en) 2025-07-01
JP2024510539A (en) 2024-03-08
AU2022238841A1 (en) 2023-02-02
IL299077A (en) 2023-02-01
CA3183581A1 (en) 2022-09-22
JP2025179067A (en) 2025-12-09
JP7726929B2 (en) 2025-08-20
KR20230157230A (en) 2023-11-16

Similar Documents

Publication Publication Date Title
CN115136243B (en) Hardware execution and acceleration of artificial intelligence based base detectors
US20220301657A1 (en) Tile location and/or cycle based weight set selection for base calling
US20230041989A1 (en) Base calling using multiple base caller models
CN117501372A (en) A self-learning base caller trained using organismal sequences
US20230026084A1 (en) Self-learned base caller, trained using organism sequences
US20220415445A1 (en) Self-learned base caller, trained using oligo sequences
JP7726929B2 (en) Tile Position and/or Cycle-Based Weight Set Selection for Base Calling
WO2023009758A1 (en) Quality score calibration of basecalling systems
WO2022197752A1 (en) Tile location and/or cycle based weight set selection for base calling
EP4364155B1 (en) Self-learned base caller, trained using oligo sequences
JP7809733B2 (en) Self-learning base code trained using oligo sequences
US20230029970A1 (en) Quality score calibration of basecalling systems
JP2024529843A (en) Base calling using multiple base call models
CN117546248A (en) Base detection using multiple base detector model
CN117529780A (en) Quality Score Calibration of Base Calling Systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination