NZ743311B2

NZ743311B2 - Genomic infrastructure for on-site or cloud-based DNA and RNA processing and analysis

Info

Publication number: NZ743311B2
Application number: NZ743311A
Authority: NZ
Inventors: Robert J Mcmillen; Rami Mehio; Michael Ruehle; Rooyen Pieter Van
Original assignee: Edico Genome Corp
Filing date: 2017-01-11
Publication date: 2024-09-03

Abstract

A system, method and apparatus for executing a sequence analysis pipeline on genetic sequence data include an integrated circuit formed of a set of hardwired digital logic circuits that are interconnected by physical electrical interconnects. One of the physical electrical interconnects forms an input to the integrated circuit connected with an electronic data source for receiving reads of genomic data. The hardwired digital logic circuits are arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired digital logic circuits to perform one or more steps in the sequence analysis pipeline on the reads of genomic data. Each subset of the hardwired digital logic circuits is formed in a wired configuration to perform the one or more steps in the sequence analysis pipeline.
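As a software analogy only (the abstract describes hardwired logic circuits, not software), the arrangement of processing engines can be pictured as a chain of stages, each performing one step on the reads and passing its output to the next. The `Read` record, the `run_pipeline` helper and the two toy engines below are hypothetical placeholders, not steps named in the source.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Read:
    """Hypothetical record for one read of genomic data."""
    name: str
    seq: str
    pos: int = -1  # reference position once mapped (-1 = unmapped)


# A "processing engine" here is just a function that performs one step on a batch of reads.
Engine = Callable[[List[Read]], List[Read]]


def run_pipeline(reads: List[Read], engines: List[Engine]) -> List[Read]:
    """Feed the reads through each engine in order, like engines wired in series."""
    for engine in engines:
        reads = engine(reads)
    return reads


def drop_short_reads(reads: List[Read]) -> List[Read]:
    """Toy engine (hypothetical): discard reads shorter than 4 bases."""
    return [r for r in reads if len(r.seq) >= 4]


def uppercase_bases(reads: List[Read]) -> List[Read]:
    """Toy engine (hypothetical): normalise base letters to upper case."""
    return [Read(r.name, r.seq.upper(), r.pos) for r in reads]


if __name__ == "__main__":
    out = run_pipeline([Read("r1", "acgtac"), Read("r2", "ac")],
                       [drop_short_reads, uppercase_bases])
    print([(r.name, r.seq) for r in out])  # [('r1', 'ACGTAC')]
```

In hardware each engine would be a fixed circuit operating concurrently on streamed reads; the sequential loop here only illustrates the data flow between steps.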

Claims

1. A computer-implemented method for onsite or cloud-based DNA or RNA processing and analysis, the method comprising:
providing a platform application programming interface (API) defining an input for receiving result data from a secondary processing of a plurality of reads of DNA, RNA or genomic sequence data from a subject;
providing a bioinformatics processing platform having a memory that stores one or more DNA or RNA reference sequences, and having an integrated circuit formed of a set of pre-configured hardwired digital logic circuits that are interconnected by a plurality of physical electrical interconnects, the integrated circuit having an input for receiving a plurality of reads of DNA or RNA data, and having a memory interface to access the one or more DNA or RNA reference sequences, the hardwired digital logic circuits being arranged as a set of processing engines that are each formed of a subset of the hardwired digital logic circuits to perform one pre-configured step of secondary processing on the plurality of reads of DNA or RNA data, wherein secondary processing comprises receiving the plurality of reads of genomic sequence data and one or more DNA or RNA reference sequences and processing the plurality of reads of genomic sequence data to map and align at least some of the plurality of reads of genomic sequence data according to the one or more DNA or RNA reference sequences, the integrated circuit further having an output to output result data from the secondary processing according to the platform application programming interface (API), wherein the integrated circuit is physically integrated with an automated sequencer;
providing a plurality of user-selectable DNA, RNA or genomic processing pipelines, each having an input defined according to the platform API to receive the result data from the secondary processing, the plurality of DNA, RNA or genomic processing pipelines having a common pipeline API defining tertiary processing operations on the result data from the secondary processing received according to the platform API, wherein tertiary processing comprises performing one or more analyses on the subject’s genetic makeup determined by the secondary processing and each of the plurality of DNA, RNA or genomic processing pipelines is configured to perform a subset of the tertiary processing operations; and
executing a user-selected set of the DNA, RNA or genomic processing pipelines, wherein the user-selected set of the DNA, RNA or genomic processing pipelines are configured to output result data of the tertiary processing according to the pipeline API to one or more user-selectable DNA, RNA or genomic analysis applications for additional processing for disease diagnostics, therapeutic treatment and/or prophylactic prevention.
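The layering in claim 1 (secondary result data delivered through a platform API, a common pipeline API shared by every tertiary pipeline, and execution of only the user-selected pipelines) can be sketched in software as follows. This is a minimal illustration under assumed data shapes: `SecondaryResult`, `TertiaryResult`, `Pipeline` and the two example pipelines are hypothetical names, not interfaces defined in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class SecondaryResult:
    """Result data from secondary processing, shaped by a hypothetical platform API."""
    sample_id: str
    aligned_reads: List[str]
    variants: List[str] = field(default_factory=list)


@dataclass
class TertiaryResult:
    """Result data from one tertiary pipeline, shaped by a hypothetical pipeline API."""
    pipeline: str
    findings: List[str]


class Pipeline(Protocol):
    """Common pipeline API: consume a SecondaryResult, emit a TertiaryResult."""
    name: str

    def run(self, result: SecondaryResult) -> TertiaryResult: ...


class VariantCountPipeline:
    name = "variant_count"

    def run(self, result: SecondaryResult) -> TertiaryResult:
        return TertiaryResult(self.name, [f"variants: {len(result.variants)}"])


class ReadCountPipeline:
    name = "read_count"

    def run(self, result: SecondaryResult) -> TertiaryResult:
        return TertiaryResult(self.name, [f"aligned reads: {len(result.aligned_reads)}"])


def execute_selected(result: SecondaryResult, available: Dict[str, Pipeline],
                     selected: List[str]) -> List[TertiaryResult]:
    """Run only the user-selected pipelines, all on the same secondary result."""
    return [available[name].run(result) for name in selected]


if __name__ == "__main__":
    available: Dict[str, Pipeline] = {p.name: p for p in (VariantCountPipeline(), ReadCountPipeline())}
    secondary = SecondaryResult("S1", ["r1", "r2", "r3"], ["chr1:100 A>G"])
    for tertiary in execute_selected(secondary, available, ["variant_count"]):
        print(tertiary.pipeline, tertiary.findings)  # variant_count ['variants: 1']
```

Because every pipeline accepts the same input type and returns the same output type, new pipelines can be added without changing the secondary-processing side, which is the practical point of a shared API.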

2. The method of claim 1, further comprising: providing a plurality of user-selectable DNA, RNA or genomic analysis applications that are stored in one or more application repositories, each of a selected set of the plurality of DNA, RNA or genomic analysis applications being accessible from an onsite or cloud-based application repository by a computer via an electronic medium for execution by a computer processor to perform a targeted analysis of DNA, RNA or genomic data from the result data of the tertiary processing, each of the plurality of genomic analysis applications being defined by an application API for receiving the result data of the tertiary processing, performing the targeted analysis of the DNA, RNA or genomic data from the result data of the tertiary processing, and outputting the result data from the targeted analysis to one of one or more genomic databases according to the application API.
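Claim 2 adds an application tier: analysis applications are fetched from a repository, run on tertiary result data through an application API, and write their output to a genomic database. A minimal in-memory sketch, assuming an application is simply a callable and the database is a list of records; `AppRepository`, `gene_panel_app` and the gene names are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical shapes: tertiary result data as a dict of lists, and an
# "application API" as a callable that returns one database record.
TertiaryData = Dict[str, List[str]]
Record = Dict[str, object]
App = Callable[[TertiaryData], Record]


class AppRepository:
    """In-memory stand-in for an onsite or cloud-based application repository."""

    def __init__(self) -> None:
        self._apps: Dict[str, App] = {}

    def publish(self, name: str, app: App) -> None:
        self._apps[name] = app

    def fetch(self, name: str) -> App:
        return self._apps[name]  # KeyError if the app was never published


def gene_panel_app(data: TertiaryData) -> Record:
    """Toy targeted analysis: report which genes on a hypothetical panel appear in the findings."""
    panel = {"GENE_A", "GENE_B"}
    return {"app": "gene_panel", "hits": sorted(g for g in data.get("genes", []) if g in panel)}


def run_apps(repo: AppRepository, selected: List[str], data: TertiaryData,
             database: List[Record]) -> None:
    """Fetch each selected app, run it on the tertiary data, and store its output."""
    for name in selected:
        database.append(repo.fetch(name)(data))


if __name__ == "__main__":
    repo = AppRepository()
    repo.publish("gene_panel", gene_panel_app)
    db: List[Record] = []
    run_apps(repo, ["gene_panel"], {"genes": ["GENE_B", "GENE_X", "GENE_A"]}, db)
    print(db)  # [{'app': 'gene_panel', 'hits': ['GENE_A', 'GENE_B']}]
```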

3. The method of claim 1 or claim 2, further comprising executing, using a computer processor, one or more user-selected DNA, RNA or genomic analysis applications.

4. The method of any one of claims 1 to 3, wherein the plurality of user-selectable genomic processing pipelines are selected from a set of DNA or RNA pipelines that consist of: a genome processing pipeline, an epigenome processing pipeline, a metagenome processing pipeline, a joint genotyping processing pipeline, and a genome analysis tool kit (GATK) processing pipeline.

5. The method of claim 4, wherein the plurality of user-selectable genomic analysis applications are selected from a set of genomic analysis applications that consist of: a non-invasive prenatal testing application, a neo-natal intensive care unit application, a cancer analysis application, a laboratory developed test (LDT) application, and an agricultural and biological analysis application.
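Claims 4 and 5 restrict the user's choice to closed sets ("consist of"). One way to model such a closed set in software is an enumeration that rejects anything outside it. The member names and string values below are hypothetical identifiers chosen for illustration; only the list of pipeline and application types comes from the claims.

```python
from enum import Enum
from typing import Iterable, List, Type, TypeVar

E = TypeVar("E", bound=Enum)


class PipelineKind(Enum):
    """Closed set of pipelines listed in claim 4 (identifier strings are hypothetical)."""
    GENOME = "genome"
    EPIGENOME = "epigenome"
    METAGENOME = "metagenome"
    JOINT_GENOTYPING = "joint_genotyping"
    GATK = "gatk"


class AppKind(Enum):
    """Closed set of analysis applications listed in claim 5 (identifier strings are hypothetical)."""
    NIPT = "non_invasive_prenatal_testing"
    NICU = "neonatal_intensive_care_unit"
    CANCER = "cancer_analysis"
    LDT = "laboratory_developed_test"
    AGRI_BIO = "agricultural_biological_analysis"


def parse_selection(names: Iterable[str], kind: Type[E]) -> List[E]:
    """Map user-supplied names onto a closed set; raises ValueError for anything outside it."""
    return [kind(name) for name in names]


if __name__ == "__main__":
    print(parse_selection(["gatk", "genome"], PipelineKind))
```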

6. The method of any one of claims 1 to 5, wherein the memory stores the plurality of reads of DNA or RNA data and the DNA or RNA reference sequence data; and the set of pre-configured hardwired digital logic circuits are comprised in a field programmable gate array (FPGA), one or more of the plurality of physical electrical interconnects comprising a memory interface to access the memory, the set of processing engines comprising a mapping module in a first hardwired configuration to access one or more of the plurality of reads of DNA or RNA data and the DNA or RNA reference sequence data, compare the sequence of nucleotides in at least one of the plurality of reads of DNA or RNA data to the sequence of nucleotides of the DNA or RNA reference sequence data to map the one or more of the plurality of reads of DNA or RNA data to the DNA or RNA reference sequence data so as to produce one or more mapped DNA or RNA reads.
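The mapping step in claim 6 compares a read's nucleotides against the reference to find where it belongs. The simplest software illustration is a brute-force ungapped scan that counts mismatches at every offset; this is a swapped-in teaching technique, not the FPGA mapping module the claim describes, and the `max_mismatches` threshold is an assumed parameter.

```python
from typing import Optional, Tuple


def map_read(read: str, reference: str, max_mismatches: int = 2) -> Optional[Tuple[int, int]]:
    """Return (0-based position, mismatches) of the best ungapped placement, or None.

    Every offset of the reference is compared base by base with the read;
    on ties the leftmost placement is kept. Indels are not handled.
    """
    best: Optional[Tuple[int, int]] = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


if __name__ == "__main__":
    print(map_read("TGCT", "ACGTTGCAACGT"))  # (4, 1)
```

This scan costs time proportional to read length times reference length, which is why practical mappers consult a precomputed index of the reference instead of comparing against every position.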

7. The method of claim 6, wherein the FPGA further comprises a second hardwired configuration to access at least one of the mapped reads of DNA or RNA data and the DNA or RNA reference sequence data, compare the sequence of nucleotides in at least one of the mapped reads of DNA or RNA data to the sequence of nucleotides of the DNA or RNA reference sequence data to align the one or more mapped reads of DNA or RNA data to the DNA or RNA reference sequence data.
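Claim 7 adds an alignment step that compares a mapped read against the reference in more detail. The claim does not name an algorithm; as one well-known example, the Smith-Waterman local alignment recurrence with a linear gap penalty is sketched below. The scoring values (`match`, `mismatch`, `gap`) are assumptions, and the function returns only the best score and where it ends, without a traceback.

```python
from typing import List, Tuple


def smith_waterman(read: str, ref: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, int, int]:
    """Best local alignment score and its end cell as (score, read_end, ref_end).

    End coordinates are 1-based indices into the DP matrix, i.e. the count of
    read/reference characters consumed when the best score is reached.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best = (0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


if __name__ == "__main__":
    print(smith_waterman("ACGT", "TTACGTTT"))  # (8, 4, 6)
```

Each cell depends only on its left, upper and upper-left neighbours, so cells along an anti-diagonal can be computed at the same time; that property is what makes this recurrence a common target for hardware acceleration.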

8. The method of claim 7, wherein the FPGA further comprises a sorting module in a third hardwired configuration to sort the mapped and aligned DNA or RNA reads.
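The sorting module of claim 8 orders mapped and aligned reads. A common choice is coordinate order (reference contig, then position); the sketch below assumes that convention and a hypothetical `AlignedRead` record.

```python
from typing import List, NamedTuple


class AlignedRead(NamedTuple):
    name: str
    contig: str
    pos: int  # leftmost aligned reference coordinate


def sort_by_coordinate(reads: List[AlignedRead], contig_order: List[str]) -> List[AlignedRead]:
    """Sort by (contig rank, position); read name only breaks exact ties so output is deterministic."""
    rank = {contig: i for i, contig in enumerate(contig_order)}
    return sorted(reads, key=lambda r: (rank[r.contig], r.pos, r.name))


if __name__ == "__main__":
    reads = [AlignedRead("r3", "chr10", 5), AlignedRead("r1", "chr2", 100), AlignedRead("r2", "chr2", 7)]
    print([r.name for r in sort_by_coordinate(reads, ["chr2", "chr10"])])  # ['r2', 'r1', 'r3']
```

An explicit contig order is passed in because plain string comparison would put "chr10" before "chr2".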

9. The method of any one of claims 1 to 8, wherein the result data from the secondary processing includes reads of genomic data.

10. The method of claim 9, wherein the result data from the secondary processing includes mapped and aligned reads from the plurality of reads of genomic data.

11. The method of claim 10, wherein the result data from the secondary processing includes one or more variant call files generated from the mapped and aligned reads.
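Claims 9 to 11 describe secondary result data that may include reads, mapped and aligned reads, and variant call files generated from them. To show the relationship between aligned reads and variant calls, here is a deliberately naive pileup caller for single-nucleotide variants over ungapped alignments. It is a toy under stated assumptions (no base qualities, no indels, reads assumed to lie within the reference, thresholds chosen arbitrarily) and emits only VCF-style data lines without a header.

```python
from collections import Counter
from typing import Dict, List, Tuple


def call_snvs(reference: str, reads: List[Tuple[int, str]], chrom: str = "chr1",
              min_depth: int = 3, min_alt_fraction: float = 0.5) -> List[str]:
    """Naive pileup SNV caller; reads are (0-based start, sequence) ungapped alignments."""
    pileup: Dict[int, Counter] = {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pileup.setdefault(start + offset, Counter())[base] += 1
    lines = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        ref_base = reference[pos]  # assumes every read lies within the reference
        alt_base, alt_count = max(((b, n) for b, n in counts.items() if b != ref_base),
                                  key=lambda bn: bn[1], default=(None, 0))
        if alt_base and depth >= min_depth and alt_count / depth >= min_alt_fraction:
            # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO; POS is 1-based.
            lines.append(f"{chrom}\t{pos + 1}\t.\t{ref_base}\t{alt_base}\t.\tPASS\tDP={depth}")
    return lines


if __name__ == "__main__":
    aligned = [(0, "ACGA"), (1, "CGA"), (2, "GA"), (3, "AC")]
    for line in call_snvs("ACGTACGT", aligned):
        print(line)
```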

12. The method of any one of claims 1 to 11, wherein the memory further stores one or more indexes of the one or more DNA or RNA reference sequences, and the hardwired digital logic circuits are arranged as a set of processing engines that are each formed of a subset of the hardwired digital logic circuits to perform one pre-configured step of secondary processing on the plurality of reads of DNA or RNA data according to the DNA or RNA reference sequences and the index.
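Claim 12 adds a stored index of the reference. The claim does not say what kind of index; one common structure is a hash table from every length-k substring (k-mer) of the reference to the positions where it occurs. A read's own k-mers are then looked up as seeds, and each hit votes for an implied start position. The value of `k` and the voting scheme below are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every length-k substring of the reference to its 0-based positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def candidate_positions(read: str, index: Dict[str, List[int]], k: int) -> List[int]:
    """Look up each k-mer seed of the read and vote for the implied read start."""
    votes: Counter = Counter()
    for offset in range(len(read) - k + 1):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset
            if start >= 0:  # skip starts that would fall before the reference
                votes[start] += 1
    # Highest-voted start first; ties broken by lower position.
    return [pos for pos, _ in sorted(votes.items(), key=lambda pv: (-pv[1], pv[0]))]


if __name__ == "__main__":
    idx = build_kmer_index("ACGTTGCAACGT", 3)
    print(candidate_positions("TGCAA", idx, 3))  # [4]
```

The index is built once per reference, so each read costs only a handful of lookups; the candidates it returns would then be checked by a finer comparison such as alignment.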