WO2020077552A1 - Tumor prognostic prediction method and system - Google Patents
- Publication number
- WO2020077552A1 (international application PCT/CN2018/110565)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tumor
- information
- prognosis prediction
- model
- patient
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/118—Prognosis of disease development
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- This application relates to the medical field, in particular to a method and system for predicting tumor prognosis.
- Although the diagnosis and treatment of tumors (e.g., osteosarcoma) continue to improve, patient mortality is still not effectively controlled.
- Recurrence and metastasis are the main causes of death of tumor patients.
- Osteosarcoma, for example, can metastasize to various tissues and organs, such as the lungs and spinal cord, threatening the patient's life.
- Clinical evaluation of tumors relies mainly on pathological and imaging-based morphological changes, which establish indicators such as the patient's age, tumor pathology type, surgical stage, and residual tumor.
- Screening for tumor-related genes and molecular markers at the molecular level is currently a hot spot in cancer research.
- Such methods can characterize tumors at the molecular level of the tumor cells.
- They can provide the patient with reference indications for surgery, predict postoperative recurrence or metastasis, give objective indications for radical cure of the tumor, and provide targets for anti-metastatic treatment.
- One embodiment of the present application provides a tumor prognosis prediction method, including: obtaining characteristic information of a tumor patient, the characteristic information reflecting at least gene mutation information of the tumor patient; and determining a prognosis prediction result of the tumor patient based on the characteristic information and according to a tumor prognosis prediction model.
- The gene mutation information includes the genes mutated in the DNA and their mutation abundances, and/or the genes in the DNA related to tumor prognosis prediction and their mutation abundances.
- Obtaining the characteristic information of the tumor patient further includes: obtaining a tissue sample of the tumor patient; extracting DNA from the tissue sample; preparing a library from the DNA; performing gene sequencing on the library to obtain sequencing results; and analyzing the sequencing results to determine the gene mutation information of the tumor patient.
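The final analysis step is not specified further here. The sketch below is minimal and rests on three assumptions. Mutation abundance is measured as the variant allele frequency (mutant reads divided by total reads at the position). Variants also seen in matched normal tissue are discarded as germline. Each gene's value is the highest abundance among its variants. The variant key format and the `min_depth` and `min_vaf` thresholds are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (gene, position, reference base, alternate base)
Variant = Tuple[str, int, str, str]


def gene_mutation_abundance(
    tumor_calls: Dict[Variant, Tuple[int, int]],
    normal_variants: Set[Variant],
    min_depth: int = 50,      # hypothetical minimum read depth
    min_vaf: float = 0.05,    # hypothetical minimum allele frequency
) -> Dict[str, float]:
    """Map each mutated gene to its highest variant allele frequency (VAF).

    tumor_calls maps a variant to (alt_reads, total_depth) from the tumor sample.
    normal_variants holds variants also observed in matched normal tissue.
    """
    abundance: Dict[str, float] = {}
    for variant, (alt_reads, depth) in tumor_calls.items():
        if depth < min_depth:
            continue  # too few reads to estimate the abundance reliably
        if variant in normal_variants:
            continue  # also in normal tissue: assumed germline, not tumor-specific
        vaf = alt_reads / depth
        if vaf < min_vaf:
            continue
        gene = variant[0]
        # Assumption: keep the strongest variant when a gene has several.
        abundance[gene] = max(vaf, abundance.get(gene, 0.0))
    return abundance
```

Take a TP53 variant supported by 60 of 200 reads and absent from normal tissue. It yields an abundance of 0.3. A variant with only 20 reads of coverage is skipped under the default depth cutoff.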
- The characteristic information further includes at least one of the following items for the tumor patient: age, gender, smoking history, years of education, working years, treatment plan, and sample storage time.
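One way to turn this mixed information into model input is a fixed-order numeric vector. The sketch below is illustrative only, under these assumptions:

- The three-gene `PANEL` is a hypothetical subset of the osteosarcoma genes listed in this document.
- Genes without a detected mutation get abundance 0.0.
- Gender and smoking history are encoded as 0/1 flags.

The treatment plan is left out because it needs its own encoding.

```python
from typing import Dict, List

# Hypothetical gene subset; a real panel would list every gene the model uses.
PANEL = ("TP53", "RB1", "ATRX")


def build_feature_vector(
    abundances: Dict[str, float],
    age: float,
    is_male: bool,
    has_smoked: bool,
    education_years: float,
    working_years: float,
    storage_days: float,
) -> List[float]:
    """Concatenate panel mutation abundances with basic patient information."""
    # Genes with no detected mutation contribute an abundance of 0.0.
    genetic = [abundances.get(gene, 0.0) for gene in PANEL]
    basic = [
        float(age),
        1.0 if is_male else 0.0,
        1.0 if has_smoked else 0.0,
        float(education_years),
        float(working_years),
        float(storage_days),
    ]
    return genetic + basic
```

In practice the columns would usually be standardized before training an SVM, because age and mutation abundance live on very different scales.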
- The tumor prognosis prediction model is a support vector machine model or a neural network model.
- The tumor prognosis prediction method further includes: training an initial model with the characteristic information and prognosis information of multiple tumor patients to obtain the tumor prognosis prediction model.
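The training procedure itself is not detailed here. As an illustration only, the sketch below trains a linear support vector machine with the Pegasos stochastic subgradient method (hinge loss with L2 regularization). This standard technique was chosen for brevity; the source specifies neither the solver nor the kernel. Labels are assumed to be +1/-1, for example good versus poor treatment effect. A constant 1.0 is appended to each feature vector so the last weight acts as a bias. This also regularizes the bias, a common simplification.

```python
import random
from typing import List, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    # Append a constant feature so the last weight plays the role of the bias.
    return list(x) + [1.0]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos: stochastic subgradient descent on L2-regularized hinge loss.

    y holds labels in {-1, +1}; returns weights with the bias weight last.
    """
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # inside the margin: take a hinge-loss subgradient step
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def decision_value(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, _augment(x)))


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if decision_value(w, x) >= 0.0 else -1
```

A neural network model, the other option named above, would replace `train_linear_svm` but keep the same input vectors and labels.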
- Training the initial model with the characteristic information and prognosis information of multiple tumor patients to obtain the tumor prognosis prediction model includes: removing, from the gene mutation information of the multiple tumor patients, the information of mutated genes whose mutation abundance is below a set threshold.
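A minimal sketch of this filtering step. It assumes each patient's gene mutation information is held as a mapping from gene to mutation abundance. The default threshold of 0.05 is a hypothetical value, not one given in the source.

```python
from typing import Dict, List


def drop_low_abundance(
    patients: List[Dict[str, float]], threshold: float = 0.05  # hypothetical default
) -> List[Dict[str, float]]:
    """Remove mutated-gene entries whose abundance is below the threshold."""
    return [
        {gene: abundance for gene, abundance in patient.items() if abundance >= threshold}
        for patient in patients
    ]
```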
- Training the initial model with the characteristic information and prognosis information of multiple tumor patients to obtain the tumor prognosis prediction model includes: removing redundant gene mutation information from the gene mutation information of the multiple tumor patients.
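The source does not define "redundant". This sketch assumes one common interpretation: a gene is redundant if its abundance profile across patients is either constant, so it carries no information, or almost perfectly correlated with a gene already kept. The correlation cutoff `max_corr` is hypothetical. Genes are scanned in the given order, so the first gene of a correlated pair is the one retained.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return cov / (norm_a * norm_b)


def remove_redundant_genes(matrix: List[List[float]], genes: List[str],
                           max_corr: float = 0.95) -> List[str]:
    """Return the genes kept after dropping constant and near-duplicate columns.

    matrix has one row per patient and one column per gene (mutation abundances).
    Assumed definition of redundancy: constant, or |correlation| >= max_corr.
    """
    columns = {gene: [row[j] for row in matrix] for j, gene in enumerate(genes)}
    kept: List[str] = []
    for gene in genes:
        column = columns[gene]
        if len(set(column)) == 1:
            continue  # identical in every patient: carries no information
        if all(abs(pearson(column, columns[k])) < max_corr for k in kept):
            kept.append(gene)
    return kept
```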
- The tumor prognosis prediction model is a support vector machine model, and training the initial model with the characteristic information and prognosis information of multiple tumor patients to obtain the tumor prognosis prediction model includes: determining at least some of the genes as genes related to tumor prognosis prediction according to the contribution of each item of gene mutation information in the characteristic information to the support vector machine model; and training the initial model with the gene mutation information of these prognosis-related genes and the prognosis information of the multiple tumor patients to obtain the tumor prognosis prediction model.
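How the contribution value is computed is not stated. For a linear SVM, one standard choice is the magnitude of each feature's weight, as used in recursive feature elimination (SVM-RFE). The sketch below assumes that approach. `train_fn` is a hypothetical callable that fits a linear model and returns its weights with the feature weights first; any trailing bias weight is ignored. Features are assumed to be on comparable scales, so that weight magnitudes can be compared.

```python
from typing import Callable, List, Sequence

# Hypothetical training callable: returns feature weights (optionally followed by a bias).
TrainFn = Callable[[List[List[float]], Sequence[int]], List[float]]


def svm_rfe(train_fn: TrainFn, X: List[List[float]], y: Sequence[int],
            genes: List[str], n_keep: int) -> List[str]:
    """Recursive feature elimination: retrain, drop the gene with the smallest |weight|, repeat."""
    if not 0 < n_keep <= len(genes):
        raise ValueError("n_keep must be between 1 and the number of genes")
    active = list(range(len(genes)))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        weights = train_fn(X_sub, y)[: len(active)]  # ignore any trailing bias weight
        weakest = min(range(len(active)), key=lambda i: abs(weights[i]))
        active.pop(weakest)
    return [genes[j] for j in active]
```

Retraining after each removal, rather than ranking once, lets the remaining weights adjust when a correlated gene disappears. The cost is one training run per eliminated gene.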
- The tumor prognosis prediction model is a support vector machine model, and training the initial model to obtain the tumor prognosis prediction model further includes: optimizing the parameters of the support vector machine model by particle swarm optimization or grid search.
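Both parameter-search strategies can be sketched generically. The code assumes that `objective` takes a candidate parameter vector and returns an error to be minimized, such as a cross-validated error. The swarm size, iteration count, and inertia/acceleration coefficients are illustrative defaults, not values from the source.

```python
import itertools
import random
from typing import Callable, List, Sequence, Tuple

Objective = Callable[[List[float]], float]


def pso_minimize(objective: Objective, bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 20, iterations: int = 100, inertia: float = 0.7,
                 c_personal: float = 1.5, c_global: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Basic global-best particle swarm optimization within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]  # each particle's best position so far
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    swarm_pos, swarm_val = best_pos[g][:], best_val[g]  # best position of the whole swarm
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (swarm_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside bounds
            value = objective(pos[i])
            if value < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], value
                if value < swarm_val:
                    swarm_pos, swarm_val = pos[i][:], value
    return swarm_pos, swarm_val


def grid_search(objective: Objective,
                grids: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Exhaustively evaluate every combination of candidate values."""
    best = min((list(p) for p in itertools.product(*grids)), key=objective)
    return best, objective(best)
```

For an RBF-kernel SVM, a typical call would search (log2 C, log2 gamma) with `bounds=[(-5, 15), (-15, 3)]`. The objective would train and cross-validate the model. Those ranges are a common convention, not values from the source. Grid search is exhaustive and simple, while PSO usually needs fewer objective evaluations when the grid would be fine-grained.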
- The prognosis prediction result is one of: disease progression, stable disease, partial remission, and complete remission; alternatively, it is one of: good treatment effect and poor treatment effect.
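When the binary form is used, the four response categories have to be mapped to two classes. The grouping below is an assumption made for illustration: partial and complete remission count as a good treatment effect, and stable or progressive disease as a poor one. The source does not state how, or whether, the categories are merged. The PD/SD/PR/CR codes follow common tumor-response shorthand.

```python
from enum import Enum


class Response(Enum):
    """Four-category prognosis outcome."""
    PROGRESSIVE_DISEASE = "PD"
    STABLE_DISEASE = "SD"
    PARTIAL_REMISSION = "PR"
    COMPLETE_REMISSION = "CR"


# Assumed grouping: remission counts as a good treatment effect.
GOOD_EFFECT = {Response.PARTIAL_REMISSION, Response.COMPLETE_REMISSION}


def to_binary_label(response: Response) -> int:
    """Return +1 for a good treatment effect and -1 for a poor one."""
    return 1 if response in GOOD_EFFECT else -1
```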
- The tumor is osteosarcoma.
- The characteristic information reflects at least the mutation information of one or more of the following genes in osteosarcoma patients: KMT2C, SOX9, LRP1B, NF-1, PRKDC, FAT1, STAG2, SLIT2, NOTCH1, EPHA7, ATRX, KDM6A, APC, RANBP2, RARA-AS1, C11orf30, ROS1, ARID2, TAF1, DICER1, MSH2, MSH6, TP53, KDM5A, JAK2, ALK, RB1, NOTCH2, and RICTOR.
- The gene mutation information of the tumor patient is the gene mutation information of the osteosarcoma lesion site.
- One embodiment of the present application provides a tumor prognosis prediction system including an acquisition module and a prediction module. The acquisition module is used to acquire characteristic information of a tumor patient, the characteristic information reflecting at least gene mutation information of the tumor patient.
- The prediction module is used to determine the prognosis prediction result of the tumor patient based on the tumor patient's characteristic information and according to the tumor prognosis prediction model.
- The gene mutation information includes the genes mutated in the DNA and their mutation abundances, and/or the genes in the DNA related to tumor prognosis prediction and their mutation abundances.
- The characteristic information further includes at least one of the following items for the tumor patient: age, gender, smoking history, years of education, working years, treatment plan, and sample storage time.
- The tumor prognosis prediction model is a support vector machine model or a neural network model.
- The tumor prognosis prediction system further includes a training module for training an initial model with the characteristic information and prognosis information of multiple tumor patients to obtain the tumor prognosis prediction model.
- The training module is further configured to remove, from the gene mutation information of the multiple tumor patients, the information of mutated genes whose mutation abundance is below a set threshold.
- The training module is further configured to remove redundant gene mutation information from the gene mutation information of the multiple tumor patients.
- The tumor prognosis prediction model is a support vector machine model, and the training module is further configured to: determine at least some of the genes as genes related to tumor prognosis prediction according to the contribution of each item of gene mutation information in the characteristic information of multiple tumor patients to the support vector machine model; and train the initial model with the gene mutation information of these prognosis-related genes and the prognosis information of the multiple tumor patients to obtain the tumor prognosis prediction model.
- The tumor prognosis prediction model is a support vector machine model, and the training module is further configured to optimize the parameters of the support vector machine model by particle swarm optimization or grid search.
- The prognosis prediction result is one of: disease progression, stable disease, partial remission, and complete remission; alternatively, it is one of: good treatment effect and poor treatment effect.
- The tumor is osteosarcoma.
- The characteristic information reflects at least the mutation information of one or more of the following genes in osteosarcoma patients: KMT2C, SOX9, LRP1B, NF-1, PRKDC, FAT1, STAG2, SLIT2, NOTCH1, EPHA7, ATRX, KDM6A, APC, RANBP2, RARA-AS1, C11orf30, ROS1, ARID2, TAF1, DICER1, MSH2, MSH6, TP53, KDM5A, JAK2, ALK, RB1, NOTCH2, and RICTOR.
- The gene mutation information of the tumor patient is the gene mutation information of the osteosarcoma lesion site.
- In one embodiment, the device includes at least one processor and at least one memory; the at least one memory is used to store computer instructions, and the at least one processor is used to execute at least some of the computer instructions to implement the tumor prognosis prediction method.
- One embodiment of the present application provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the tumor prognosis prediction method.
- A tumor prognosis prediction system includes: at least one computer-readable storage medium including a set of instructions for tumor prognosis prediction; and at least one processor in communication with the at least one storage medium. When executing the set of instructions, the at least one processor is configured to: obtain characteristic information of a tumor patient, the characteristic information reflecting at least gene mutation information of the tumor patient; and determine a prognosis prediction result of the tumor patient based on the characteristic information and according to the tumor prognosis prediction model.
- FIG. 1 is a schematic diagram of an application scenario of a tumor prognosis prediction system according to some embodiments of the present application.
- FIG. 2 is a schematic structural diagram of a computing device according to some embodiments of the present application.
- FIG. 3 is a block diagram of a tumor prognosis prediction system according to some embodiments of the present application.
- FIG. 4 is an exemplary flowchart of a tumor prognosis prediction method according to some embodiments of the present application.
- FIG. 5 is an exemplary flowchart for determining gene mutation information of a tumor patient according to some embodiments of the present application.
- FIG. 6 is an exemplary flowchart of training to obtain a tumor prognosis prediction model according to some embodiments of the present application.
- FIG. 7 is a heat map of gene mutation in a patient with osteosarcoma according to an exemplary embodiment of the present application.
- FIG. 10 is a schematic diagram of prediction result verification of a tumor prognosis prediction model according to an exemplary embodiment of the present application.
- The term "system" as used herein is a means of distinguishing different components, elements, parts, portions, or assemblies at different levels.
- Such words can be replaced by other expressions.
- FIG. 1 is a schematic diagram of an application scenario of a tumor prognosis prediction system 100 according to some embodiments of the present application.
- The tumor prognosis prediction system 100 may include a server 110, a network 120, and a database 130.
- The database 130 can store the patient's basic information, disease history, and treatment plan data, and can also store the patient's genetic information, such as the gene mutation information of the tumor patient 140 at the tumor site, the genetic information of the tumor patient's normal tissue, and reference gene information.
- Biological tissue or fluid samples of the patient, such as the tissue sample 145 of the tumor patient 140, can be stored in a dedicated storage device for further processing, such as gene sequencing.
- The tissue sample 145 may include a tumor tissue sample of the patient or tissue samples from other parts of the patient's body.
- The server 110 may be used to process and analyze relevant information to generate a prognosis prediction result.
- The server 110 may obtain relevant information and/or data from the database 130 (for example, gene mutation information of the tumor patient at the tumor site, basic information of the tumor patient, reference gene data, etc.), or may directly obtain relevant information and/or data produced when staff or other equipment and instruments process the tissue sample 145 of the tumor patient 140.
- The server 110 may be a single server or a server group.
- The server group may be centralized, such as a data center.
- The server group may also be distributed, such as a distributed system.
- The server 110 may be local or remote.
- The server 110 may be implemented on a cloud platform.
- The cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an intermediate cloud, a multi-cloud, etc., or any combination thereof.
- The server 110 may be implemented on the computing device 200 having at least one of the components shown in FIG. 2.
- The server 110 may include a processing engine 112.
- The processing engine 112 can be used to execute instructions (program code) of the server 110.
- The processing engine 112 can execute instructions to analyze the characteristic information of the tumor patient 140 and thereby obtain a tumor prognosis prediction result.
- The instructions for analyzing the characteristic information of the tumor patient 140 may be stored in a computer-readable storage medium (not shown) in the form of computer instructions.
- The processing engine 112 may include one or more sub-processing devices (e.g., single-core or multi-core processing devices).
- The processing engine 112 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, etc., or any combination of the above.
- CPU central processing unit
- ASIC application specific integrated circuit
- ASIP application specific instruction-set processor
- GPU graphics processing unit
- PPU physics processing unit
- DSP digital signal processor
- FPGA field programmable gate array
- PLD programmable logic device
- RISC reduced instruction set computer
- The network 120 may provide a channel for information exchange.
- Information can be exchanged between the server 110 and the database 130 through the network 120.
- The server 110 may receive the reference gene data in the database 130 through the network 120.
- Information about the tumor patient 140 and/or the tissue sample 145 may be transmitted to the server 110 and/or the database 130 via the network 120.
- The characteristic information of the tumor patient 140 (such as gene mutation information and basic information) may be transmitted to the server 110 through the network 120.
- The network 120 may be any type of wired or wireless network.
- The network 120 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, etc., or any combination of the above.
- LAN local area network
- WAN wide area network
- WLAN wireless local area network
- MAN metropolitan area network
- PSTN public switched telephone network
- NFC near field communication
- The database 130 may be used to store data and/or instruction sets. In some embodiments, the database 130 may store data obtained from the server 110. In some embodiments, the database 130 may store information and/or instructions that the server 110 executes or uses to perform the exemplary methods described in this application. In some embodiments, the reference gene data may be stored in the database 130. Specifically, the database 130 may store genetic data from various genomic databases and/or genetic data reported in the existing literature to have an impact (or a significant impact) on tumorigenesis.
- The genomic databases may include, but are not limited to, the COSMIC, ClinVar, HGMD, OMIM, TCGA, and GeneCards databases.
- The database 130 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), etc., or any combination thereof.
- The database 130 may be implemented on a cloud platform.
- The cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an intermediate cloud, a multi-cloud, etc., or any combination thereof.
- The database 130 may be part of the server 110.
- The tumor patient 140 may be a patient with one or more tumor diseases.
- Tumor diseases may include cancer, sarcoma, benign tumors, etc., or any combination thereof.
- Cancers may include squamous cell carcinoma, adenocarcinoma, undifferentiated carcinoma, and the like.
- Squamous cell carcinoma may include cancers arising in the skin, esophagus, lungs, cervix, vagina, vulva, penis, and the like.
- Adenocarcinoma may include cancers arising in the digestive tract, lungs, uterus, breast, ovaries, prostate, thyroid, liver, kidneys, pancreas, gallbladder, and other sites.
- Sarcomas may include, but are not limited to, soft tissue sarcoma, osteosarcoma, malignant fibrous histiocytoma, bilateral sarcoma, rhabdomyosarcoma, lymphosarcoma, synovial sarcoma, leiomyosarcoma, and the like.
- Benign tumors may include, but are not limited to, hamartomas, benign pancreatic tumors, thyroid adenomas, breast fibromas, uterine tumors, gastrointestinal leiomyomas, soft tissue fibromas, synovial tumors, ligament fibromas, and the like.
- The tumor patient 140 may be an osteosarcoma patient.
- The tumor patient 140 may be a patient whose tumor is at any stage (e.g., early, middle, or late).
- The tumor patient 140 may also be a patient at any stage of treatment (e.g., before, during, or after treatment).
- The tissue sample 145 may be used to reflect tumor-related information of the tumor patient 140.
- The tissue sample 145 may be a biological tissue or fluid sample taken from a tumor site (such as a target lesion) and/or a non-tumor site (such as a site other than the lesion) of the tumor patient 140.
- Tissue samples may include, but are not limited to, sputum, blood samples, fresh tissue (such as surgical or puncture tissue), paraffin-embedded tissue, urine, serous cavity fluid (such as ascites, pleural effusion, or pericardial effusion), tissues or cells extracted from the tumor site, or any combination of the above.
- The tissue sample 145 may include tissues and cells of the tumor patient 140 both at the tumor site and at sites other than the tumor.
- The tissue sample 145 may include only tissues and cells of the tumor patient 140 at the tumor site.
- Relevant information of the tumor patient 140 and/or the tissue sample 145 may be transmitted to one or more components of the tumor prognosis prediction system 100 (such as the server 110 and the database 130) by humans (such as staff) or machines (such as equipment).
- FIG. 2 is a schematic diagram of the architecture of a computing device 200 according to some embodiments of the present application.
- The computing device 200 may include a processor 210, a memory 220, an input/output interface 230, and a communication port 240.
- The server 110 and/or the database 130 may be implemented on the computing device 200.
- The processing engine 112 may be implemented on the computing device 200 and configured to perform the functions of the processing engine 112 in this application.
- The processor 210 may execute computing instructions (program code) and perform the functions of the server 110 described in this application.
- Computing instructions may include programs, objects, components, data structures, procedures, modules, and functions (here, functions refer to the specific functions described in this application).
- The processor 210 may process instructions in the tumor prognosis prediction system 100 to predict tumor prognosis.
- The processor 210 may include a microcontroller, a microprocessor, a reduced instruction set computer (RISC), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a central processing unit (CPU), a graphics processing unit (GPU), a physics processing unit (PPU), a microcontroller unit, a digital signal processor (DSP), a field programmable gate array (FPGA), an advanced RISC machine (ARM), a programmable logic device, any circuit or processor capable of performing one or more functions, or any combination thereof.
- RISC reduced instruction set computer
- ASIC application specific integrated circuit
- ASIP application specific instruction set processor
- CPU central processing unit
- GPU graphics processing unit
- PPU physics processing unit
- DSP digital signal processor
- FPGA field programmable gate array
- ARM advanced RISC machine
- The memory 220 may store data/information obtained from any component of the tumor prognosis prediction system 100.
- The memory 220 may include mass storage, removable memory, volatile read-write memory, read-only memory (ROM), etc., or any combination thereof.
- Exemplary mass storage may include magnetic disks, optical disks, solid-state drives, and the like.
- Removable memory may include flash drives, floppy disks, optical disks, memory cards, USB flash drives, compact disks, and removable hard disks.
- Volatile read-write memory may include random access memory (RAM).
- RAM may include dynamic RAM (DRAM), double data rate synchronous dynamic RAM (DDR SDRAM), static RAM (SRAM), thyristor RAM (T-RAM), zero-capacitor RAM (Z-RAM), and so on.
- DRAM dynamic RAM
- DDR SDRAM double data rate synchronous dynamic RAM
- SRAM static RAM
- T-RAM thyristor RAM
- Z-RAM zero-capacitor RAM
- ROM may include mask ROM (MROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), compact disc ROM (CD-ROM), digital versatile disc ROM, etc.
- MROM mask ROM
- PROM programmable ROM
- EPROM erasable programmable ROM
- EEPROM electrically erasable programmable ROM
- CD-ROM compact disc ROM
- The input/output interface 230 may be used to input or output signals, data, or information.
- The input/output interface 230 may be used by a user (e.g., the tumor patient 140 or a user of the tumor prognosis prediction system 100) to interact with the server 110.
- The user can input the characteristic information of the tumor patient through the input/output interface 230.
- The input/output interface 230 may include an input device and an output device.
- Exemplary input devices may include a keyboard, mouse, touch screen, microphone, etc., or any combination thereof.
- Exemplary output devices may include display devices, speakers, printers, projectors, etc., or any combination thereof.
- Exemplary display devices may include a liquid crystal display (LCD), a light-emitting diode (LED) based display, a flat panel display, a curved display, a television device, a cathode ray tube (CRT), etc., or any combination thereof.
- LCD liquid crystal display
- LED light-emitting diode
- CRT cathode ray tube
- The communication port 240 may be connected to the network 120 for data communication.
- The connection may be a wired connection, a wireless connection, or a combination of both.
- Wired connections may include cables, fiber optic cables, telephone lines, etc., or any combination thereof.
- The wireless connection may include Bluetooth, Wi-Fi, WiMAX, WLAN, ZigBee, a mobile network (e.g., 3G, 4G, or 5G), etc., or any combination thereof.
- The communication port 240 may be a standardized port, such as RS232 or RS485.
- The communication port 240 may be a specially designed port.
- FIG. 3 is a block diagram of a tumor prognosis prediction system according to some embodiments of the present application.
- The tumor prognosis prediction system may include an acquisition module 310, a prediction module 320, and a training module 330.
- The acquisition module 310 can be used to obtain the characteristic information of the tumor patient 140.
- The characteristic information may reflect at least the gene mutation information of the tumor patient.
- The characteristic information of the tumor patient 140 may include one or more of the gene mutation information of the tumor patient, basic information of the tumor patient, and the like, in any combination.
- The prediction module 320 may be used to determine the prognosis prediction result of the tumor patient. For example, the prediction module 320 may determine the prognosis prediction result based on the tumor patient's characteristic information and according to the tumor prognosis prediction model.
- The training module 330 may be used to train and obtain a tumor prognosis prediction model. Specifically, the training module 330 may obtain the characteristic information and prognosis information of multiple tumor patients and use them to train an initial model to obtain a tumor prognosis prediction model. In some embodiments, the training module 330 can remove the information of mutated genes whose mutation abundance is below a set threshold from the gene mutation information. In some embodiments, the training module 330 may remove redundant gene mutation information from the gene mutation information. In some embodiments, the training module 330 may determine at least some of the genes as genes related to tumor prognosis prediction according to the contribution of each item of gene mutation information in the characteristic information of multiple tumor patients to the support vector machine model.
- The training module 330 may use the gene mutation information of the prognosis-related genes of multiple tumor patients, together with their prognosis information, to train the initial model and obtain the tumor prognosis prediction model. In some embodiments, the training module 330 may also use particle swarm optimization or grid search to optimize the parameters of the support vector machine model.
- The system and its modules shown in FIG. 3 can be implemented in various ways.
- The system and its modules may be implemented by hardware, software, or a combination of software and hardware.
- The hardware part can be implemented with dedicated logic.
- The software part can be stored in memory and executed by an appropriate instruction execution system, such as a microprocessor or specially designed hardware.
- Such processor control code may be provided, for example, on a carrier medium such as a magnetic disk, CD, or DVD-ROM, on a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier.
- The system and its modules of the present application can be implemented not only by hardware circuits (such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices), but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (for example, firmware).
- The acquisition module 310, the prediction module 320, and the training module 330 may be different modules in one system, or a single module may implement the functions of two or more of them.
- For example, the acquisition module 310 and the prediction module 320 may be combined into one module having both acquisition and prediction functions.
- The modules may share one storage module, or each module may have its own storage module.
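The module split and the shared-storage option can be illustrated with a minimal in-memory sketch. All class and method names here are hypothetical. The shared storage is a plain Python object rather than any particular database, and `fit` stands for whatever training routine produces a callable model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

Features = List[float]
Model = Callable[[Features], str]


@dataclass
class SharedStorage:
    """Hypothetical storage shared by all three modules."""
    features: Dict[str, Features] = field(default_factory=dict)
    model: Optional[Model] = None


class AcquisitionModule:
    def __init__(self, storage: SharedStorage) -> None:
        self.storage = storage

    def acquire(self, patient_id: str, features: Sequence[float]) -> None:
        self.storage.features[patient_id] = list(features)


class TrainingModule:
    def __init__(self, storage: SharedStorage,
                 fit: Callable[[List[Features], List[str]], Model]) -> None:
        self.storage = storage
        self.fit = fit  # hypothetical training routine returning a callable model

    def train(self, patient_ids: List[str], outcomes: List[str]) -> None:
        X = [self.storage.features[p] for p in patient_ids]
        self.storage.model = self.fit(X, outcomes)


class PredictionModule:
    def __init__(self, storage: SharedStorage) -> None:
        self.storage = storage

    def predict(self, patient_id: str) -> str:
        if self.storage.model is None:
            raise RuntimeError("no trained model in storage")
        return self.storage.model(self.storage.features[patient_id])
```

The other option, a separate store per module, would replace the single `SharedStorage` with one per module. Acquired features and the trained model would then be passed between the modules explicitly.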
- FIG. 4 is an exemplary flowchart of a tumor prognosis prediction method according to some embodiments of the present application. As shown in FIG. 4, the tumor prognosis prediction method may include:
- Step 410: Obtain characteristic information of a tumor patient, the characteristic information reflecting at least the gene mutation information of the tumor patient. Specifically, step 410 may be performed by the acquisition module 310.
- The characteristic information of the tumor patient 140 may include one or more of the gene mutation information of the tumor patient, basic information of the tumor patient, and the like, in any combination. In some embodiments, the characteristic information of the tumor patient may include only the gene mutation information of the tumor patient. Specifically, the gene mutation information of the tumor patient may include the genes mutated in the DNA and their mutation abundances, and/or the genes in the DNA related to tumor prognosis prediction and their mutation abundances. The basic information of the tumor patient reflects information about the tumor patient other than the gene mutation information.
- The basic information of the tumor patient may include the patient's age, gender, smoking history, years of education, working years, sample storage time (such as blood storage time, tumor tissue storage time, and storage time of other normal tissue), treatment plan, etc., or any combination thereof.
- The treatment plan may include the type of treatment (e.g., radiation therapy, chemotherapy, or immunotherapy), duration of treatment, radiation dose, drug dose, drug name or type, and so on.
- the gene mutation information of the tumor patient may be the gene mutation information of the tumor patient at the tumor site (such as the target lesion).
- the gene mutation information of the osteosarcoma patient may be the gene mutation information of the osteosarcoma lesion.
- The tumor patient 140 may be at any stage of the tumor (e.g., early, middle, or late) and/or at any stage of treatment (e.g., before, during, or after treatment).
- For example, characteristic information of osteosarcoma patients can be obtained before treatment to predict the prognosis of the treatment. This can further serve as a reference when formulating and selecting treatment plans.
- In some embodiments, acquiring/determining the gene mutation information of the tumor patient 140 may include the following steps: obtaining a tissue sample 145 of the tumor patient 140, extracting DNA from the tissue sample, preparing a DNA library, sequencing based on the library to obtain sequencing results, and analyzing the sequencing results to determine the gene mutation information of the tumor patient.
- For details on determining the gene mutation information of the tumor patient 140, please refer to FIG. 5 and its related description.
- Step 420: Based on the characteristic information of the tumor patient, determine the prognosis prediction result of the tumor patient according to a tumor prognosis prediction model. Specifically, step 420 may be performed by the prediction module 320.
- In some embodiments, the characteristic information of the tumor patient may be input into a trained tumor prognosis prediction model to obtain the prognosis prediction result of the tumor patient.
- In some embodiments, the tumor prognosis prediction model may be a supervised learning model.
- The supervised learning model may include a support vector machine model, a decision tree model, a neural network model, a nearest neighbor classifier, and so on, or any combination thereof.
- For the training process of the tumor prognosis prediction model, please refer to FIG. 6 and its related description.
- In some embodiments, the prognosis prediction result may be the prognosis over a period of time (e.g., 5 years) after treatment.
- In some embodiments, the prognosis prediction results can be divided into four categories according to changes in the target lesions: progressive disease (PD), stable disease (SD), partial response (PR), and complete response (CR).
- The four categories may be defined as follows:
  - PD may mean that the sum of the longest diameters of the target lesions increases by 20% or more, or that new lesions appear (such as new lesions caused by tumor metastasis).
  - SD may mean that the sum of the longest diameters of the target lesions has neither decreased enough to qualify as PR nor increased enough to qualify as PD.
  - PR may mean that the sum of the longest diameters of the target lesions decreases by 30% or more, and that this decrease is maintained for at least 4 weeks.
  - CR may mean that all target lesions disappear, no new lesions appear, and tumor markers return to normal, with this state maintained for at least 4 weeks.
- In some embodiments, the prognosis prediction result may include two types: a good treatment effect and a poor treatment effect. Specifically, whether the treatment effect is good can be determined according to clinical criteria.
- For example, if the tumor patient relapses within 5 years after treatment, the treatment effect is poor. If the tumor patient does not relapse within 5 years after treatment, the treatment effect is good.
- As another example, PD and SD can be classified as a poor treatment effect, and PR and CR as a good treatment effect.
- As a further example, if the patient survives more than 5 years after the first treatment, the treatment effect is good. If the patient survives less than 5 years after the first treatment, the treatment effect is poor.
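The response categories and the two-class grouping described above can be expressed as simple rules. The sketch below is a minimal, non-authoritative illustration. It assumes that response is judged only from the percent change in the sum of target-lesion diameters relative to one reference sum, plus a new-lesion flag. It does not model the 4-week confirmation or the tumor-marker requirement for CR. The function names are hypothetical.

```python
# Sketch of the PD / SD / PR / CR rules and the PD,SD -> poor / PR,CR -> good grouping.
# Assumptions (not from the source): a single reference sum is used for all
# comparisons, and the 4-week confirmation and tumor-marker checks are omitted.


def percent_change(reference_sum_mm: float, current_sum_mm: float) -> float:
    """Percent change of the sum of target-lesion diameters."""
    return 100.0 * (current_sum_mm - reference_sum_mm) / reference_sum_mm


def classify_response(reference_sum_mm: float, current_sum_mm: float,
                      new_lesions: bool) -> str:
    """Return 'PD', 'SD', 'PR' or 'CR' using the thresholds given in the text."""
    change = percent_change(reference_sum_mm, current_sum_mm)
    if new_lesions or change >= 20.0:
        return "PD"  # increase of 20% or more, or any new lesion
    if current_sum_mm == 0:
        return "CR"  # all target lesions have disappeared
    if change <= -30.0:
        return "PR"  # decrease of 30% or more
    return "SD"      # neither PR nor PD


def two_class_outcome(category: str) -> str:
    """Group PD/SD as a poor treatment effect and PR/CR as a good one."""
    return "poor" if category in ("PD", "SD") else "good"


if __name__ == "__main__":
    category = classify_response(50.0, 30.0, new_lesions=False)  # a 40% decrease
    print(category, two_class_outcome(category))  # PR good
```

PD is checked first so that a new lesion always overrides any shrinkage of the target lesions.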
- The prognosis prediction results may also be divided into other categories, which are not limited in the embodiments of the present application.
- For example, the prognosis prediction results can be divided into three categories: good, moderate, and poor treatment effect.
- In some embodiments, the prognosis prediction result may also be a predicted value of a specific indicator.
- For example, the prognosis prediction result may include, but is not limited to, disease remission rate, disease recurrence rate, recurrence within a given number of years, disease survival rate, survival time, short-term mortality, long-term mortality, in-hospital mortality, out-of-hospital mortality, surgical mortality, etc.
- FIG. 5 is an exemplary flowchart for determining gene mutation information of a tumor patient according to some embodiments of the present application. Specifically, the steps shown in FIG. 5 may be performed by staff (such as doctors, laboratory personnel, operators, etc.) and / or instruments (such as detectors, analyzers, etc.). As shown in FIG. 5, the process of determining gene mutation information of a tumor patient may include:
- Step 510: Obtain a tissue sample of the tumor patient.
- The tissue sample 145 can reflect tumor-related information.
- The tissue sample 145 may be a biological tissue or fluid sample taken from a tumor site (such as a target lesion) and/or a non-tumor site (such as a site other than the lesion) of the tumor patient 140.
- Tissue samples may include, but are not limited to, sputum, blood, fresh tissue (such as surgical or puncture tissue), paraffin-embedded tissue, urine, serous cavity effusion (such as ascites, pleural effusion, or pericardial effusion), tissues or cells taken from the tumor site, or any combination thereof.
- The tissue sample 145 may include tissues and cells of the tumor patient 140 at the tumor site or at sites other than the tumor. In some embodiments, the tissue sample 145 may include only tissues and cells of the tumor patient 140 at the tumor site. In some embodiments, inclusion criteria may be established for the tissue sample 145. For example, sample collection requirements can be specified: surgical or puncture tissue (fresh or fixed in 10% neutral formalin), or paraffin-embedded tissue.
- For paraffin sections, 10 unstained slides of 5 μm or 5 unstained slides of 10 μm may be provided. To ensure that the sectioned tissue contains a sufficient proportion of tumor cells (e.g., > 70%), an H&E-stained slide of the same specimen may be included. Alternatively, the tumor cell content determined by H&E staining of the submitted specimen may be reported by email.
- The collected sample should be larger than 0.3 cm³ and should be placed quickly in an EP tube.
- Sample transport standards can also be specified. Unstained paraffin slides can be sent for testing at room temperature within 2 weeks after sectioning, for example in an EP tube whose mouth is sealed with sealing film to prevent leakage during transport. The pathology number of the submitted sample must be written on the application form.
- Criteria for screening tissue samples can be specified, such as the following sample rejection criteria: tissue fixed with fixatives other than 10% neutral formalin, submitted sample information that does not match the application form, tissue autolysis or degeneration, etc.
- Step 520: Extract the DNA of the tissue sample.
- Methods of extracting DNA from the tissue sample may include the cetyltrimethylammonium bromide (CTAB) method, glass bead method, ultrasonic method, grinding method, freeze-thaw method, guanidine isothiocyanate method, alkaline lysis method, enzymatic method, etc., or any combination thereof.
- any known method may also be used to extract the DNA of the tissue sample, which is not limited in the embodiments of the present application.
- Step 530: Prepare the DNA library.
- The library preparation process may include some or all of the following steps: DNA fragmentation, end repair, fragment size selection with magnetic beads, end A-tailing, adapter ligation, PCR enrichment, hybridization capture of the sequencing library, and the like.
- Any known method can also be used to prepare the DNA library of the tissue sample, which is not limited in the embodiments of the present application.
- Step 540: Sequence the library to obtain sequencing results.
- the prepared library may be subjected to gene sequencing to obtain sequencing data.
- the gene sequencing technology may be a high-throughput sequencing technology.
- High-throughput sequencing (next-generation sequencing, NGS) technologies may include single-molecule real-time sequencing (Pacific Biosciences), ion semiconductor sequencing (Ion Torrent), pyrosequencing (454), sequencing by synthesis (Illumina), sequencing by ligation (SOLiD), the chain termination method (Sanger sequencing), and the like, or any combination thereof.
- Any known method can also be used for gene sequencing, which is not limited in the embodiments of the present application.
- Step 550: Analyze the sequencing results to determine the gene mutation information of the tumor patient.
- The acquired sequencing data can be analyzed to obtain the gene mutation information of the tumor patient. This information may include the mutated genes on the DNA and their mutation abundances, and/or the tumor prognosis prediction-related genes on the DNA, the mutation abundance of each mutation site, the gene mutation abundance, etc.
- In some embodiments, the gene mutation abundance may be the sum of the mutation abundances of those sites in the gene that carry a non-synonymous single nucleotide variant (SNV) with a mutation abundance greater than a set value, as counted from the sequencing results.
- The set value may be 0.05%, 0.1%, 0.2%, 1%, 2%, 3%, 10%, and so on.
- The mutation abundance at a mutation site refers to the proportion of reads that carry the mutated base at that site.
- Mutation abundance at a mutation site = number of mutant reads / (number of mutant reads + number of wild-type reads), where a read is a short sequenced fragment.
- For example, sequencing shows that a patient's mutant gene KMT2C has 5 mutation sites, with mutation abundances of 1%, 3%, 4%, 6%, and 8%. With the threshold set to 2%, the mutation abundance of KMT2C is the sum of the abundances of the 4 sites above 2%, i.e., 3% + 4% + 6% + 8% = 21%.
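The two abundance definitions above can be written as two small functions. This is a minimal sketch: the function names are hypothetical, and it assumes that abundances are expressed as fractions (0 to 1) rather than percentages.

```python
# Sketch of the site-level and gene-level mutation abundance definitions.
# Abundances are fractions (0.03 == 3%); names are illustrative only.


def site_abundance(mutant_reads: int, wild_type_reads: int) -> float:
    """Mutation abundance at one site: mutant / (mutant + wild-type) reads."""
    total = mutant_reads + wild_type_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return mutant_reads / total


def gene_abundance(site_abundances: list[float], threshold: float) -> float:
    """Sum of the site abundances that are strictly greater than the threshold."""
    return sum(a for a in site_abundances if a > threshold)


if __name__ == "__main__":
    kmt2c_sites = [0.01, 0.03, 0.04, 0.06, 0.08]  # the KMT2C example in the text
    print(round(gene_abundance(kmt2c_sites, threshold=0.02), 4))  # 0.21
```

Rounding is applied only for display, because summing binary floating-point fractions can leave a tiny residue.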
- In some embodiments, data analysis may include some or all of the following steps: (1) removing adapter sequences from the sequencing data; (2) performing quality control and removing low-quality sequencing data (e.g., low-quality bases or overly short reads); (3) aligning the processed sequencing data to reference gene data to identify mutated genes; (4) removing normal variants of genes (such as polymorphisms and synonymous mutations); and (5) obtaining the gene mutation information of the tumor patient.
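Steps (4) and (5) above can be sketched as a filter followed by grouping by gene. This is a minimal illustration under stated assumptions. The `Variant` record and its fields are hypothetical. A "polymorphism" is assumed here to mean a population-database frequency of at least 1%, which is a common convention but not a value given in the source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one called variant (field names are assumptions)."""
    gene: str
    consequence: str        # e.g. "missense", "nonsense", "synonymous"
    population_freq: float  # frequency in a population database (assumed input)
    abundance: float        # site mutation abundance as a fraction


def filter_variants(variants, polymorphism_freq=0.01):
    """Step (4): drop synonymous variants and common polymorphisms (assumed cutoff)."""
    return [v for v in variants
            if v.consequence != "synonymous" and v.population_freq < polymorphism_freq]


def per_gene_sites(variants):
    """Step (5): collect the remaining site abundances for each gene."""
    by_gene = defaultdict(list)
    for v in variants:
        by_gene[v.gene].append(v.abundance)
    return dict(by_gene)
```

The per-gene lists returned here are the input expected by a gene-level abundance sum such as the one defined in the text.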
- The reference gene data may be normal gene data (e.g., gene data from normal cells at a non-tumor site of the tumor patient, or gene data of individuals without tumors), gene data for the corresponding tumor disease (e.g., the prognosis prediction-related genes of each tumor), and so on.
- The reference gene data may be stored in the database 130 and retrieved from it when needed.
- Any known method can also be used to determine the mutation abundance of a gene, for example, second-generation sequencing, BEAMing, PARE, and other technologies.
- FIG. 7 is a heat map of gene mutations in all osteosarcoma patients according to an exemplary embodiment of the present application. FIG. 8 is a heat map of gene mutations in osteosarcoma patients with a good treatment effect according to an exemplary embodiment of the present application.
- FIG. 9 is a heat map of gene mutations in osteosarcoma patients with a poor treatment effect according to an exemplary embodiment of the present application.
- For example, the corresponding tissues and cells can be taken from the target lesion (the osteosarcoma lesion site) of osteosarcoma patients (the 93 osteosarcoma patient samples shown in FIG. 7), and the gene mutation information of these patients can be determined from them.
- the gene mutation information of the osteosarcoma patient can be determined through the foregoing process steps of determining the gene mutation information of the tumor patient.
- In this example, detection focuses mainly on the mutation status (e.g., gene mutation abundance) of 315 genes in each sample. These genes were selected from genes reported in the literature to have a relatively significant effect on cancer.
- The number of genes detected may be increased or decreased as appropriate.
- FIGS. 7-9 show gene mutation heat maps of the 29 genes with the highest mutation ratios in all osteosarcoma patients, in patients with a good prognosis, and in patients with a poor prognosis, respectively. In FIGS. 7-9, the left ordinate represents the proportion of the 93 samples in which a given gene is mutated, the right ordinate represents the mutant gene, and the abscissa represents the samples.
- The mutant genes with a higher mutation proportion in the samples include: lysine N-methyltransferase 2C (KMT2C), SRY-box 9 (SOX9), LDL receptor related protein 1B (LRP1B), neurofibromatosis type I (NF1), protein kinase, DNA-activated, catalytic subunit (PRKDC), FAT atypical cadherin 1 (FAT1), slit guidance ligand 2 (SLIT2), NOTCH1, EPH receptor A7 (EPHA7), ATRX, lysine demethylase 6A (KDM6A), APC, RAN binding protein 2 (RANBP2), ROS proto-oncogene 1 (ROS1), EMSY (C11orf30), AT-rich interactive domain-containing protein 2 (ARID2), RARA antisense RNA 1 (RARA-AS1), TATA-box binding protein associated factor 1 (TAF1), mutS homolog 2 (MS
- Table 1 lists the high-abundance mutant gene information corresponding to each patient (only 10 patients with good prognosis and 10 patients with poor prognosis are shown as examples).
- FIG. 6 is an exemplary flowchart of training a tumor prognosis prediction model according to some embodiments of the present application. Specifically, the process shown in FIG. 6 (e.g., step 610 and step 620) may be performed by the training module 330. As shown in FIG. 6, an exemplary process of training a tumor prognosis prediction model may include:
- Step 610: Acquire characteristic information and prognosis information of multiple tumor patients.
- The characteristic information of the multiple tumor patients may include gene mutation information of the tumor patients, basic information of the tumor patients, and the like, alone or in any combination.
- the gene mutation information of multiple tumor patients may include genes and mutation abundances of mutations in the DNA of each tumor patient.
- the genetic mutation information of the multiple tumor patients may be the genetic mutation information of the tumor patient at the tumor site (such as a target lesion).
- the basic information of the tumor patient can reflect other information related to the tumor patient except the gene mutation information.
- the basic information of the cancer patient may include the age, gender, smoking history, years of education, working years, treatment plan, sample storage time, medication type, etc. of the cancer patient, or any combination thereof.
- The prognosis information of the multiple tumor patients can be divided into four categories according to changes in the target lesions: progressive disease (PD), stable disease (SD), partial response (PR), and complete response (CR).
- In some embodiments, the prognosis may include two types: a good treatment effect and a poor treatment effect.
- In some embodiments, the prognosis may also be the value of a specific indicator.
- The prognosis may include, but is not limited to, disease remission rate, disease recurrence rate, recurrence within a given number of years, disease survival rate, survival time, short-term mortality, long-term mortality, in-hospital mortality, out-of-hospital mortality, surgical mortality, etc.
- the prognosis situation described herein may correspond to the prognosis prediction result determined in step 420.
- Step 620: Train an initial model using the characteristic information and prognosis information of the multiple tumor patients to obtain the tumor prognosis prediction model.
- the tumor prognosis prediction model may be a supervised learning model.
- The supervised learning model may include a support vector machine model, a decision tree model, a neural network model, a nearest neighbor classifier, and so on, or any combination thereof.
- the support vector machine model will be used as an example to illustrate the training process of the tumor prognosis prediction model.
- In some embodiments, initial model parameters may be set to establish an initial support vector machine model. A grid search may then be used to find the optimal model parameters (e.g., parameter c (cost) and parameter g (gamma)) based on the characteristic information and prognosis information of the multiple tumor patients, and the model is updated and optimized with these parameters.
- The kernel function of the support vector machine model can be selected (such as a linear, polynomial, Gaussian (RBF), or sigmoid kernel function). The support vector machine model is then trained on the characteristic information and prognosis information of the multiple tumor patients.
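To make the roles of c and g concrete, the standard soft-margin SVM formulation and the four kernels named above are shown below, in the parameterization used by common SVM libraries such as LIBSVM. The source does not state these formulas. Here x_i is assumed to be one patient's feature vector, y_i ∈ {−1, +1} the prognosis label, c corresponds to C, and g corresponds to γ. The symbols r and d (polynomial and sigmoid offset and degree) are additional kernel parameters that the text does not mention.

```latex
\begin{aligned}
&\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ge 1-\xi_i,\ \ \xi_i\ge 0,\\
&K_{\mathrm{linear}}(x,x') = x^{\top}x',\qquad
K_{\mathrm{poly}}(x,x') = (\gamma\,x^{\top}x' + r)^{d},\\
&K_{\mathrm{RBF}}(x,x') = \exp\bigl(-\gamma\lVert x-x'\rVert^{2}\bigr),\qquad
K_{\mathrm{sigmoid}}(x,x') = \tanh(\gamma\,x^{\top}x' + r).
\end{aligned}
```

A larger C penalizes training misclassifications more heavily. A larger γ makes the RBF kernel more local. This is why the two parameters are usually tuned together.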
- The optimal model parameters can also be found by combining grid search with a validation method.
- the optimal model parameter is selected according to the verification result.
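Grid search combined with validation can be sketched generically. The sketch evaluates a score at every (c, g) pair on a grid and keeps the best pair. The `score` callback is hypothetical: it stands in for, say, the mean cross-validation accuracy of an SVM trained with those parameters. The exponentially spaced grids are a common choice for SVM tuning but are an assumption, not values from the source.

```python
import itertools


def grid_search(score, c_values, gamma_values):
    """Return (best_c, best_gamma, best_score) over the Cartesian grid.

    `score(c, gamma)` is a hypothetical callback returning a validation score
    (higher is better), e.g. cross-validation accuracy of an SVM.
    """
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        s = score(c, gamma)
        if best is None or s > best[2]:  # ties keep the first pair found
            best = (c, gamma, s)
    return best


# Assumed exponentially spaced grids: c in 2^-5..2^15, g in 2^-15..2^3.
C_GRID = [2.0 ** k for k in range(-5, 16, 2)]
GAMMA_GRID = [2.0 ** k for k in range(-15, 4, 2)]
```

The exhaustive loop is easy to reason about, but its cost grows with the product of the grid sizes. This motivates the particle swarm alternative described next.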
- In some embodiments, a particle swarm optimization algorithm may be used to optimize the parameters of the support vector machine model. Specifically, the parameters of the particle swarm optimization algorithm are initialized first. The algorithm then searches for the best parameters for updating the model (e.g., the parameter pair c and g), and the best parameters found are used as the optimized model parameters.
- the particle swarm optimization algorithm may include but is not limited to a basic particle swarm optimization algorithm, an adaptive mutation particle swarm optimization algorithm, and the like.
- The parameters of the particle swarm optimization algorithm may include a local search capability parameter, a global search capability parameter, an elastic coefficient for the velocity update, a maximum number of generations, a maximum population size, the number of cross-validation folds, the search range of parameter c, the search range of parameter g, etc., or any combination thereof.
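A basic (global-best) particle swarm optimizer can be written in a few dozen lines. In this sketch, `c_local` and `c_global` play the roles of the local and global search parameters listed above. Mapping the "elastic coefficient for the velocity update" to an inertia weight on the previous velocity is an assumption. The search box would typically cover log2(c) and log2(g), and the objective would be a cross-validation accuracy. Both choices are hypothetical here.

```python
import random


def pso_maximize(objective, bounds, n_particles=20, n_iter=100,
                 inertia=0.7, c_local=1.5, c_global=1.5, seed=0):
    """Basic global-best particle swarm optimization (maximization) over a box.

    bounds: list of (low, high) per dimension, e.g. ranges for log2(c), log2(g).
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # swarm-wide best

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_local * r1 * (pbest[i][d] - pos[i][d])
                             + c_global * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in range
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For SVM tuning, a hypothetical objective would train with c = 2**p[0] and g = 2**p[1] and return the validation accuracy. Searching in log space keeps the swarm's steps meaningful across several orders of magnitude.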
- the parameters of the particle swarm optimization algorithm can be initialized manually or non-manually.
- grid search and particle swarm optimization algorithms can also be used in combination to optimize the parameters of the support vector machine model. For example, you can first use grid search to optimize the parameters of the support vector machine model, and then use particle swarm optimization to optimize it again.
- the feature information of multiple tumor patients can be further screened, and the filtered feature information can be used for model training.
- For example, mutant gene information whose mutation abundance is below a set threshold can be removed from the gene mutation information of the multiple tumor patients.
- The gene mutation abundance can be the sum of the mutation abundances of multiple different mutation sites in the gene. A threshold on the mutation abundance at each site can therefore be set manually (e.g., 0.05%, 0.1%, 0.2%, 1%, 2%, or 3%), and mutant gene information whose mutation abundance is below the set threshold is removed. For example, mutation sites whose mutation abundance is below a certain value (e.g., 0.05%, 0.1%, or 0.2%) may be excluded from the gene mutation abundance.
- redundant gene mutation information in the gene mutation information of the multiple tumor patients may be removed.
- The gene mutation information may contain two or more genes that are highly correlated with each other.
- For example, when the mutations of two genes are identical or similar, or the mutation abundance profiles of two genes are similar, the two genes are considered highly correlated. Among such highly correlated genes, one or more may be regarded as redundant genes.
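One way to implement the redundancy criterion above is to compare the mutation-abundance vectors of two genes across patients with the Pearson correlation coefficient. A gene is dropped if it is too strongly correlated with a gene already kept. The sketch below makes several assumptions that do not come from the source: Pearson correlation as the similarity measure, the |r| > 0.9 cutoff, and keeping the first gene of each correlated group.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / (vx * vy) ** 0.5


def drop_redundant(abundance_by_gene, max_abs_r=0.9):
    """Keep genes in input order, skipping any gene whose abundance profile
    correlates with an already-kept gene beyond the (assumed) cutoff."""
    kept = []
    for gene, values in abundance_by_gene.items():
        if all(abs(pearson(values, abundance_by_gene[k])) <= max_abs_r for k in kept):
            kept.append(gene)
    return kept
```

Because the scan is greedy, the input order decides which member of a correlated group survives. Ordering genes by prior importance first is one reasonable choice.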
- In some embodiments, at least some of the genes may be selected as tumor prognosis prediction-related genes according to how much the mutation information of each gene, within the characteristic information of the multiple tumor patients, contributes to the support vector machine model.
- the mutation information of each gene in the characteristic information of multiple tumor patients may be further screened.
- For example, recursive feature elimination can be used to screen the mutation information of each gene in the characteristic information of the multiple tumor patients. The prediction accuracy of the model serves as the evaluation criterion. The mutation information of individual genes is selectively eliminated to obtain multiple training sets, and a model is trained on each training set. The gene mutation information eliminated when training each model is then ranked by contribution value according to prediction accuracy. The lower the prediction accuracy of the model trained without a given piece of gene mutation information, the greater that information's contribution value.
- the mutation information of each gene can be screened according to the contribution value to obtain at least part of genes as genes related to tumor prognosis prediction.
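The accuracy-driven elimination described above can be sketched as a wrapper-style backward elimination. In each round, the feature whose removal leaves the highest accuracy contributes least and is eliminated. Features eliminated later therefore rank higher. The `evaluate(subset)` callback is hypothetical and would train a model on the given genes and return its validation accuracy. Note that classic SVM-RFE instead ranks features by the magnitude of the linear SVM weights, so this is a named variant, not the only form of RFE.

```python
def elimination_ranking(features, evaluate):
    """Rank features from largest to smallest contribution.

    `evaluate(subset)` is a hypothetical callback returning the validation
    accuracy of a model trained only on the features in `subset`.
    """
    remaining = list(features)
    eliminated = []
    while remaining:
        # The feature whose removal hurts accuracy least contributes least.
        least_useful = max(
            remaining,
            key=lambda f: evaluate([x for x in remaining if x != f]),
        )
        remaining.remove(least_useful)
        eliminated.append(least_useful)
    return eliminated[::-1]  # last eliminated = largest contribution
```

The top n genes of the returned ranking would then be kept as prognosis-related genes. This requires O(k²) model fits for k genes, which is manageable for a few hundred genes only if each fit is cheap.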
- A random forest algorithm can also be used to screen the mutation information of each gene in the characteristic information of the multiple tumor patients. Specifically, (1) build the decision trees first. The forest can be defined to contain P trees (e.g., 20 or 40). Bootstrap sampling can be used to draw a sample set from the 93 samples as the training set of each decision tree, and repeating the sampling for P rounds yields a training set for every tree.
- In each round, 93 draws are made with replacement from the 93 samples to obtain the training set of one decision tree. At each node of the tree, given a total of 315 feature variables, m feature variables are selected at random, and one of these m features is chosen for branch growth. No pruning is performed during growth, and the best split is computed at each node.
- The gene whose mutation information produces the largest reduction in impurity is taken as the feature with the largest contribution value, and so on. This determines the contribution value of each mutant gene to the model (as shown in Table 2), so that at least some genes can be screened out as tumor prognosis prediction-related genes.
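The building blocks of the random-forest procedure above are bootstrap sampling, random feature subsets, and impurity reduction. They can be sketched as follows. This sketch assumes Gini impurity as the impurity measure and a binary "mutated / not mutated" split, neither of which the source specifies. A full forest would sum a gene's impurity decreases over every node and tree where it is used, and would rank genes by that total.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement (one tree's training set)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, m, rng):
    """Pick m distinct candidate features for one node (e.g. m near sqrt(315))."""
    return rng.sample(range(n_features), m)


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(feature, labels):
    """Weighted Gini decrease from splitting on a binary mutated/not-mutated feature."""
    left = [y for x, y in zip(feature, labels) if x]
    right = [y for x, y in zip(feature, labels) if not x]
    n = len(labels)
    return gini(labels) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
```

At each node, the candidate gene with the largest `impurity_decrease` is chosen for the split. Its decrease is then credited to that gene's contribution value.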
- For example, among the mutant genes that have a significant impact on tumorigenesis, the n mutant genes with the largest contribution to the tumor prognosis prediction model can be selected as tumor prognosis prediction-related genes.
- the tumor prognosis prediction model obtained by training can be verified.
- Cross-validation methods may include the hold-out method, K-fold cross-validation (K-CV), and leave-one-out cross-validation (LOO-CV).
- For example, with leave-one-out cross-validation over the 93 samples, one sample is used as the validation sample, and the remaining 92 are used as training samples that are input into the initial support vector machine model for training. This process is repeated 93 times, with each sample serving once as the validation sample. The resulting 93 validation results are combined to determine the final validation result of the trained tumor prognosis prediction model.
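The leave-one-out loop above can be sketched independently of the model. To keep the example self-contained, it uses a 1-nearest-neighbor classifier, one of the model types the text lists, in place of the SVM. Any `predict(train_X, train_y, x)` function with the same signature could be substituted. The function names are hypothetical.

```python
def nearest_neighbor_predict(train_X, train_y, x):
    """1-nearest-neighbor prediction using squared Euclidean distance."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[distances.index(min(distances))]  # ties go to the first sample


def leave_one_out_accuracy(X, y, predict=nearest_neighbor_predict):
    """Hold out each sample once, train on the rest, and report overall accuracy."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)
```

With 93 samples this runs 93 train/predict rounds, matching the 93 validation results described in the text.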
- A receiver operating characteristic (ROC) curve can be drawn from the validation results for visual presentation (as shown in FIG. 10).
- Each point on the ROC curve represents the sensitivity and specificity of the osteosarcoma prognosis prediction model at a different cutoff (such as a different criterion for classifying the prognostic effect).
- The ROC curve lies close to the upper left corner, which reflects the high prediction accuracy of the osteosarcoma prognosis prediction model obtained in this example. The area under the ROC curve is 0.988, very close to 1, which reflects the model's good classification performance. In addition, the osteosarcoma prognosis prediction model of the present application has a high mean sensitivity (0.95) and mean specificity (0.97) across different cutoffs.
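The area under the ROC curve equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. Sensitivity and specificity at a single cutoff follow from the confusion counts. The sketch below assumes label 1 means "poor prognosis" (the positive class) and that higher predicted values indicate a poorer prognosis, consistent with the 0.5 threshold described for Table 3. The function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(positive score > negative score), with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, cutoff):
    """Sensitivity and specificity when scores above the cutoff are called positive."""
    preds = [1 if s > cutoff else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    return tp / (tp + fn), tn / (tn + fp)
```

Sweeping the cutoff over all distinct scores and plotting (1 − specificity, sensitivity) traces the ROC curve itself. The pairwise AUC is quadratic in sample count, which is fine at this scale.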
- 6 additional patients with osteosarcoma were selected (4 of which are known to have a poor prognostic effect and the other 2 have a good prognostic effect).
- The gene mutation information of the osteosarcoma lesion sites was obtained. Based on this information, the prognosis prediction results of the 6 osteosarcoma patients were determined using the osteosarcoma prognosis prediction model trained in this embodiment (as shown in Table 3). The threshold for the predicted value is set to 0.5: a value below 0.5 indicates a good prognosis, and a value above 0.5 indicates a poor prognosis. The prediction results are completely consistent with the known prognostic effects.
- Table 3:
  Sample name | Predicted value | Predicted prognosis | Actual prognosis
  Patient 1 | 0.335717 | Good prognosis | Good prognosis
  Patient 2 | 0.44896 | Good prognosis | Good prognosis
  Patient 3 | 0.67417 | Poor prognosis | Poor prognosis
  Patient 4 | 0.735268 | Poor prognosis | Poor prognosis
  Patient 5 | 0.756405 | Poor prognosis | Poor prognosis
  Patient 6 | 0.930926 | Poor prognosis | Poor prognosis
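The 0.5 decision rule can be checked directly against the values in Table 3. The sketch reuses only the predicted values and actual outcomes reported above. The source does not say how a value of exactly 0.5 is classified. This sketch treats 0.5 as a good prognosis, which is an assumption.

```python
# Predicted values and actual outcomes as reported in Table 3.
TABLE_3 = {
    "Patient 1": (0.335717, "Good prognosis"),
    "Patient 2": (0.44896, "Good prognosis"),
    "Patient 3": (0.67417, "Poor prognosis"),
    "Patient 4": (0.735268, "Poor prognosis"),
    "Patient 5": (0.756405, "Poor prognosis"),
    "Patient 6": (0.930926, "Poor prognosis"),
}


def predicted_prognosis(value, threshold=0.5):
    """Values above the threshold indicate a poor prognosis (0.5 itself: assumed good)."""
    return "Poor prognosis" if value > threshold else "Good prognosis"


agreement = sum(predicted_prognosis(v) == actual for v, actual in TABLE_3.values())
print(f"{agreement}/{len(TABLE_3)} predictions match the actual prognosis")
```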
- Possible beneficial effects of the embodiments of the present application include, but are not limited to: (1) the prognosis of tumor patients can be predicted from gene mutation information; (2) the accuracy of tumor prognosis prediction is improved; (3) the tumor prognosis prediction process is convenient to carry out; and (4) a reference is provided for formulating and selecting treatment plans. It should be noted that different embodiments may produce different beneficial effects. In different embodiments, the possible beneficial effects may be any one or a combination of the above, or any other beneficial effect that may be obtained.
- the computer storage medium may contain a propagated data signal containing a computer program code, for example, on baseband or as part of a carrier wave.
- the propagated signal may have multiple manifestations, including electromagnetic form, optical form, etc., or a suitable combination form.
- the computer storage medium may be any computer-readable medium except the computer-readable storage medium, and the medium may be connected to an instruction execution system, apparatus, or device to communicate, propagate, or transmit a program for use.
- Program code located on a computer storage medium may be propagated through any suitable medium, including radio, cable, fiber optic cable, RF, or similar media, or any combination of the foregoing.
- The computer program code required for the operation of each part of this application can be written in any one or more programming languages. These include object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; and other programming languages.
- the program code may run entirely on the user's computer, or as an independent software package on the user's computer, or partly on the user's computer, partly on a remote computer, or entirely on the remote computer or server.
- The remote computer can be connected to the user's computer through any network, such as a local area network (LAN) or a wide area network (WAN). It can also be connected to an external computer (e.g., via the Internet), used in a cloud computing environment, or provided as a service, such as software as a service (SaaS).
- Some embodiments use numbers to describe quantities of components and attributes. It should be understood that, in some examples, such numbers in the description of embodiments are qualified by the modifiers "about", "approximately", or "substantially". Unless otherwise stated, "about", "approximately", or "substantially" indicates that the stated figure allows a variation of ±20%.
- The numerical parameters used in the specification and claims are approximations and may vary depending on the characteristics required by individual embodiments. In some embodiments, the numerical parameters should take the specified significant digits into account and use ordinary rounding. Although the numerical ranges and parameters used to define the breadth of the ranges in some embodiments of the present application are approximations, in specific embodiments such numerical values are set as precisely as is feasible.
Abstract
Description
This application relates to the medical field, and in particular to a method and system for predicting tumor prognosis.
Tumors (e.g., osteosarcoma) are the second leading cause of death worldwide, and both tumor mortality and tumor incidence continue to rise. Although tumor diagnosis and treatment keep improving, patient mortality is still not effectively controlled. Recurrence and metastasis are the main causes of death in tumor patients. For example, osteosarcoma can metastasize to various tissues and organs, such as the lung and spinal cord, seriously threatening the patient's life.
At present, tumors are evaluated clinically mainly through pathological and imaging-based morphological changes, which establish indicators such as patient age, tumor pathological type, surgical stage, and residual tumor. With the development of molecular biology, molecular pathological epidemiology, and related technologies, screening tumor-related genes and molecular markers at the molecular level has become a hot spot in cancer research. Working at the molecular level of tumor cells, such methods can provide reference indications for surgery, predict postoperative recurrence or metastasis, give objective indications of radical cure, and provide targets for anti-metastatic therapy.
Therefore, it is of great significance for individualized treatment of tumor patients to study differences in gene expression during tumor formation, progression, and drug resistance, and to analyze the activation and suppression of genes in tumors, so that a patient's condition and prognosis can be assessed more comprehensively and accurately. This is also a focus of attention for those skilled in the art.
Summary of the invention
One embodiment of the present application provides a tumor prognosis prediction method, including: obtaining feature information of a tumor patient, the feature information reflecting at least the gene mutation information of the tumor patient; and determining a prognosis prediction result for the tumor patient based on the feature information and according to a tumor prognosis prediction model.
In some embodiments, the gene mutation information includes the genes mutated on the DNA and their mutation abundances, and/or the tumor-prognosis-related genes on the DNA and their mutation abundances.
In some embodiments, obtaining the feature information of the tumor patient further includes: obtaining a tissue sample of the tumor patient; extracting DNA from the tissue sample; preparing a library of the DNA; sequencing the library to obtain sequencing results; and analyzing the sequencing results to determine the gene mutation information of the tumor patient.
In some embodiments, the feature information further includes at least one of the following items for the tumor patient: age, gender, smoking history, years of education, years of employment, treatment plan, and sample storage time.
In some embodiments, the tumor prognosis prediction model is a support vector machine model or a neural network model.
In some embodiments, the tumor prognosis prediction method further includes: training an initial model with the feature information and prognosis information of multiple tumor patients to obtain the tumor prognosis prediction model.
In some embodiments, training the initial model with the feature information and prognosis information of multiple tumor patients to obtain the tumor prognosis prediction model includes: removing, from the gene mutation information of the multiple tumor patients, the information on mutated genes whose mutation abundance is below a set threshold.
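A minimal sketch of the abundance-threshold filter, assuming each patient's mutation information is held as a dict that maps a gene symbol to its mutation abundance (a fraction between 0 and 1). The data layout, the function name `filter_low_abundance`, and the 2% default are illustrative assumptions; the application only refers to "a set threshold".

```python
from typing import Dict, List

# Hypothetical layout: one dict per patient, gene symbol -> mutation abundance (0..1).
PatientMutations = Dict[str, float]


def filter_low_abundance(
    patients: List[PatientMutations], threshold: float = 0.02
) -> List[PatientMutations]:
    """Drop mutated-gene entries whose abundance is below `threshold`.

    The 2% default is an assumed example value, not one fixed by the application.
    """
    return [
        {gene: ab for gene, ab in patient.items() if ab >= threshold}
        for patient in patients
    ]


cohort = [
    {"TP53": 0.21, "KMT2C": 0.01},
    {"RB1": 0.05, "ATRX": 0.015},
]
print(filter_low_abundance(cohort))  # [{'TP53': 0.21}, {'RB1': 0.05}]
```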
In some embodiments, training the initial model with the feature information and prognosis information of multiple tumor patients to obtain the tumor prognosis prediction model includes: removing redundant gene mutation information from the gene mutation information of the multiple tumor patients.
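The application does not define "redundant". One plausible reading, used here purely as an assumption, is that a gene column is redundant when it never varies across the cohort or duplicates the profile of a gene already kept. The function name and matrix layout below are hypothetical; a correlation-based cut-off would be an equally valid alternative.

```python
from typing import Dict, List, Tuple


def remove_redundant_genes(
    matrix: Dict[str, List[float]],
) -> Tuple[Dict[str, List[float]], List[str]]:
    """Remove gene columns that carry no extra information.

    `matrix` maps gene symbol -> per-patient mutation abundances, with the same
    patient order in every column. Under the assumed definition used here, a
    column is dropped if it is constant across patients or is an exact duplicate
    of a column kept earlier. Returns the reduced matrix and the dropped genes.
    """
    kept: Dict[str, List[float]] = {}
    seen_profiles = set()
    dropped: List[str] = []
    for gene, column in matrix.items():
        profile = tuple(column)
        if len(set(profile)) <= 1 or profile in seen_profiles:
            dropped.append(gene)
            continue
        seen_profiles.add(profile)
        kept[gene] = column
    return kept, dropped


features = {
    "TP53": [0.2, 0.0, 0.3],
    "RB1": [0.2, 0.0, 0.3],   # identical profile to TP53
    "ATRX": [0.0, 0.0, 0.0],  # never mutated in this toy cohort
    "SOX9": [0.0, 0.1, 0.0],
}
reduced, dropped = remove_redundant_genes(features)
print(sorted(reduced), dropped)  # ['SOX9', 'TP53'] ['RB1', 'ATRX']
```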
In some embodiments, the tumor prognosis prediction model is a support vector machine model, and training the initial model with the feature information and prognosis information of multiple tumor patients to obtain the tumor prognosis prediction model includes: determining at least some genes as tumor-prognosis-related genes according to the contribution of each gene's mutation information, within the feature information of the multiple tumor patients, to the support vector machine model; and training the initial model with the gene mutation information of those tumor-prognosis-related genes and the prognosis information of the multiple tumor patients to obtain the tumor prognosis prediction model.
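The application does not say how a gene's "contribution" to the SVM is measured. The sketch below assumes a linear SVM and scores each gene by its squared weight w_j², which is the ranking criterion used in SVM-RFE. For self-containment it trains the linear SVM with a deterministic Pegasos-style sub-gradient loop instead of a library solver. The toy data, the labels, and the top-k cut-off are hypothetical. A full recursive elimination would retrain after dropping the lowest-ranked gene.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Linear SVM trained with a deterministic Pegasos-style sub-gradient loop.

    Labels must be +1 / -1. `lam` is the L2 regularization strength; the bias
    term is updated without regularization.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Shrink step coming from the L2 regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge-loss sub-gradient step for a margin violation.
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def rank_genes(genes: Sequence[str], w: Sequence[float]) -> List[Tuple[str, float]]:
    """Rank genes by w_j ** 2 (the SVM-RFE criterion), largest contribution first."""
    return sorted(((g, wj * wj) for g, wj in zip(genes, w)),
                  key=lambda item: item[1], reverse=True)


# Toy, already-centered features for two genes; labels are hypothetical
# (+1 = good treatment effect, -1 = poor treatment effect).
genes = ["TP53", "SOX9"]
X = [[1.0, 0.2], [0.9, -0.2], [-1.0, 0.2], [-0.8, -0.2]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
ranking = rank_genes(genes, w)
selected = [g for g, _ in ranking[:1]]  # keep the top-k genes (k = 1 here)
print(selected)  # ['TP53']
```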
In some embodiments, the tumor prognosis prediction model is a support vector machine model, and training the initial model to obtain the tumor prognosis prediction model further includes: optimizing the parameters of the support vector machine model with a particle swarm optimization algorithm or a grid search method.
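A minimal grid search sketch, assuming the parameters to tune are the penalty C and the RBF-kernel width gamma. The application names neither parameter, and `score` is a hypothetical stand-in for, e.g., cross-validated accuracy. The exponential grids follow the ranges commonly suggested for RBF-kernel SVMs. Particle swarm optimization would replace the exhaustive loop with a swarm search over the same (log C, log gamma) space and is not shown.

```python
import itertools
import math
from typing import Callable, Iterable, Tuple


def grid_search(
    score: Callable[[float, float], float],
    c_values: Iterable[float],
    gamma_values: Iterable[float],
) -> Tuple[float, float, float]:
    """Evaluate every (C, gamma) pair and return (best_C, best_gamma, best_score).

    `score` is assumed to return a quality measure to maximise. Ties keep the
    first pair evaluated.
    """
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        s = score(c, gamma)
        if best is None or s > best[2]:
            best = (c, gamma, s)
    if best is None:
        raise ValueError("empty parameter grid")
    return best


# Exponential grids: C = 2^-5 ... 2^15, gamma = 2^-15 ... 2^3 (step 2^2).
C_GRID = [2.0 ** k for k in range(-5, 16, 2)]
GAMMA_GRID = [2.0 ** k for k in range(-15, 4, 2)]


def toy_score(c: float, gamma: float) -> float:
    """Stand-in for cross-validation: peaks at C = 2**3, gamma = 2**-5."""
    return -((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)


best_c, best_gamma, best_score = grid_search(toy_score, C_GRID, GAMMA_GRID)
print(best_c, best_gamma)  # 8.0 0.03125
```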
In some embodiments, the prognosis prediction result includes: progressive disease, stable disease, partial response, and complete response; alternatively, the prognosis prediction result includes: good treatment effect and poor treatment effect.
In some embodiments, the tumor is osteosarcoma.
In some embodiments, the feature information reflects at least the mutation information of at least one of the following genes in osteosarcoma patients: KMT2C, SOX9, LRP1B, NF-1, PRKDC, FAT1, STAG2, SLIT2, NOTCH1, EPHA7, ATRX, KDM6A, APC, RANBP2, RARA.AS1, C11orf30, ROS1, ARID2, TAF1, DICER1, MSH2, MSH6, TP53, KDM5A, JAK2, ALK, RB1, NOTCH2, and RICTOR.
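To feed the gene list above to a model, each patient needs a fixed-order feature vector. The sketch below assumes one slot per panel gene holding that gene's mutation abundance, with 0.0 for unmutated genes. A 0/1 mutated flag would be an equally reasonable choice. The constant and function names are hypothetical.

```python
from typing import Dict, List

# The 29 osteosarcoma genes listed above (spelling as in the application).
OSTEOSARCOMA_PANEL: List[str] = [
    "KMT2C", "SOX9", "LRP1B", "NF-1", "PRKDC", "FAT1", "STAG2", "SLIT2",
    "NOTCH1", "EPHA7", "ATRX", "KDM6A", "APC", "RANBP2", "RARA.AS1",
    "C11orf30", "ROS1", "ARID2", "TAF1", "DICER1", "MSH2", "MSH6", "TP53",
    "KDM5A", "JAK2", "ALK", "RB1", "NOTCH2", "RICTOR",
]


def panel_vector(mutations: Dict[str, float],
                 panel: List[str] = OSTEOSARCOMA_PANEL) -> List[float]:
    """Fixed-order vector: mutation abundance per panel gene, 0.0 if unmutated.

    Genes outside the panel are ignored.
    """
    return [mutations.get(gene, 0.0) for gene in panel]


v = panel_vector({"TP53": 0.21, "RB1": 0.05, "EGFR": 0.30})
print(len(v), v[OSTEOSARCOMA_PANEL.index("TP53")])  # 29 0.21
```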
In some embodiments, the gene mutation information of the tumor patient is gene mutation information from the osteosarcoma lesion site.
One embodiment of the present application provides a tumor prognosis prediction system, including an acquisition module and a prediction module. The acquisition module is configured to obtain feature information of a tumor patient, the feature information reflecting at least the gene mutation information of the tumor patient. The prediction module is configured to determine a prognosis prediction result for the tumor patient based on the feature information and according to a tumor prognosis prediction model.
In some embodiments, the gene mutation information includes the genes mutated on the DNA and their mutation abundances, and/or the tumor-prognosis-related genes on the DNA and their mutation abundances.
In some embodiments, the feature information further includes at least one of the following items for the tumor patient: age, gender, smoking history, years of education, years of employment, treatment plan, and sample storage time.
In some embodiments, the tumor prognosis prediction model is a support vector machine model or a neural network model.
In some embodiments, the tumor prognosis prediction system further includes a training module configured to train an initial model with the feature information and prognosis information of multiple tumor patients to obtain the tumor prognosis prediction model.
In some embodiments, the training module is further configured to remove, from the gene mutation information of the multiple tumor patients, the information on mutated genes whose mutation abundance is below a set threshold.
In some embodiments, the training module is further configured to remove redundant gene mutation information from the gene mutation information of the multiple tumor patients.
In some embodiments, the tumor prognosis prediction model is a support vector machine model, and the training module is further configured to: determine at least some genes as tumor-prognosis-related genes according to the contribution of each gene's mutation information, within the feature information of the multiple tumor patients, to the support vector machine model; and train the initial model with the gene mutation information of those tumor-prognosis-related genes and the prognosis information of the multiple tumor patients to obtain the tumor prognosis prediction model.
In some embodiments, the tumor prognosis prediction model is a support vector machine model, and the training module is further configured to optimize the parameters of the support vector machine model with particle swarm optimization or grid search.
In some embodiments, the prognosis prediction result includes: progressive disease, stable disease, partial response, and complete response; alternatively, the prognosis prediction result includes: good treatment effect and poor treatment effect.
In some embodiments, the tumor is osteosarcoma.
In some embodiments, the feature information reflects at least the mutation information of at least one of the following genes in osteosarcoma patients: KMT2C, SOX9, LRP1B, NF-1, PRKDC, FAT1, STAG2, SLIT2, NOTCH1, EPHA7, ATRX, KDM6A, APC, RANBP2, RARA.AS1, C11orf30, ROS1, ARID2, TAF1, DICER1, MSH2, MSH6, TP53, KDM5A, JAK2, ALK, RB1, NOTCH2, and RICTOR.
In some embodiments, the gene mutation information of the tumor patient is gene mutation information from the osteosarcoma lesion site.
One embodiment of the present application provides a tumor prognosis prediction device including at least one processor and at least one memory. The at least one memory is configured to store computer instructions, and the at least one processor is configured to execute at least some of the computer instructions to implement the tumor prognosis prediction method.
One embodiment of the present application provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the tumor prognosis prediction method.
One embodiment of the present application provides a tumor prognosis prediction system, including: at least one computer-readable storage medium containing a set of instructions for tumor prognosis prediction; and at least one processor in communication with the at least one storage medium. When executing the set of instructions, the at least one processor is configured to: obtain feature information of a tumor patient, the feature information reflecting at least the gene mutation information of the tumor patient; and determine a prognosis prediction result for the tumor patient based on the feature information and according to a tumor prognosis prediction model.
The present application is further described by way of exemplary embodiments, which are described in detail with reference to the drawings. These embodiments are not limiting. In these embodiments, the same reference numerals denote the same structures, wherein:
FIG. 1 is a schematic diagram of an application scenario of a tumor prognosis prediction system according to some embodiments of the present application;
FIG. 2 is a schematic diagram of the architecture of a computing device according to some embodiments of the present application;
FIG. 3 is a block diagram of a tumor prognosis prediction system according to some embodiments of the present application;
FIG. 4 is an exemplary flowchart of a tumor prognosis prediction method according to some embodiments of the present application;
FIG. 5 is an exemplary flowchart for determining gene mutation information of a tumor patient according to some embodiments of the present application;
FIG. 6 is an exemplary flowchart of training to obtain a tumor prognosis prediction model according to some embodiments of the present application;
FIG. 7 is a gene mutation heat map of osteosarcoma patients according to an exemplary embodiment of the present application;
FIG. 8 is a gene mutation heat map of osteosarcoma patients with a good treatment effect according to an exemplary embodiment of the present application;
FIG. 9 is a gene mutation heat map of osteosarcoma patients with a poor treatment effect according to an exemplary embodiment of the present application; and
FIG. 10 is a schematic diagram verifying the prediction results of a tumor prognosis prediction model according to an exemplary embodiment of the present application.
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below are merely some examples or embodiments of the present application. Based on these drawings, a person of ordinary skill in the art can, without creative effort, also apply the present application to other similar scenarios. Unless obvious from the context or otherwise stated, the same reference numerals in the drawings denote the same structures or operations.
It should be understood that the terms "system", "device", "unit", and/or "module" used herein are a way of distinguishing different components, elements, parts, portions, or assemblies at different levels. However, these words may be replaced by other expressions if those expressions serve the same purpose.
As used in this application and the claims, the words "a", "an", "one", and/or "the" do not specifically denote the singular and may include the plural, unless the context clearly indicates otherwise. In general, the terms "include" and "comprise" only indicate that explicitly identified steps and elements are included. These steps and elements do not constitute an exclusive list, and a method or device may also include other steps or elements.
This application uses flowcharts to illustrate the operations performed by the system according to embodiments of the application. It should be understood that the preceding or following operations are not necessarily performed exactly in the order shown. The steps may instead be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more steps may be removed from them.
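FIG. 1 is a schematic diagram of an application scenario of a tumor prognosis prediction system 100 according to some embodiments of the present application. As shown in FIG. 1, the tumor prognosis prediction system 100 may include a server 110, a network 120, and a database 130. In some embodiments, the database 130 may store patients' basic information, disease history, and treatment plan data. It may also store patients' genetic information, such as gene mutation information of a tumor patient 140 at the tumor site, gene information of the tumor patient's normal tissue, and reference gene information. A patient's biological tissue sample or fluid sample, such as a tissue sample 145 of the tumor patient 140, may be kept in dedicated storage equipment for further processing, such as gene sequencing. Specifically, the tissue sample 145 may include a tumor tissue sample of the patient or a tissue sample from another part of the patient's body. The server 110 may process and analyze the relevant information to generate a prognosis prediction result. In some embodiments, the server 110 may obtain relevant information and/or data from the database 130 (e.g., gene mutation information of the tumor patient at the tumor site, basic information of the tumor patient, and reference gene data). Alternatively, it may directly obtain relevant information and/or data produced when staff or other equipment process the tissue sample 145 of the tumor patient 140.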
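The server 110 may be a single server or a server group. A server group may be centralized, such as a data center, or distributed, such as a distributed system. The server 110 may be local or remote. In some embodiments, the server 110 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof. In some embodiments, the server 110 may be implemented on a computing device 200 having at least one of the components shown in FIG. 2.
In some embodiments, the server 110 may include a processing engine 112. The processing engine 112 may execute instructions (program code) of the server 110. For example, the processing engine 112 may execute instructions that analyze the feature information of the tumor patient 140 to obtain a tumor prognosis prediction result. These instructions may be stored as computer instructions in a computer-readable storage medium (not shown). In some embodiments, the processing engine 112 may include one or more sub-processing devices (e.g., single-core processing devices or multi-core, multi-chip processing devices). Merely by way of example, the processing engine 112 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, or the like, or any combination thereof.
The network 120 may provide a channel for information exchange. In some embodiments, the server 110 and the database 130 may exchange information through the network 120. For example, the server 110 may receive reference gene data from the database 130 through the network 120. In some embodiments, information related to the tumor patient 140 and/or the tissue sample 145 may be transmitted to the server 110 and/or the database 130 through the network 120. For example, feature information of the tumor patient 140 (such as gene mutation information and basic information) may be transmitted to the server 110 through the network 120. In some embodiments, the network 120 may be any type of wired or wireless network. For example, the network 120 may include a cable network, a wireline network, an optical fiber network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near-field communication (NFC) network, or the like, or any combination thereof.
The database 130 may store data and/or instruction sets. In some embodiments, the database 130 may store data obtained from the server 110. In some embodiments, the database 130 may store information and/or instructions that the server 110 executes or uses to perform the exemplary methods described in this application. In some embodiments, the database 130 may store reference gene data. Specifically, the database 130 may store gene data from various genomic databases and/or gene data reported in the existing literature as affecting (or significantly affecting) tumorigenesis. The genomic databases may include, but are not limited to, COSMIC, ClinVar, HGMD, OMIM, TCGA, and GeneCards. In some embodiments, the database 130 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), or the like, or any combination thereof. In some embodiments, the database 130 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof. In some embodiments, the database 130 may be part of the server 110.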
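In some embodiments, the tumor patient 140 may be a patient with one or more tumor diseases. Tumor diseases may include carcinomas, sarcomas, benign tumors, or the like, or any combination thereof. Specifically, carcinomas may include squamous cell carcinoma, adenocarcinoma, undifferentiated carcinoma, etc. For example, squamous cell carcinoma may include cancers of the skin, esophagus, lung, cervix, vagina, vulva, penis, etc. Adenocarcinoma may include cancers of the digestive tract, lung, uterine body, breast, ovary, prostate, thyroid, liver, kidney, pancreas, gallbladder, etc. Sarcomas may include, but are not limited to: soft tissue sarcoma, osteosarcoma, malignant fibrous histiocytoma, bilateral sarcoma, rhabdomyosarcoma, lymphosarcoma, synovial sarcoma, leiomyoma, etc. Benign tumors may include, but are not limited to: hamartoma, benign pancreatic tumors, thyroid adenoma, breast fibroma, uterine tumors, gastrointestinal leiomyoma, soft tissue fibroma, synovioma, ligament fibroma, etc. In a specific embodiment of the present application, the tumor patient 140 may be an osteosarcoma patient. In some embodiments, the tumor patient 140 may be at any stage of the tumor (e.g., early, middle, or late stage). The tumor patient 140 may also be at any stage of treatment (e.g., before, during, or after treatment).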
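In some embodiments, the tissue sample 145 may reflect information related to the tumor of the tumor patient 140. Specifically, the tissue sample 145 may be a biological tissue or fluid sample taken from a tumor site (e.g., a target lesion) and/or a non-tumor site (e.g., a site other than the lesion) of the tumor patient 140. For example, tissue samples may include, but are not limited to: sputum, blood samples, fresh tissue (e.g., surgical tissue, needle biopsy tissue), paraffin-embedded tissue, urine, serous cavity effusions (e.g., ascites, pleural effusion, pericardial effusion), or tissue and cells taken from the tumor site, or any combination thereof. In some embodiments, the tissue sample 145 may include tissue and cells of the tumor patient 140 from both the tumor site and sites other than the tumor. In some embodiments, the tissue sample 145 may include only tissue and cells from the tumor site of the tumor patient 140.
In some embodiments, information related to the tumor patient 140 and/or the tissue sample 145 may be transmitted to one or more components of the tumor prognosis prediction system 100 (e.g., the server 110 or the database 130) manually (e.g., by staff) or by machine (e.g., by instruments and equipment).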
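FIG. 2 is a schematic diagram of the architecture of a computing device 200 according to some embodiments of the present application. As shown in FIG. 2, the computing device 200 may include a processor 210, a memory 220, an input/output interface 230, and a communication port 240. The server 110 and/or the database 130 may be implemented on the computing device 200. For example, the processing engine 112 may be implemented on the computing device 200 and configured to perform the functions of the processing engine 112 described in this application.
The processor 210 may execute computing instructions (program code) and perform the functions of the server 110 described in this application. The computing instructions may include programs, objects, components, data structures, procedures, modules, and functions (here, "functions" means the specific functions described in this application). For example, the processor 210 may process the instructions in the tumor prognosis prediction system 100 that predict tumor prognosis. In some embodiments, the processor 210 may include a microcontroller, a microprocessor, a reduced instruction set computer (RISC), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a central processing unit (CPU), a graphics processing unit (GPU), a physics processing unit (PPU), a microcontroller unit, a digital signal processor (DSP), a field-programmable gate array (FPGA), an advanced RISC machine (ARM), a programmable logic device, or any circuit or processor capable of performing one or more functions, or the like, or any combination thereof. For illustration only, FIG. 2 depicts a single processor 210, but the present application may include multiple processors.
The memory 220 may store data/information obtained from any component of the tumor prognosis prediction system 100. In some embodiments, the memory 220 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), or the like, or any combination thereof. Exemplary mass storage may include magnetic disks, optical disks, and solid-state drives. Exemplary removable storage may include flash drives, floppy disks, optical disks, memory cards, USB flash drives, compact disks, and portable hard drives. Volatile read-write memory may include random access memory (RAM). RAM may include dynamic RAM (DRAM), double data rate synchronous dynamic RAM (DDR SDRAM), static RAM (SRAM), thyristor RAM (T-RAM), and zero-capacitor RAM (Z-RAM). ROM may include mask ROM (MROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), compact disc ROM (CD-ROM), and digital versatile disc ROM.
The input/output interface 230 may be used to input or output signals, data, or information. In some embodiments, the input/output interface 230 may allow a user (e.g., the tumor patient 140 or a user of the tumor prognosis prediction system 100) to interact with the server 110. In some embodiments, a user may enter feature information of a tumor patient through the input/output interface 230. In some embodiments, the input/output interface 230 may include an input device and an output device. Exemplary input devices may include a keyboard, a mouse, a touch screen, a microphone, etc., or any combination thereof. Exemplary output devices may include a display device, a speaker, a printer, a projector, etc., or any combination thereof. Exemplary display devices may include a liquid crystal display (LCD), a light-emitting diode (LED)-based display, a flat panel display, a curved display, a television set, a cathode ray tube (CRT), etc., or any combination thereof.
The communication port 240 may connect to the network 120 for data communication. The connection may be wired, wireless, or a combination of both. Wired connections may include electrical cables, optical cables, telephone lines, etc., or any combination thereof. Wireless connections may include Bluetooth, Wi-Fi, WiMAX, WLAN, ZigBee, mobile networks (e.g., 3G, 4G, or 5G), etc., or any combination thereof. In some embodiments, the communication port 240 may be a standardized port, such as RS232 or RS485. In some embodiments, the communication port 240 may be a specially designed port.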
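FIG. 3 is a block diagram of a tumor prognosis prediction system according to some embodiments of the present application. As shown in FIG. 3, the tumor prognosis prediction system may include an acquisition module 310, a prediction module 320, and a training module 330.
The acquisition module 310 may be used to obtain feature information of the tumor patient 140. In some embodiments, the feature information may reflect at least the gene mutation information of the tumor patient. In some embodiments, the feature information of the tumor patient 140 may include one or any combination of the tumor patient's gene mutation information, the tumor patient's basic information, etc.
The prediction module 320 may be used to predict the prognosis of the tumor patient. For example, the prediction module 320 may determine a prognosis prediction result for the tumor patient based on the patient's feature information and according to a tumor prognosis prediction model.
The training module 330 may be used to train and obtain the tumor prognosis prediction model. Specifically, the training module 330 may obtain feature information and prognosis information of multiple tumor patients and use them to train an initial model, thereby obtaining the tumor prognosis prediction model. In some embodiments, the training module 330 may remove, from the gene mutation information, the information on mutated genes whose mutation abundance is below a set threshold. In some embodiments, the training module 330 may remove redundant gene mutation information from the gene mutation information. In some embodiments, the training module 330 may determine at least some genes as tumor-prognosis-related genes according to the contribution of each gene's mutation information, within the feature information of the multiple tumor patients, to a support vector machine model. In some embodiments, the training module 330 may train the initial model with the gene mutation information of the tumor-prognosis-related genes and the prognosis information of the multiple tumor patients to obtain the tumor prognosis prediction model. In some embodiments, the training module 330 may also optimize the parameters of the support vector machine model with particle swarm optimization or grid search.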
It should be understood that the system and its modules shown in FIG. 3 may be implemented in various ways. For example, in some embodiments, the system and its modules may be implemented in hardware, in software, or in a combination of software and hardware. The hardware part may be implemented with dedicated logic. The software part may be stored in memory and executed by an appropriate instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will understand that the above methods and systems may be implemented with computer-executable instructions and/or processor control code. Such code may be provided, for example, on a carrier medium such as a magnetic disk, CD, or DVD-ROM, on a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The system and its modules of the present application may be implemented by hardware circuits, such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices. They may also be implemented in software executed by various types of processors, or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the system and its modules is provided only for convenience of description and does not limit the present application to the scope of the illustrated embodiments. Those skilled in the art who understand the principle of the system may combine the modules arbitrarily, or form subsystems connected to other modules, without departing from this principle. For example, in some embodiments, the acquisition module 310, the prediction module 320, and the training module 330 may be separate modules in one system, or a single module may implement the functions of two or more of them. For instance, the acquisition module 310 and the prediction module 320 may be a single module with both acquisition and prediction functions. As another example, the modules may share one storage module, or each module may have its own storage module. All such variations fall within the protection scope of the present application.
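FIG. 4 is an exemplary flowchart of a tumor prognosis prediction method according to some embodiments of the present application. As shown in FIG. 4, the tumor prognosis prediction method may include:
Step 410: obtain feature information of a tumor patient, the feature information reflecting at least the gene mutation information of the tumor patient. Specifically, step 410 may be performed by the acquisition module 310.
In some embodiments, the feature information of the tumor patient 140 may include one or any combination of the tumor patient's gene mutation information, the tumor patient's basic information, etc. In some embodiments, the feature information may include only the tumor patient's gene mutation information. Specifically, the gene mutation information may include the genes mutated on the DNA and their mutation abundances, and/or the tumor-prognosis-related genes on the DNA and their mutation abundances. The basic information reflects information about the tumor patient other than gene mutation information. For example, it may include the patient's age, gender, smoking history, years of education, years of employment, sample storage time (e.g., the storage time of blood, of tumor tissue, or of the patient's other normal tissue), treatment plan, etc., or any combination thereof. In some specific embodiments, the treatment plan may include the type of treatment (e.g., radiotherapy, chemotherapy, or immunotherapy), the treatment duration, the radiation dose, the drug dose, and the drug name or type. In some specific embodiments, the gene mutation information may be that of the tumor patient at the tumor site (e.g., a target lesion). For example, the gene mutation information of an osteosarcoma patient may be that of the osteosarcoma lesion site. In some embodiments, the tumor patient 140 may be at any stage of the tumor (e.g., early, middle, or late stage) and/or at any stage of treatment (e.g., before, during, or after treatment). For example, feature information may be obtained from an osteosarcoma patient before treatment (e.g., chemotherapy) to predict the prognostic effect of that treatment. The result can then serve as a reference when formulating and selecting a treatment plan.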
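A minimal sketch of how the basic information might be turned into numbers and concatenated with the gene mutation information before entering a model. The field names, the binary sex and smoking encodings, and the one-hot treatment categories (taken from the examples above) are all hypothetical choices. In practice the features would usually also be standardized before SVM training.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical treatment categories, taken from the examples in the text.
TREATMENT_TYPES = ["radiotherapy", "chemotherapy", "immunotherapy"]


@dataclass
class BasicInfo:
    age: float
    sex: str               # "male" or "female"
    smoker: bool           # smoking history
    years_education: float
    years_working: float
    storage_days: float    # sample storage time
    treatment: str         # one of TREATMENT_TYPES


def encode_patient(info: BasicInfo, abundance: Dict[str, float],
                   genes: List[str]) -> List[float]:
    """Concatenate encoded basic information with per-gene mutation abundances."""
    if info.treatment not in TREATMENT_TYPES:
        raise ValueError(f"unknown treatment type: {info.treatment}")
    basic = [
        info.age,
        1.0 if info.sex == "female" else 0.0,
        1.0 if info.smoker else 0.0,
        info.years_education,
        info.years_working,
        info.storage_days,
    ]
    # One-hot encoding of the treatment type.
    treatment = [1.0 if info.treatment == t else 0.0 for t in TREATMENT_TYPES]
    # Unmutated genes contribute an abundance of 0.0.
    mutations = [abundance.get(g, 0.0) for g in genes]
    return basic + treatment + mutations


patient = BasicInfo(age=15.0, sex="male", smoker=False, years_education=9.0,
                    years_working=0.0, storage_days=30.0, treatment="chemotherapy")
vec = encode_patient(patient, {"TP53": 0.21}, ["TP53", "RB1"])
print(vec)  # [15.0, 0.0, 0.0, 9.0, 0.0, 30.0, 0.0, 1.0, 0.0, 0.21, 0.0]
```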
In some embodiments, obtaining/determining the gene mutation information of the tumor patient 140 may include the following steps: obtaining a tissue sample 145 of the tumor patient 140, extracting DNA from the tissue sample, preparing a library of the DNA, sequencing the library to obtain sequencing results, and analyzing the sequencing results to determine the tumor patient's gene mutation information. For more details on determining the gene mutation information of the tumor patient 140, see FIG. 5 and its related description.
Step 420: determine a prognosis prediction result for the tumor patient based on the patient's feature information and according to a tumor prognosis prediction model. Specifically, step 420 may be performed by the prediction module 320.
In some embodiments, the feature information of the tumor patient may be input into a trained tumor prognosis prediction model to obtain the patient's prognosis prediction result. In some embodiments, the tumor prognosis prediction model may be a supervised learning model. Specifically, the supervised learning model may include one or a combination of a support vector machine model, a decision tree model, a neural network model, a nearest-neighbor classifier, etc. For the training process of the tumor prognosis prediction model, see FIG. 6 and its related description.
In some embodiments, the prognosis prediction result may be the prognosis over a period of time after treatment (e.g., 5 years). For example, according to changes in the target lesions, the prognosis prediction result may be divided into four categories: progressive disease (PD), stable disease (SD), partial response (PR), and complete response (CR). Specifically, PD may refer to an increase of 20% or more in the sum of the longest diameters of the target lesions, or to the appearance of new lesions (e.g., new lesions caused by tumor metastasis). SD may refer to a decrease in that sum too small to qualify as PR, or an increase too small to qualify as PD. PR may refer to a decrease of 30% or more in the sum of the longest diameters of the target lesions, maintained for at least 4 weeks. CR may refer to the disappearance of all target lesions, with no new lesions and normal tumor markers, maintained for at least 4 weeks. In some embodiments, the prognosis prediction result may include two categories: good treatment effect and poor treatment effect. Specifically, whether the treatment effect is good or poor may be determined according to clinical standards. For example, recurrence within 5 years after treatment indicates a poor treatment effect, and no recurrence within 5 years after treatment indicates a good treatment effect.
As another example, PD and SD may be classified as a poor treatment effect, and PR and CR as a good treatment effect. As a further example, survival of more than 5 years after the first treatment indicates a good treatment effect, and survival of less than 5 years after the first treatment indicates a poor treatment effect.
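The response categories and the good/poor grouping above can be sketched as a small classifier. This sketch assumes that percentage changes are measured against the baseline sum of longest diameters, since the text does not name the reference point, and that "maintained for at least 4 weeks" is supplied as a number of weeks. The function and parameter names are hypothetical.

```python
def classify_response(
    baseline_sum_mm: float,
    current_sum_mm: float,
    new_lesions: bool,
    markers_normal: bool = True,
    weeks_sustained: float = 4.0,
) -> str:
    """Map target-lesion measurements to PD / SD / PR / CR using the thresholds above.

    Changes are relative to the baseline sum of longest diameters (an assumption).
    """
    if baseline_sum_mm <= 0:
        raise ValueError("baseline sum of longest diameters must be positive")
    change = (current_sum_mm - baseline_sum_mm) / baseline_sum_mm
    if new_lesions or change >= 0.20:
        return "PD"  # growth of 20% or more, or any new lesion
    if current_sum_mm == 0 and markers_normal and weeks_sustained >= 4:
        return "CR"  # all target lesions gone, markers normal, for >= 4 weeks
    if change <= -0.30 and weeks_sustained >= 4:
        return "PR"  # shrinkage of 30% or more, for >= 4 weeks
    return "SD"


def treatment_effect(category: str) -> str:
    """Two-class grouping from the text: PR/CR -> good, PD/SD -> poor."""
    return "good" if category in ("PR", "CR") else "poor"


print(classify_response(100.0, 60.0, new_lesions=False))  # PR
```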
In some alternative embodiments, the prognosis prediction result may also be divided into other categories, which the embodiments of the present application do not limit. For example, it may be divided into three categories: good, moderate, and poor treatment effect. In some embodiments, the prognosis prediction result may also be a predicted value of a specific indicator, including but not limited to the disease remission rate, disease recurrence rate, recurrence within a given number of years, disease survival rate, survival time, short-term mortality, long-term mortality, in-hospital mortality, out-of-hospital mortality, and surgical mortality.
It should be noted that the above description of process 400 is only for example and illustration and does not limit the scope of application of the present application. Those skilled in the art may make various modifications and changes to process 400 under the guidance of the present application. Such modifications and changes remain within the scope of the present application.
FIG. 5 is an exemplary flowchart for determining gene mutation information of a tumor patient according to some embodiments of the present application. Specifically, the steps shown in FIG. 5 may be performed by staff (e.g., doctors, laboratory personnel, or operators) and/or by instruments and equipment (e.g., detectors or analyzers). As shown in FIG. 5, the process for determining gene mutation information of a tumor patient may include:
Step 510: obtain a tissue sample of the tumor patient.
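In some embodiments, the tissue sample 145 may reflect tumor-related information. Specifically, the tissue sample 145 may be a biological tissue or fluid sample taken from a tumor site (e.g., a target lesion) and/or a non-tumor site (e.g., a site other than the lesion) of the tumor patient 140. For example, tissue samples may include, but are not limited to: sputum, blood samples, fresh tissue (e.g., surgical tissue, needle biopsy tissue), paraffin-embedded tissue, urine, serous cavity effusions (e.g., ascites, pleural effusion, pericardial effusion), or tissue and cells taken from the tumor site, or any combination thereof. In some embodiments, the tissue sample 145 may include tissue and cells of the tumor patient 140 from the tumor site or from sites other than the tumor. In some embodiments, the tissue sample 145 may include only tissue and cells from the tumor site of the tumor patient 140. In some embodiments, inclusion criteria may be set for the tissue sample 145. For example, collection requirements may specify surgical tissue, fresh tissue, needle biopsy tissue, tissue fixed in 10% neutral formalin, paraffin-embedded tissue, etc. As another example, unstained paraffin sections may be submitted as 10 slides (5 µm) or 5 slides (10 µm). To ensure that the sectioned tissue contains a sufficient proportion of tumor cells (e.g., >70%), an HE-stained slide from the same block may be included, or the tumor cell content after HE staining of the submitted specimen may be reported by email. As another example, surgical or needle biopsy tissue may be required to exceed 0.3 cm³ and to be placed quickly into an EP tube. As another example, sample transport standards may be set: unstained paraffin slides may be shipped at room temperature within 2 weeks of sectioning, e.g., in EP tubes whose mouths are sealed with sealing film to prevent leakage in transit, and the pathology number of the submitted sample must be written on the requisition form. As another example, criteria for screening tissue samples may be set, such as sample rejection criteria: tissue not fixed in 10% neutral formalin, submitted sample information that does not match the requisition form, or tissue autolysis or degeneration.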
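The intake rules above can be expressed as a checklist. This is a sketch under stated assumptions: the specimen is treated as formalin-fixed surgical or needle-biopsy tissue, and the 70% tumor-cell fraction and 0.3 cm³ volume are the example values from the text, not fixed limits. The `Specimen` fields and the reason strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Specimen:
    fixative: str                    # e.g. "10% neutral formalin"
    info_matches_requisition: bool   # specimen label vs. requisition form
    autolysis_or_degeneration: bool
    tumor_cell_fraction: float       # estimated from the HE-stained slide, 0..1
    volume_cm3: float                # surgical or needle-biopsy tissue volume


MIN_TUMOR_FRACTION = 0.70  # example value from the text
MIN_VOLUME_CM3 = 0.3       # example value from the text


def rejection_reasons(s: Specimen) -> List[str]:
    """Return the reasons a fixed surgical/biopsy specimen fails intake (empty = accept)."""
    reasons: List[str] = []
    if s.fixative != "10% neutral formalin":
        reasons.append("not fixed in 10% neutral formalin")
    if not s.info_matches_requisition:
        reasons.append("specimen information does not match the requisition form")
    if s.autolysis_or_degeneration:
        reasons.append("tissue autolysis or degeneration")
    if s.tumor_cell_fraction <= MIN_TUMOR_FRACTION:
        reasons.append("tumor cell fraction not above 70%")
    if s.volume_cm3 <= MIN_VOLUME_CM3:
        reasons.append("tissue volume not above 0.3 cm^3")
    return reasons


ok = Specimen("10% neutral formalin", True, False, 0.80, 0.5)
bad = Specimen("ethanol", True, False, 0.50, 0.5)
print(rejection_reasons(ok))   # []
print(rejection_reasons(bad))  # fixative and tumor-fraction reasons
```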
步骤520,提取组织样本的DNA。In
在一些实施例中,提取组织样本的DNA的方法可以包括十六烷基三甲基溴化铵法(CTAB法)、玻璃珠法、超声波法、研磨法、冻融法、异硫氰酸胍法、碱裂解法、酶法等或以上任意组合。在一些实施例中,还可以采用任何已知的方法提取组织样本的DNA,本申请实施例对此不做限制。In some embodiments, the method of extracting the DNA of the tissue sample may include cetyltrimethylammonium bromide method (CTAB method), glass bead method, ultrasonic method, grinding method, freeze-thaw method, guanidine isothiocyanate Method, alkaline lysis method, enzymatic method, etc. or any combination of the above. In some embodiments, any known method may also be used to extract the DNA of the tissue sample, which is not limited in the embodiments of the present application.
Step 530: Prepare a library from the DNA.
In some embodiments, library preparation may include some or all of the following steps: DNA fragmentation, end repair, magnetic bead-based fragment size selection, end tailing (A-tailing), adapter ligation, PCR enrichment, and hybridization capture to construct the sequencing library. In addition, any known method may also be used to prepare the DNA library of the tissue sample, which is not limited in the embodiments of the present application.
Step 540: Sequence the library to obtain sequencing results.
In some embodiments, the prepared library may be sequenced to obtain sequencing data. The sequencing technology may be a high-throughput technology. High-throughput (next-generation sequencing, NGS) technologies may include single-molecule real-time sequencing (Pacific Biosciences), ion semiconductor sequencing (Ion Torrent), pyrosequencing (454), sequencing by synthesis (Illumina), sequencing by ligation (SOLiD), or any combination thereof; the chain-termination method (Sanger sequencing) may also be used, alone or in combination with these. In addition, any known method may be used for gene sequencing, which is not limited in the embodiments of the present application.
Step 550: Analyze the sequencing results to determine the gene mutation information of the tumor patient.
In some embodiments, the acquired sequencing data may be analyzed to obtain the gene mutation information of the tumor patient (including the mutated genes in the DNA and their mutation abundances, and/or the tumor prognosis prediction-related genes in the DNA, mutation-site abundances, gene mutation abundances, etc.). In some embodiments, the gene mutation abundance may be the sum of the mutation abundances of the non-synonymous single nucleotide variant (SNV) sites in the sequencing results whose abundance exceeds a set value. The set value may be 0.05%, 0.1%, 0.2%, 1%, 2%, 3%, 10%, etc. The mutation-site abundance is the proportion of reads carrying the mutant base at a site. Specifically, mutation-site abundance = number of mutant reads / (number of mutant reads + number of wild-type reads), where a read is a short sequenced fragment. For example, sequencing shows that the mutated gene KMT2C in a patient has 5 mutation sites with mutation abundances of 1%, 3%, 4%, 6%, and 8%, and the threshold is set to 2%. The mutation abundance of KMT2C is then the sum of the abundances of the 4 sites above 2%, i.e., 3% + 4% + 6% + 8% = 21%.
In some embodiments, the data analysis may include some or all of the following steps: (1) removing adapter sequences from the sequencing data; (2) performing quality control and removing low-quality sequencing data (e.g., low-quality bases, overly short reads); (3) aligning the processed sequencing data against reference gene data to identify mutated genes; (4) filtering out normal genetic variation (e.g., polymorphisms, synonymous variants); and (5) obtaining the gene mutation information of the tumor patient. In some embodiments, the reference gene data may be normal gene data (e.g., gene data from normal cells at a non-tumor site of the tumor patient, gene data from individuals without tumors), gene data of the corresponding tumor (e.g., the prognosis prediction-related genes of each tumor), etc. Using the sequencing method of the present application, 93 patients were sequenced: target-region coverage ranged from 98.2% to 99.6% (mean 99.41%); average target-region sequencing depth ranged from 462.7× to 1252.89× (mean 705.51×); and target-region capture efficiency ranged from 75.6% to 84.6% (mean 80.01%). In some embodiments, the reference gene data may be stored in the database 130 and retrieved from it when needed. In some embodiments, the gene mutation abundance may also be determined by any known method, such as next-generation sequencing, BEAMING, or PARE.
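The abundance arithmetic above can be sketched in Python. This is a minimal illustration, not the applicant's pipeline: the `SiteCall` record and the function names are hypothetical, and the sketch assumes per-site read counts have already been produced by alignment and variant calling. Only non-synonymous sites whose abundance exceeds the threshold contribute to the gene total, following the definition in the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SiteCall:
    """Read counts at one variant site of a gene (hypothetical record layout)."""
    mutant_reads: int
    wildtype_reads: int
    nonsynonymous: bool = True  # synonymous sites are excluded from the gene total


def site_abundance(site: SiteCall) -> float:
    """Mutation-site abundance = mutant reads / (mutant reads + wild-type reads)."""
    total = site.mutant_reads + site.wildtype_reads
    return site.mutant_reads / total if total else 0.0


def gene_abundance(sites: list[SiteCall], threshold: float) -> float:
    """Sum the abundances of non-synonymous sites whose abundance exceeds `threshold`."""
    abundances = (site_abundance(s) for s in sites if s.nonsynonymous)
    return sum(a for a in abundances if a > threshold)


# KMT2C example from the text: five sites at 1%, 3%, 4%, 6% and 8% (1000 reads each).
kmt2c_sites = [SiteCall(m, 1000 - m) for m in (10, 30, 40, 60, 80)]
print(round(gene_abundance(kmt2c_sites, threshold=0.02), 4))  # 0.21
```

The read counts are chosen only to reproduce the percentages of the KMT2C example; the 1% site falls below the 2% threshold and is dropped, leaving 3% + 4% + 6% + 8% = 21%.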
Sequencing revealed that different mutated genes are distributed differently across patient samples. FIGS. 7-9 are gene mutation heat maps of osteosarcoma patients according to some embodiments of the present application: FIG. 7 is a gene mutation heat map of all osteosarcoma patients according to an exemplary embodiment of the present application; FIG. 8 is a gene mutation heat map of osteosarcoma patients with good treatment outcomes according to an exemplary embodiment of the present application; and FIG. 9 is a gene mutation heat map of osteosarcoma patients with poor treatment outcomes according to an exemplary embodiment of the present application.
In this embodiment, tissue and cells may be extracted from the target lesion (the osteosarcoma lesion site) of osteosarcoma patients (the 93 osteosarcoma patient samples shown in FIG. 7), and the gene mutation information of the osteosarcoma patients may be determined from them. Specifically, the gene mutation information of the osteosarcoma patients may be determined through the process steps described above for determining gene mutation information of a tumor patient.
In this embodiment, the mutation status (e.g., gene mutation abundance) of 315 genes in each sample was mainly examined; these genes were selected because they are reported in the existing literature to have a relatively significant effect on cancer. In some alternative embodiments, the number of genes examined may be increased or decreased as appropriate. FIGS. 7-9 show heat maps of the 29 genes with the highest mutation rates among all osteosarcoma patients, among patients with good prognosis, and among patients with poor prognosis, respectively. In FIGS. 7-9, the left vertical axis shows the proportion of the samples in which a given gene is mutated (out of the 93 samples in FIG. 7), the right vertical axis lists the mutated genes, and the horizontal axis represents the samples.
Specifically, in this embodiment, the mutated genes with a relatively high mutation proportion in the samples (including the genes partially shown in FIGS. 7-9) include: lysine N-methyltransferase 2C (KMT2C), SRY-box 9 (SOX9), LDL receptor related protein 1B (LRP1B), neurofibromatosis type I (NF-1), protein kinase, DNA-activated, catalytic subunit (PRKDC), FAT atypical cadherin 1 (FAT1), slit guidance ligand 2 (SLIT2), NOTCH1, EPH receptor A7 (EPHA7), ATRX, lysine demethylase 6A (KDM6A), APC, RAN binding protein 2 (RANBP2), ROS proto-oncogene 1 (ROS1), EMSY (C11orf30), AT-rich interactive domain-containing protein 2 (ARID2), RARA antisense RNA 1 (RARA.AS1), TATA-box binding protein associated factor 1 (TAF1), mutS homolog 2 (MSH2), mutS homolog 6 (MSH6), tumor protein p53 (TP53), dicer 1 (DICER1), lysine demethylase 5A (KDM5A), Janus kinase 2 (JAK2), ALK receptor tyrosine kinase (ALK), RB transcriptional corepressor 1 (RB1), NOTCH2, RPTOR independent companion of MTOR complex 2 (RICTOR), stromal antigen 2 (STAG2), polybromo 1 (PBRM1), melanogenesis associated transcription factor (MITF), cytochrome P450 family 2 subfamily C member 8 (CYP2C8), phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha (PIK3CA), phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit beta (PIK3CB), B-Raf proto-oncogene (BRAF), MET proto-oncogene, receptor tyrosine kinase (MET), heat shock protein 90 alpha family class A member 1 (HSP90AA1), membrane associated guanylate kinase, WW and PDZ domain containing 2 (MAGI2), mitogen-activated protein kinase kinase kinase 1 (MAP3K1), hepatocyte growth factor (HGF), E1A binding protein p300 (EP300), AKT serine/threonine kinase 3 (AKT3), ASXL transcriptional regulator 1 (ASXL1), ATM serine/threonine kinase (ATM), AXIN1, AXL receptor tyrosine kinase (AXL), BLM RecQ like helicase (BLM), BRCA2 DNA repair associated (BRCA2), cell division cycle 73 (CDC73), cyclin dependent kinase 12 (CDK12), CREB binding protein (CREBBP), catenin alpha 1 (CTNNA1), CYLD lysine 63 deubiquitinase (CYLD), EPH receptor A3 (EPHA3), EPH receptor B1 (EPHB1), erb-b2 receptor tyrosine kinase 3 (ERBB3), erb-b2 receptor tyrosine kinase 4 (ERBB4), ERBB receptor feedback inhibitor 1 (ERRFI1), FA complementation group A (FANCA), FA complementation group D2 (FANCD2), far upstream element binding protein 1 (FUBP1), GATA binding protein 1 (GATA1), GATA binding protein 2 (GATA2), interleukin 7 receptor (IL7R), Janus kinase 1 (JAK1), lysine acetyltransferase 6A (KAT6A), LOC101929829, LOC115110, leucine zipper like transcription regulator 1 (LZTR1), mitogen-activated protein kinase kinase 2 (MAP2K2), MDM4, mediator complex subunit 12 (MED12), mutL homolog 1 (MLH1), MYC proto-oncogene (MYC), MYCN proto-oncogene (MYCN), NFKB inhibitor alpha (NFKBIA), PARK2, phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit gamma (PIK3CG), phosphoinositide-3-kinase regulatory subunit 2 (PIK3R2), protein kinase C iota (PRKCI), patched 1 (PTCH1), ret proto-oncogene (RET), SET domain containing 2 (SETD2), SMAD family member 4 (SMAD4), SMARCA4, spen family transcriptional repressor (SPEN), spectrin alpha, erythrocytic 1 (SPTA1), signal transducer and activator of transcription 3 (STAT3), transforming growth factor beta receptor 2 (TGFBR2), TSC complex subunit 1 (TSC1), and other genes.
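The left vertical axis of FIGS. 7-9 (the share of samples in which a gene is mutated) and the selection of the top 29 genes can be illustrated with a short sketch. It assumes each sample has already been reduced to the set of genes carrying a retained mutation; the function names are hypothetical and the cohort below is invented toy data, not the 93 patients.

```python
from __future__ import annotations

from collections import Counter


def mutation_frequency(samples: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of samples in which each gene carries at least one retained mutation."""
    counts = Counter(gene for genes in samples.values() for gene in genes)
    n = len(samples)
    return {gene: c / n for gene, c in counts.items()}


def top_mutated_genes(samples: dict[str, set[str]], n_top: int = 29) -> list[str]:
    """Genes ordered by descending mutation frequency (ties broken alphabetically)."""
    freq = mutation_frequency(samples)
    return sorted(freq, key=lambda g: (-freq[g], g))[:n_top]


# Hypothetical toy cohort: sample ID -> set of mutated genes.
cohort = {
    "S1": {"KMT2C", "TP53"},
    "S2": {"KMT2C"},
    "S3": {"KMT2C", "SOX9"},
    "S4": set(),
}
print(top_mutated_genes(cohort, n_top=2))  # ['KMT2C', 'SOX9']
```

Because each sample contributes a set, a gene with several mutated sites in one sample is still counted once for that sample, which matches a per-sample mutation rate rather than a per-site count.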
In addition, after the 315 genes of each patient were sequenced, it was further found that different mutated genes had different mutation abundances in the patient samples, as shown in Table 1.
Table 1: High-abundance mutated genes for each patient (only 10 patients with good prognosis and 10 patients with poor prognosis are shown as examples)
It should be noted that the above description of the process for determining gene mutation information of a tumor patient is for illustration and explanation only and does not limit the scope of application of the present application. For those skilled in the art, under the guidance of the present application, gene mutation information of tumor patients obtained by any other technical means may be used to achieve the technical purpose of predicting patient prognosis.
FIG. 6 is an exemplary flowchart for training a tumor prognosis prediction model according to some embodiments of the present application. Specifically, the process shown in FIG. 6 (e.g., step 610, step 620) may be performed by the training module 330. As shown in FIG. 6, the exemplary process for training a tumor prognosis prediction model may include:
Step 610: Obtain feature information and prognostic information of multiple tumor patients.
In some embodiments, the feature information of the multiple tumor patients may include gene mutation information of the tumor patients, basic information of the tumor patients, or any combination thereof. Specifically, the gene mutation information of the multiple tumor patients may include the mutated genes in each patient's DNA and their mutation abundances. In some specific embodiments, the gene mutation information may be the gene mutation information at the tumor site (e.g., a target lesion) of each tumor patient. For a specific method of determining the gene mutation information of the multiple tumor patients, refer to the process for determining gene mutation information of a tumor patient described in FIG. 5. The basic information of a tumor patient reflects information related to the patient other than gene mutation information. For example, the basic information may include the patient's age, sex, smoking history, years of education, years of employment, treatment regimen, sample storage time, types of medication, etc., or any combination thereof.
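One way to turn this feature information into a model input is to concatenate a fixed-order vector of gene mutation abundances with a few encoded basic-information fields. The sketch below assumes that layout; the function name, the three clinical fields, and their encodings are hypothetical choices for illustration, not taken from the application.

```python
from __future__ import annotations


def build_feature_vector(
    gene_panel: list[str],
    gene_abundances: dict[str, float],
    basic_info: dict,
) -> list[float]:
    """Concatenate per-gene mutation abundances with a few encoded clinical fields.

    gene_panel      -- fixed, ordered gene list (e.g. the detected panel)
    gene_abundances -- {gene: abundance}; genes without a retained mutation become 0.0
    basic_info      -- hypothetical keys: "age" (years), "sex" ("M"/"F"), "smoker" (bool)
    """
    genomic = [float(gene_abundances.get(g, 0.0)) for g in gene_panel]
    clinical = [
        float(basic_info["age"]),
        1.0 if basic_info["sex"] == "M" else 0.0,  # binary sex encoding (assumed)
        1.0 if basic_info["smoker"] else 0.0,      # smoking history as a flag (assumed)
    ]
    return genomic + clinical


panel = ["KMT2C", "TP53", "SOX9"]
row = build_feature_vector(panel, {"KMT2C": 0.21, "SOX9": 0.05},
                           {"age": 15, "sex": "F", "smoker": False})
print(row)  # [0.21, 0.0, 0.05, 15.0, 0.0, 0.0]
```

Because age and mutation abundance sit on very different scales, standardizing each column (fitted on the training data only) is usually advisable before training an RBF-kernel support vector machine.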
In some embodiments, the prognostic information of the multiple tumor patients may be classified, according to changes in the target lesions, into four categories: progressive disease (PD), stable disease (SD), partial response (PR), and complete response (CR). As another example, the prognosis may be classified into two categories: good treatment outcome and poor treatment outcome. In some embodiments, the prognosis may also be the value of a specific indicator, including, but not limited to, the disease remission rate, disease recurrence rate, recurrence within a given number of years, survival rate, survival time, short-term mortality, long-term mortality, in-hospital mortality, out-of-hospital mortality, surgical mortality, etc. In some embodiments, the prognosis described here may correspond to the prognosis prediction result determined in step 420.
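When the four response categories are collapsed into the two-class "good / poor" label, a fixed mapping is needed. The grouping below (CR and PR as good, SD and PD as poor, with 1 meaning poor prognosis) is an assumption for illustration only; grouping SD with the good outcomes, as in a disease-control definition, would be an equally plausible choice, and the text does not specify either.

```python
from __future__ import annotations

# Assumed grouping (not specified in the text): objective response (CR/PR) is
# treated as a good prognosis (0), SD/PD as a poor prognosis (1).
RESPONSE_TO_LABEL = {"CR": 0, "PR": 0, "SD": 1, "PD": 1}


def encode_labels(responses: list[str]) -> list[int]:
    """Map response categories to binary labels (1 = poor prognosis)."""
    labels = []
    for r in responses:
        code = r.strip().upper()
        if code not in RESPONSE_TO_LABEL:
            raise ValueError(f"unknown response category: {r!r}")
        labels.append(RESPONSE_TO_LABEL[code])
    return labels


print(encode_labels(["CR", "pd", " SD ", "PR"]))  # [0, 1, 1, 0]
```

Raising on unknown codes, rather than silently defaulting, keeps mislabeled records out of the training set.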
Step 620: Train an initial model using the feature information and prognostic information of the multiple tumor patients to obtain a tumor prognosis prediction model.
In some embodiments, the tumor prognosis prediction model may be a supervised learning model. Specifically, the supervised learning model may include one or a combination of a support vector machine model, a decision tree model, a neural network model, a nearest-neighbor classifier, etc. In this embodiment, a support vector machine model is used as an example to illustrate the training process of the tumor prognosis prediction model.
In some embodiments, initial model parameters (e.g., the penalty parameter c (cost) and the kernel parameter g (gamma)) may be set to establish an initial support vector machine model. A grid search method may then be used to search for the optimal model parameters (e.g., c and g) based on the feature information and prognostic information of the multiple tumor patients, so as to update and optimize the model. In some embodiments, a kernel function of the support vector machine model (e.g., a linear kernel, polynomial kernel, Gaussian (RBF) kernel, or sigmoid kernel) may be selected, and the support vector machine model may be trained based on the feature information and prognostic information of the multiple tumor patients. In some embodiments, grid search may also be combined with a validation method to find the optimal model parameters. For example, the model parameters (e.g., c and g) are adjusted over the grid, each adjusted model is validated, and the optimal parameters are selected according to the validation results.
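The combination of an RBF kernel with a grid search over (c, g) can be sketched as follows. The kernel formula is the standard Gaussian kernel; everything else is an assumption: `toy_score` is a hypothetical stand-in for "train an SVM with these parameters and return its cross-validated accuracy", and the exponentially spaced grids are a common convention rather than values from the application. In practice, a library route such as scikit-learn's `GridSearchCV` wrapped around `SVC(kernel="rbf")` would perform the same search.

```python
import math
from itertools import product


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def grid_search(score_fn, c_values, gamma_values):
    """Evaluate every (C, gamma) pair and return (best_score, best_c, best_gamma).

    score_fn(c, gamma) must return a validation score where higher is better,
    e.g. the cross-validated accuracy of an SVM trained with those parameters.
    """
    best = None
    for c, g in product(c_values, gamma_values):
        score = score_fn(c, g)
        if best is None or score > best[0]:
            best = (score, c, g)
    return best


# Exponentially spaced grids, a common choice for (C, gamma).
C_GRID = [2.0 ** k for k in range(-5, 16, 2)]
GAMMA_GRID = [2.0 ** k for k in range(-15, 4, 2)]


def toy_score(c, g):
    """Hypothetical stand-in for cross-validated accuracy; peaks at C = 8, gamma = 0.5."""
    return -((math.log2(c) - 3) ** 2 + (math.log2(g) + 1) ** 2)


best_score, best_c, best_gamma = grid_search(toy_score, C_GRID, GAMMA_GRID)
print(best_c, best_gamma)  # 8.0 0.5
```

Grid search is exhaustive, so its cost grows with the product of the grid sizes; a coarse grid followed by a finer one around the best cell is a common way to keep it affordable.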
In still other embodiments, a particle swarm optimization (PSO) algorithm may be used to optimize the parameters of the support vector machine model. Specifically, the parameters of the PSO algorithm may first be initialized; the PSO algorithm may then be used to search for the best parameters for updating the model (e.g., a (c, g) pair), and the best parameters found are used as the optimized model parameters. The PSO algorithm may include, but is not limited to, the basic PSO algorithm, an adaptive mutation PSO algorithm, etc. Parameters of the PSO algorithm may include a local search capability parameter, a global search capability parameter, an elastic coefficient for the velocity update, the maximum number of generations, the maximum population size, the number of cross-validation folds, the search range of parameter c, the search range of parameter g, etc., or any combination thereof. In some embodiments, the parameters of the PSO algorithm may be initialized manually or automatically.
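A basic PSO loop can be sketched in standard-library Python. Here the "local" and "global" search capability parameters are mapped to the usual cognitive coefficient `c1` and social coefficient `c2`, and the velocity-update coefficient to the inertia weight `w`; this mapping, the default values, and searching over (log2 c, log2 g) are assumptions. `toy_objective` is a hypothetical stand-in for "1 minus cross-validated accuracy" and has a known minimum so the result can be checked.

```python
import random


def pso(objective, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic particle swarm optimisation that MINIMISES `objective` within `bounds`.

    bounds: list of (low, high) per dimension, e.g. ranges for (log2 C, log2 gamma).
    w: inertia weight; c1: cognitive (personal-best) weight; c2: social (global-best) weight.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):  # maximum number of generations
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to range
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_objective(p):
    """Hypothetical stand-in for (1 - CV accuracy); minimum at log2 C = 3, log2 gamma = -1."""
    return (p[0] - 3) ** 2 + (p[1] + 1) ** 2


best_pos, best_val = pso(toy_objective, bounds=[(-5, 15), (-15, 3)])
print([round(v, 2) for v in best_pos])
```

Running grid search first and then seeding or narrowing the PSO bounds around the best grid cell is one way to realize the combined strategy described in the next paragraph.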
In other embodiments, grid search and a PSO algorithm may be used together to optimize the parameters of the support vector machine model. For example, grid search may first be used to optimize the parameters, and PSO may then be used to refine them further.
To improve model accuracy or training efficiency, the feature information of the multiple tumor patients may be further screened, and the screened feature information may be used for model training.
In some embodiments, mutated-gene entries whose mutation abundance is below a set threshold may be removed from the gene mutation information of the multiple tumor patients. The gene mutation abundance may be the sum of the mutation abundances of multiple different mutation sites in the gene. A threshold (e.g., 0.05%, 0.1%, 0.2%, 1%, 2%, 3%) may be set manually, and mutated-gene entries whose mutation abundance is below the threshold are removed. Likewise, mutation sites whose abundance is below a certain value (e.g., 0.05%, 0.1%, 0.2%) may be excluded from the gene's mutation abundance.
In some embodiments, redundant gene mutation information may be removed from the gene mutation information of the multiple tumor patients. Specifically, the gene mutation information may contain two or more genes that are highly correlated with one another. In some embodiments, two genes are considered highly correlated when their mutation status is identical or similar, or when their mutation abundance profiles are similar. Among such highly correlated genes, one or more may be regarded as redundant. Removing redundant gene mutation information (e.g., keeping only one gene from each group of highly correlated genes) can effectively reduce the dimensionality of the gene features without affecting the training effect of the model.
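The two screening steps above (dropping low-abundance entries, then removing redundant, highly correlated genes) can be sketched together. The sketch assumes each gene is represented by its abundance across patients in a fixed patient order, uses Pearson correlation as the similarity measure, and keeps genes greedily in input order; the 2% and |r| > 0.9 cut-offs and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def screen_genes(abundance, min_abundance=0.02, max_corr=0.9):
    """abundance: {gene: [abundance in patient 1, patient 2, ...]} (same patient order).

    1) Entries below `min_abundance` are treated as absent (set to 0.0); genes left
       with no entries at all are dropped.
    2) Genes are then kept greedily, skipping any gene whose |r| with an
       already-kept gene exceeds `max_corr` (treated as redundant).
    """
    cleaned = {
        g: [v if v >= min_abundance else 0.0 for v in vals]
        for g, vals in abundance.items()
    }
    cleaned = {g: vals for g, vals in cleaned.items() if any(vals)}
    kept = []
    for g, vals in cleaned.items():
        if all(abs(pearson(vals, cleaned[k])) <= max_corr for k in kept):
            kept.append(g)
    return kept


toy = {
    "A": [0.10, 0.20, 0.30, 0.00],
    "B": [0.20, 0.40, 0.60, 0.00],  # perfectly correlated with A -> redundant
    "C": [0.01, 0.005, 0.00, 0.01],  # every entry below 2% -> dropped
    "D": [0.30, 0.00, 0.10, 0.20],
}
print(screen_genes(toy))  # ['A', 'D']
```

The greedy pass keeps the first gene of each correlated group, so the input order decides which representative survives; ordering genes by mutation frequency beforehand is one reasonable way to make that choice deliberate.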
In some embodiments, at least some of the genes may be determined to be tumor prognosis prediction-related genes according to the contribution of each gene's mutation information, within the feature information of the multiple tumor patients, to the support vector machine model.
In some embodiments, the mutation information of each gene in the feature information of the multiple tumor patients may be further screened. Specifically, a recursive feature elimination method may be used. Using the model's prediction accuracy as the evaluation criterion, the mutation information of each gene is eliminated one at a time to obtain multiple training sets, and a model is trained on each training set. The eliminated gene mutation features are then ranked by contribution according to the prediction accuracy of the corresponding models: the lower a model's prediction accuracy, the greater the contribution of the gene mutation feature that was eliminated when training it. Finally, the gene mutation features may be screened by contribution to identify at least some genes as tumor prognosis prediction-related genes.
In some embodiments, a random forest algorithm may also be used to screen the mutation information of each gene. Specifically: (1) Build the decision trees. The forest may be defined to contain P trees (e.g., 20 or 40). Bootstrap sampling may be used to draw a training set for each decision tree from the 93 samples: each round samples 93 times with replacement from the 93 samples to form one tree's training set, and P rounds yield the training sets of all P trees. At each node of a tree, given a total of 315 feature variables, m feature variables are selected at random, the best split among these m features is computed and used to grow the branch, and no pruning is performed during growth. (2) Combine the P trained decision trees into a random forest. Each of the tumor patients can be predicted by the P decision trees, and the final prediction of the random forest is obtained by weighting or voting. While training each decision tree, the reduction in tree impurity attributable to each feature can be calculated; for the forest as a whole, the average impurity reduction of each feature is computed and used as its contribution value. For example, the gene mutation feature with the largest impurity reduction is the feature with the largest contribution, and so on, which determines the contribution of each mutated gene to the model (as shown in Table 2) and is used to screen at least some genes as tumor prognosis prediction-related genes. For example, from the mutated genes that have a significant impact on tumorigenesis, the n mutated genes (e.g., 20, 29, 40, 100) with the largest contribution to the tumor prognosis prediction model may be selected as tumor prognosis prediction-related genes.
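The two contribution measures above can be illustrated with small building blocks. The sketch assumes a caller-supplied `score_fn` that retrains and validates a model on a feature subset (here replaced by the hypothetical `toy_accuracy`), defines contribution as the accuracy lost when a feature is removed, and shows bootstrap sampling plus the Gini impurity decrease of a single binary split. A complete random forest, which sums such decreases over all nodes and averages them over trees, is omitted for brevity; scikit-learn's `RandomForestClassifier.feature_importances_` provides an impurity-based importance of this kind.

```python
import random
from collections import Counter


def rank_by_elimination(features, score_fn):
    """Leave-one-feature-out ranking.

    score_fn(subset) returns the validation accuracy of a model trained on `subset`.
    Contribution = baseline accuracy - accuracy without the feature, so removing an
    important feature lowers accuracy and gives it a large contribution.
    """
    baseline = score_fn(list(features))
    contrib = {f: baseline - score_fn([g for g in features if g != f]) for f in features}
    return sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)


def bootstrap_indices(n, rng):
    """Draw n sample indices with replacement: one tree's training set."""
    return [rng.randrange(n) for _ in range(n)]


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(x, y):
    """Weighted Gini decrease from splitting labels y on a binary feature x (0/1)."""
    left = [yi for xi, yi in zip(x, y) if xi == 0]
    right = [yi for xi, yi in zip(x, y) if xi == 1]
    n = len(y)
    return gini(y) - len(left) / n * gini(left) - len(right) / n * gini(right)


def toy_accuracy(subset):
    """Hypothetical stand-in for retraining and validating a model on `subset`."""
    return 0.5 + 0.3 * ("KMT2C" in subset) + 0.1 * ("TP53" in subset)


ranking = rank_by_elimination(["TP53", "KMT2C", "SOX9"], toy_accuracy)
boot = bootstrap_indices(93, random.Random(0))
print([gene for gene, _ in ranking])  # ['KMT2C', 'TP53', 'SOX9']
print(impurity_decrease([0, 0, 1, 1], [0, 0, 1, 1]))  # 0.5
```

Leave-one-out elimination needs one retraining per feature, which is affordable for a few hundred genes; the impurity-based measure comes for free during forest training but is known to favor features with many possible split points.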
Table 2: Contribution values of different mutated genes to the model
In some embodiments, the trained tumor prognosis prediction model may be validated. For example, for a support vector machine model, cross-validation may be used to evaluate model performance. Cross-validation methods may include the hold-out method, K-fold cross-validation (K-CV), and leave-one-out cross-validation (LOO-CV). Taking LOO-CV as an example, the samples are split into as many folds as there are samples (e.g., 93): one sample is held out for validation, and the remaining 92 are used to train the initial support vector machine model. This process is repeated 93 times to obtain 93 validation results, which are combined to determine the final validation result of the trained tumor prognosis prediction model. A receiver operating characteristic (ROC) curve may then be plotted from the validation results for visual presentation (as shown in FIG. 10). As shown in FIG. 10, each point on the ROC curve represents the sensitivity and specificity of the osteosarcoma prognosis prediction model at a different cutoff (e.g., a different classification criterion for prognosis). The top-left point of the ROC curve lies close to the upper-left corner of the plot, indicating that the osteosarcoma prognosis prediction model obtained in this embodiment has high prediction accuracy; the area under the ROC curve (AUC) is 0.988, very close to 1, indicating good classification performance. In addition, the mean sensitivity (0.95) and mean specificity (0.97) of the osteosarcoma prognosis prediction model of the present application across the different cutoffs are both high.
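Leave-one-out validation and the ROC summary statistics can be sketched in standard-library Python. `fit_predict` stands in for "train the SVM on the other samples and score the held-out one"; here it is a hypothetical 1-nearest-neighbour toy so the example runs without external libraries, and the six data points are invented. AUC is computed in its rank form, the probability that a randomly chosen positive scores above a randomly chosen negative, which equals the area under the empirical ROC curve. Scores above 0.5 are treated as poor prognosis, mirroring the 0.5 threshold used for the additional patients.

```python
def leave_one_out_scores(X, y, fit_predict):
    """Hold out each sample once; fit on the rest and score the held-out sample.

    fit_predict(X_train, y_train, x_test) -> predicted score for x_test
    (here: higher = more likely poor prognosis, label 1).
    """
    scores = []
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        scores.append(fit_predict(X_train, y_train, X[i]))
    return scores


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Treat score > threshold as a positive (poor-prognosis) call."""
    tp = sum(1 for s, t in zip(scores, y_true) if t == 1 and s > threshold)
    tn = sum(1 for s, t in zip(scores, y_true) if t == 0 and s <= threshold)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg


def nearest_neighbour(X_train, y_train, x):
    """Hypothetical 1-NN stand-in for the SVM; returns the nearest training label."""
    j = min(range(len(X_train)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(X_train[k], x)))
    return float(y_train[j])


X = [[0.0], [0.1], [0.2], [0.9], [1.0], [1.1]]
y = [0, 0, 0, 1, 1, 1]
loo = leave_one_out_scores(X, y, nearest_neighbour)
print(roc_auc(y, loo), sensitivity_specificity(y, loo))  # 1.0 (1.0, 1.0)
```

Sweeping `threshold` over the distinct scores and recording each (1 - specificity, sensitivity) pair yields the points of the ROC curve itself.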
In this embodiment, 6 additional osteosarcoma patients were selected (4 of whom were known to have a poor prognosis and the other 2 a good prognosis). Gene mutation information was obtained from the osteosarcoma lesion site of each patient, and, based on this information, prognosis prediction results for the 6 patients were determined using the osteosarcoma prognosis prediction model trained in this embodiment (as shown in Table 3; the threshold for the predicted value was set to 0.5, with values below 0.5 indicating a good prognosis and values above 0.5 indicating a poor prognosis). The predictions were fully consistent with the known prognoses.
Table 3 Comparison of the osteosarcoma prognosis prediction model's predictions with the actual prognoses
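It should be noted that the above description of process 600 is merely for illustration and description and does not limit the scope of application of this application. Those skilled in the art may make various modifications and changes to process 600 under the guidance of this application; such modifications and changes remain within the scope of this application.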
The possible benefits of the embodiments of this application include, but are not limited to: (1) the prognosis of a tumor patient can be predicted from the patient's gene mutation information; (2) the accuracy of tumor prognosis prediction is improved; (3) the tumor prognosis prediction process is convenient to carry out; and (4) a reference is provided for formulating and selecting treatment plans. It should be noted that different embodiments may produce different beneficial effects; in a given embodiment, the beneficial effects may be any one or a combination of the above, or any other obtainable beneficial effect.
The basic concepts have been described above. Obviously, for those skilled in the art, the above detailed disclosure is merely an example and does not constitute a limitation of this application. Although not explicitly stated here, those skilled in the art may make various modifications, improvements, and amendments to this application. Such modifications, improvements, and amendments are suggested by this application and therefore still fall within the spirit and scope of the exemplary embodiments of this application.
Meanwhile, this application uses specific terms to describe its embodiments. Terms such as "one embodiment", "an embodiment", and/or "some embodiments" refer to a particular feature, structure, or characteristic related to at least one embodiment of this application. Therefore, it should be emphasized and noted that two or more references to "an embodiment", "one embodiment", or "an alternative embodiment" at different places in this specification do not necessarily refer to the same embodiment. In addition, certain features, structures, or characteristics of one or more embodiments of this application may be combined as appropriate.
In addition, those skilled in the art will understand that aspects of this application can be illustrated and described in any of several patentable classes or contexts, including any new and useful process, machine, product, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of this application can be executed entirely by hardware, entirely by software (including firmware, resident software, microcode, etc.), or by a combination of hardware and software. Such hardware or software may be referred to as a "data block", "module", "engine", "unit", "component", or "system". In addition, aspects of this application may take the form of a computer product embodied in one or more computer-readable media, the product including computer-readable program code.
A computer storage medium may contain a propagated data signal containing computer program code, for example, in baseband or as part of a carrier wave. The propagated signal may take various forms, including electromagnetic, optical, or any suitable combination thereof. The computer storage medium may be any computer-readable medium other than a computer-readable storage medium that, by being connected to an instruction execution system, apparatus, or device, can communicate, propagate, or transmit a program for use. Program code located on a computer storage medium may be transmitted over any suitable medium, including radio, cable, fiber optic cable, RF, or similar media, or any combination of the foregoing.
The computer program code required for the operation of each part of this application can be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages. The program code may run entirely on the user's computer, as a stand-alone software package on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer can be connected to the user's computer through any type of network, such as a local area network (LAN) or a wide area network (WAN), or connected to an external computer (for example, via the Internet), or used in a cloud computing environment or as a service, such as software as a service (SaaS).
In addition, unless explicitly stated in the claims, the order of processing elements and sequences, the use of alphanumeric labels, or the use of other names in this application is not intended to limit the order of the processes and methods of this application. Although the above disclosure discusses, through various examples, some embodiments of the invention currently considered useful, it should be understood that such details are for illustrative purposes only and the appended claims are not limited to the disclosed embodiments; rather, the claims are intended to cover all modifications and equivalent combinations that conform to the spirit and scope of the embodiments of this application. For example, although the system components described above can be implemented by hardware devices, they can also be implemented purely as software solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that, to simplify the presentation of this disclosure and thereby aid the understanding of one or more embodiments of the invention, the foregoing description of the embodiments sometimes groups multiple features into a single embodiment, drawing, or description thereof. However, this method of disclosure does not imply that the subject matter of this application requires more features than are recited in the claims. In fact, an embodiment may have fewer than all the features of a single embodiment disclosed above.
Some embodiments use numbers to describe quantities of components and attributes. It should be understood that such numbers used in describing the embodiments are, in some examples, qualified by the modifiers "about", "approximately", or "substantially". Unless otherwise stated, "about", "approximately", or "substantially" indicates that the stated number may vary by ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending on the characteristics required by individual embodiments. In some embodiments, numerical parameters should take into account the specified significant digits and be rounded in the ordinary way. Although the numerical ranges and parameters used to define the breadth of the ranges in some embodiments of this application are approximations, in specific embodiments such values are set as precisely as is practicable.
The entire contents of each patent, patent application, patent application publication, and other material cited in this application, such as articles, books, specifications, publications, and documents, are hereby incorporated into this application by reference, except for application history documents that are inconsistent with or conflict with the content of this application, and documents (currently or later appended to this application) that limit the broadest scope of the claims of this application. It should be noted that if the descriptions, definitions, and/or use of terms in the materials accompanying this application are inconsistent or conflict with the content of this application, the descriptions, definitions, and/or use of terms in this application shall prevail.
Finally, it should be understood that the embodiments described in this application are only used to illustrate the principles of the embodiments of this application. Other variations may also fall within the scope of this application. Therefore, by way of example and not limitation, alternative configurations of the embodiments of this application can be regarded as consistent with the teachings of this application. Accordingly, the embodiments of this application are not limited to the embodiments explicitly introduced and described in this application.
Claims (30)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201880002164.4A CN109642258B (en) | 2018-10-17 | 2018-10-17 | Method and system for tumor prognosis prediction |
| PCT/CN2018/110565 WO2020077552A1 (en) | 2018-10-17 | 2018-10-17 | Tumor prognostic prediction method and system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2018/110565 WO2020077552A1 (en) | 2018-10-17 | 2018-10-17 | Tumor prognostic prediction method and system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020077552A1 true WO2020077552A1 (en) | 2020-04-23 |
Family
ID=66060220
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/110565 Ceased WO2020077552A1 (en) | Tumor prognostic prediction method and system | 2018-10-17 | 2018-10-17 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN109642258B (en) |
| WO (1) | WO2020077552A1 (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110675956A (en) * | 2019-08-26 | 2020-01-10 | 南京医渡云医学技术有限公司 | Method and device for determining facial paralysis treatment scheme, readable medium and electronic equipment |
| CN110993106A (en) * | 2019-12-11 | 2020-04-10 | 深圳市华嘉生物智能科技有限公司 | Liver cancer postoperative recurrence risk prediction method combining pathological image and clinical information |
| CN111528918B (en) * | 2020-04-30 | 2023-02-21 | 深圳开立生物医疗科技股份有限公司 | Tumor volume change trend graph generation device after ablation, equipment and storage medium |
| CN111784637B (en) * | 2020-06-04 | 2024-06-14 | 复旦大学附属中山医院 | A prognostic feature visualization method, system, device and storage medium |
| CN112397172A (en) * | 2020-12-24 | 2021-02-23 | 上海墩庐生物医学科技有限公司 | Intelligent consultant internet application system for breast cancer survival |
| CN113345564B (en) * | 2021-05-31 | 2022-08-05 | 电子科技大学 | A method and device for early prediction of hospitalization length of patients based on graph neural network |
| CN114627969A (en) * | 2022-03-23 | 2022-06-14 | 中国医学科学院肿瘤医院 | Prognostic prediction model and kit for sarcoma patients based on complement-related genes |
| CN115620854A (en) * | 2022-09-21 | 2023-01-17 | 沈阳金域医学检验所有限公司 | Prognosis model establishment method, device, equipment and storage medium |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106202969A (en) * | 2016-08-01 | 2016-12-07 | 东北大学 | A kind of tumor cells typing prognoses system |
| CN106960122A (en) * | 2017-03-17 | 2017-07-18 | 晶能生物技术(上海)有限公司 | Genetic disease Forecasting Methodology and device caused by gene mutation |
| CN107169264A (en) * | 2017-04-14 | 2017-09-15 | 广东药科大学 | A kind of complex disease diagnostic method and system |
| CN107341366A (en) * | 2017-07-19 | 2017-11-10 | 西安交通大学 | A kind of method that complex disease susceptibility loci is predicted using machine learning |
| CN107833636A (en) * | 2017-12-04 | 2018-03-23 | 浙江鸿赋堂健康管理有限公司 | A kind of tumour Forecasting Methodology |
| CN108416190A (en) * | 2018-02-11 | 2018-08-17 | 广州市碳码科技有限责任公司 | Tumour methods for screening, device, equipment and medium based on deep learning |
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9342657B2 (en) * | 2003-03-24 | 2016-05-17 | Nien-Chih Wei | Methods for predicting an individual's clinical treatment outcome from sampling a group of patient's biological profiles |
| CN102713606A (en) * | 2009-11-13 | 2012-10-03 | 无限制药股份有限公司 | Compositions, kits, and methods for identification, assessment, prevention, and therapy of cancer |
| WO2012166700A2 (en) * | 2011-05-29 | 2012-12-06 | Lisanti Michael P | Molecular profiling of a lethal tumor microenvironment |
| AU2013229762A1 (en) * | 2012-03-09 | 2014-09-25 | Caris Life Sciences Switzerland Holdings Gmbh | Biomarker compositions and methods |
| BR112015006273A2 (en) * | 2012-09-21 | 2017-07-04 | Inst Nat Sante Rech Med | '' prognostic method, kit comprising reagents, therapeutic cytotoxic chemotherapeutic agent, use of a therapeutic cytotoxic chemotherapeutic agent, global survival prognostic system and computer readable medium '' |
| WO2016141324A2 (en) * | 2015-03-05 | 2016-09-09 | Trovagene, Inc. | Early assessment of mechanism of action and efficacy of anti-cancer therapies using molecular markers in bodily fluids |
| EP3271848B1 (en) * | 2015-03-16 | 2025-03-12 | Personal Genome Diagnostics, Inc. | Systems and methods for analyzing nucleic acid |
| EP3304085B1 (en) * | 2015-05-27 | 2021-06-23 | Cannabics Pharmaceuticals Inc | System and method for high throughput screening of cancer cells |
| US20180246104A1 (en) * | 2015-08-18 | 2018-08-30 | Agency For Science, Technology And Research | Method for detecting circulating tumor cells and uses thereof |
| CN109563544A (en) * | 2015-10-08 | 2019-04-02 | 会聚基因学有限公司 | Diagnostic assay for urine monitoring of bladder cancer |
| WO2017096248A1 (en) * | 2015-12-02 | 2017-06-08 | Clearlight Diagnostics Llc | Methods for preparing and analyzing tumor tissue samples for detection and monitoring of cancers |
| JP2017216882A (en) * | 2016-06-02 | 2017-12-14 | 国立大学法人金沢大学 | Method for detecting presence of bone disorder, therapeutic agent for bone disorder, and screening method for therapeutic agent |
| CN107545144B (en) * | 2017-09-05 | 2020-12-29 | 上海市内分泌代谢病研究所 | Molecular marker-based prediction system for pheochromocytoma metastasis |
2018
- 2018-10-17: WO PCT/CN2018/110565, patent WO2020077552A1/en, not active (Ceased)
- 2018-10-17: CN CN201880002164.4A, patent CN109642258B/en, active
Also Published As
| Publication number | Publication date |
|---|---|
| CN109642258B (en) | 2020-06-09 |
| CN109642258A (en) | 2019-04-16 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN109642258B (en) | Method and system for tumor prognosis prediction | |
| You et al. | Integrated classification of prostate cancer reveals a novel luminal subtype with poor outcome | |
| CN112805563B (en) | Cell-free DNA for the assessment and/or treatment of cancer | |
| US20200405225A1 (en) | Methods and systems for identifying or monitoring lung disease | |
| CN110689921B (en) | Microsatellite instability detection device, computer equipment and computer storage medium | |
| van Ginkel et al. | Targeted sequencing reveals TP53 as a potential diagnostic biomarker in the post-treatment surveillance of head and neck cancer | |
| Newhook et al. | A thirteen-gene expression signature predicts survival of patients with pancreatic cancer and identifies new genes of interest | |
| CA3214391A1 (en) | Cell-free dna sequence data analysis method to examine nucleosome protection and chromatin accessibility | |
| Kang et al. | Establishment of a platform of non-small-cell lung cancer patient-derived xenografts with clinical and genomic annotation | |
| Bryant et al. | Clinically relevant characterization of lung adenocarcinoma subtypes based on cellular pathways: an international validation study | |
| CN120005996A (en) | A set of gene markers for predicting or evaluating the response to neoadjuvant chemotherapy in breast cancer and their application | |
| Van Swearingen et al. | Genomic and immune profiling of breast cancer brain metastases | |
| TW201926094A (en) | Subtyping of TNBC and methods | |
| EP4010490B1 (en) | Molecular classifiers for prostate cancer | |
| CN120148880A (en) | Construction and application of a prediction model for long-term lung metastasis in stage II-III colorectal cancer after surgery | |
| Belvedere et al. | A computational index derived from whole-genome copy number analysis is a novel tool for prognosis in early stage lung squamous cell carcinoma | |
| Pei et al. | Classification of multiple primary lung cancer in patients with multifocal lung cancer: assessment of a machine learning approach using multidimensional genomic data | |
| WO2023078283A1 (en) | Methylation biomarker for breast cancer diagnosis and use thereof | |
| Huang et al. | Circulating tumor DNA-and cancer tissue-based next-generation sequencing reveals comparable consistency in targeted gene mutations for advanced or metastatic non-small cell lung cancer | |
| CN117413071A (en) | Method for preparing a multi-analysis prediction model for cancer diagnosis | |
| Pareja et al. | Genomic Applications in Breast Carcinoma | |
| Foong et al. | Future Role of Molecular Profiling in Small Breast Samples and Personalised Medicine | |
| CN111919257A (en) | Reduce noise in sequencing data | |
| TW201930884A (en) | Method of making a prognosis for patient suffering from colorectal cancer with liver metastasis | |
| Carvalho | HEAD AND NECK SQUAMOUS CELL CARCINOMA: IN |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
|  | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18937392; Country of ref document: EP; Kind code of ref document: A1 |
|  | NENP | Non-entry into the national phase | Ref country code: DE |
|  | 122 | Ep: PCT application non-entry in European phase | Ref document number: 18937392; Country of ref document: EP; Kind code of ref document: A1 |