CN119626571B - Liver cancer analysis method, system, equipment and medium based on multimodal deep learning - Google Patents

Info

Publication number
CN119626571B
Authority
CN
China
Prior art keywords
data
target
features
methylation
transcriptome
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411709912.6A
Other languages
Chinese (zh)
Other versions
CN119626571A (en
Inventor
孙艳芹
宋淑萍
王心怡
成旭晨
林潍轩
王永炫
周锶琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Medical University
Original Assignee
Guangdong Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Medical University
Priority to CN202411709912.6A
Publication of CN119626571A
Application granted
Publication of CN119626571B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/10: Pre-processing; Data cleansing
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Image Analysis (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present application relates to the technical field of liver cancer analysis, and in particular, to a liver cancer analysis method, system, device and medium based on multimodal deep learning. The method comprises: obtaining the pathological data, transcriptome data, methylation data and clinical data of the target patient, and performing preprocessing respectively to obtain target pathological data, target transcriptome data, target methylation data and target clinical data; performing feature extraction respectively to obtain pathological features, transcriptome features, methylation features and clinical features; performing feature fusion on the pathological features, transcriptome features, methylation features and clinical features to obtain liver cancer analysis features; extracting lipid metabolism-related marker features of the target patient, and performing a comprehensive liver cancer analysis on the target patient based on the lipid metabolism-related marker features and liver cancer analysis features. The present application can comprehensively reflect the tumor characteristics of the target patient, improve the accuracy of auxiliary diagnosis, and provide accurate prognosis prediction for the target patient.

Description

Liver cancer analysis method, system, equipment and medium based on multi-modal deep learning
Technical Field
The application relates to the technical field of liver cancer analysis, and in particular to a liver cancer analysis method, system, equipment and medium based on multi-modal deep learning.
Background
With advances in medical imaging technology and genomics, the diagnosis and prognosis prediction of liver tumors increasingly depend on multidimensional data analysis. Although medical images such as CT and MRI provide important information on tumor morphology, distinguishing benign from malignant tumors and predicting prognosis from image data alone remains challenging. Liver biopsy is still used clinically as the gold standard for diagnosing benign and malignant liver tumors, but as an invasive examination its clinical application is limited.
Disclosure of Invention
The application provides a liver cancer analysis method, system, equipment and medium based on multi-modal deep learning, which combine data and features from different modalities into a unified feature representation that is then used for classification or prognosis prediction, so that the multi-modal features work synergistically.
Other features and advantages of the application will be apparent from the following detailed description, or may be learned by the practice of the application.
According to an aspect of the embodiment of the present application, there is provided a liver cancer analysis method based on multi-modal deep learning, the method including:
obtaining pathology data, transcriptome data, methylation data and clinical data of a target patient;
preprocessing the pathology data, the transcriptome data, the methylation data and the clinical data respectively to obtain target pathology data corresponding to the pathology data, target transcriptome data corresponding to the transcriptome data, target methylation data corresponding to the methylation data and target clinical data corresponding to the clinical data;
performing feature extraction on the target pathology data, the target transcriptome data, the target methylation data and the target clinical data respectively to obtain pathological features corresponding to the target pathology data, transcriptome features corresponding to the target transcriptome data, methylation features corresponding to the target methylation data and clinical features corresponding to the target clinical data;
performing feature fusion on the pathological features, the transcriptome features, the methylation features and the clinical features to obtain liver cancer analysis features of the target patient;
extracting lipid metabolism-related marker features of the target patient based on the methylation data, and performing a comprehensive liver cancer analysis on the target patient according to the lipid metabolism-related marker features and the liver cancer analysis features.
In one embodiment of the present application, based on the foregoing, the pathology data is a whole slide image, and the target pathology data is obtained by:
cutting the whole slide image of the pathology data into a plurality of image blocks;
screening out, from the image blocks, target image blocks in which the proportion of cell tissue area is higher than a preset threshold;
adjusting the image parameters of each target image block to preset reference values;
generating the target pathology data from the target image blocks after their image parameters have been adjusted to the reference values;
wherein the image parameters include brightness, contrast, gamma correction, and saturation.
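The preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the method actually used in the application: it treats the image as a small grayscale grid, assumes that pixels darker than a hypothetical `background_level` are tissue, and normalizes only brightness (contrast, gamma correction and saturation would be handled analogously). All function names, the tile size and the threshold values are assumptions.

```python
def tile_image(image, tile_size):
    """Cut a 2-D grayscale grid (list of rows) into non-overlapping square tiles."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            tiles.append([row[left:left + tile_size]
                          for row in image[top:top + tile_size]])
    return tiles


def tissue_ratio(tile, background_level=220):
    """Fraction of pixels darker than the background level (assumed to be tissue)."""
    pixels = [p for row in tile for p in row]
    return sum(1 for p in pixels if p < background_level) / len(pixels)


def select_tiles(tiles, threshold=0.5):
    """Keep only tiles whose tissue proportion exceeds the preset threshold."""
    return [t for t in tiles if tissue_ratio(t) > threshold]


def normalize_brightness(tile, reference_mean=128.0):
    """Shift pixel values so the tile mean moves to the reference, clipped to 0..255."""
    pixels = [p for row in tile for p in row]
    shift = reference_mean - sum(pixels) / len(pixels)
    return [[min(255.0, max(0.0, p + shift)) for p in row] for row in tile]


if __name__ == "__main__":
    # Toy 4x4 "slide": left half is tissue (dark), right half is background (white).
    slide = [[100, 100, 255, 255] for _ in range(4)]
    kept = select_tiles(tile_image(slide, tile_size=2), threshold=0.5)
    target_pathology_data = [normalize_brightness(t) for t in kept]
    print(len(target_pathology_data), "tiles kept")
```

In practice the tiles would be RGB patches read from a slide file, and the tissue mask would typically come from a colour-space threshold rather than a single grayscale cut-off.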
In one embodiment of the application, based on the foregoing, the pathological features are obtained by:
extracting features from the target pathology data through a ResNet model to obtain the pathological features of the target patient;
wherein the model weights of the ResNet model are obtained through the following steps:
performing model prediction on the target pathology data, measuring the difference between the predicted probability distribution and the labels with a cross-entropy loss function, and calculating the gradient of the loss with respect to the ResNet model parameters through the back-propagation algorithm;
and updating the model weights according to the gradient using an SGD (stochastic gradient descent) optimizer.
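The training procedure described above (cross-entropy loss, gradient by back-propagation, SGD update) can be illustrated with a much smaller model. The sketch below is an assumption-laden stand-in: it replaces the ResNet with a single linear softmax layer so that the gradient can be written by hand, and the learning rate, feature values and function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy loss for a single sample with an integer class label."""
    return -math.log(probs[label])


def sgd_step(weights, features, label, lr=0.1):
    """One SGD update of a linear softmax classifier (stand-in for the ResNet).

    weights: one weight vector per class. Returns (new_weights, loss_before_update).
    """
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    probs = softmax(logits)
    loss = cross_entropy(probs, label)
    new_weights = []
    for c, row in enumerate(weights):
        # Gradient of the loss with respect to logit c is (p_c - y_c).
        grad_logit = probs[c] - (1.0 if c == label else 0.0)
        # Chain rule: d(loss)/d(w_ck) = grad_logit * x_k; then step against the gradient.
        new_weights.append([w - lr * grad_logit * x for w, x in zip(row, features)])
    return new_weights, loss


if __name__ == "__main__":
    w = [[0.0, 0.0], [0.0, 0.0]]      # two classes, two input features
    x, y = [1.0, 2.0], 1              # hypothetical tile features and label
    for step in range(3):
        w, loss = sgd_step(w, x, y)
        print(f"step {step}: loss={loss:.4f}")
```

With a deep network the same three stages apply, but the gradient for every layer is obtained automatically by the framework's back-propagation instead of the hand-written expression used here.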
In one embodiment of the application, based on the foregoing, the transcriptome features and the methylation features are obtained by:
performing feature mapping on the target transcriptome data and the target methylation data in a high-dimensional space, respectively, using the uniform manifold approximation and projection (UMAP) method, to obtain transcriptome mapping data and methylation mapping data in a low-dimensional space;
determining the transcriptome features from the transcriptome mapping data and determining the methylation features from the methylation mapping data;
the UMAP method specifically comprises the following steps:
S1, calculating the proximity probability of data points in the high-dimensional space;
wherein S1 may calculate the proximity probability of any two data points in the target transcriptome data or the target methylation data by the following formula:
where dist(x_i, x_j) is the Euclidean distance between data point i and data point j, ρ_i is the nearest-neighbor distance of data point i, σ_i is a preset scale parameter, and p_ij is the proximity probability of the two data points;
S2, calculating the similarity of data points in the low-dimensional space;
wherein S2 may calculate the similarity of any two data points in the transcriptome mapping data or the methylation mapping data by the following formula:
where y_i and y_j are the positions of data points i and j in the low-dimensional space, a and b are hyperparameters, and q_ij is the similarity of the two data points;
S3, determining the loss function of the mapping between the high-dimensional space and the low-dimensional space by the following formula:
where L is the loss function.
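The formulas for S1 to S3 are not reproduced in this text. In the standard UMAP formulation, which uses the same symbols, they take the form p_ij = exp(-(dist(x_i, x_j) - ρ_i) / σ_i), q_ij = 1 / (1 + a·‖y_i - y_j‖^(2b)), and a cross-entropy loss L = Σ [p_ij·log(p_ij / q_ij) + (1 - p_ij)·log((1 - p_ij) / (1 - q_ij))] summed over pairs of points. The sketch below evaluates these three quantities under the assumption that the application follows this standard form; the function names and the clamping constant `eps` are illustrative, and the exponent is clamped at zero as in the UMAP reference formulation.

```python
import math


def proximity_probability(dist_ij, rho_i, sigma_i):
    """S1: high-dimensional proximity p_ij = exp(-max(0, dist_ij - rho_i) / sigma_i)."""
    return math.exp(-max(0.0, dist_ij - rho_i) / sigma_i)


def low_dim_similarity(y_i, y_j, a, b):
    """S2: low-dimensional similarity q_ij = 1 / (1 + a * ||y_i - y_j||^(2b))."""
    squared_distance = sum((u - v) ** 2 for u, v in zip(y_i, y_j))
    return 1.0 / (1.0 + a * squared_distance ** b)


def mapping_loss(p_values, q_values, eps=1e-12):
    """S3: cross-entropy between high- and low-dimensional pairwise similarities."""
    total = 0.0
    for p_ij, q_ij in zip(p_values, q_values):
        # Clamp away from 0 and 1 so the logarithms stay finite.
        p = min(max(p_ij, eps), 1.0 - eps)
        q = min(max(q_ij, eps), 1.0 - eps)
        total += p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


if __name__ == "__main__":
    p = proximity_probability(dist_ij=1.5, rho_i=1.0, sigma_i=0.5)
    q = low_dim_similarity((0.0, 0.0), (1.0, 0.0), a=1.0, b=1.0)
    print(p, q, mapping_loss([p], [q]))
```

Minimizing the loss over the low-dimensional positions y_i pulls together points with high p_ij and pushes apart points with low p_ij; the optimized coordinates then serve as the mapping data from which the features are taken.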
In one embodiment of the present application, based on the foregoing, the liver cancer analysis features of the target patient are obtained by:
inputting the pathological features, the transcriptome features, the methylation features and the clinical features into a preset multi-modal fusion model to obtain the liver cancer analysis features output by the multi-modal fusion model;
the mathematical expression of the multi-modal fusion model is as follows:
In the formula, the output term denotes the liver cancer analysis features output by the multi-modal fusion model; W_class is the weight matrix corresponding to the target pathology data, the target transcriptome data, the target methylation data and the target clinical data; LayerNorm denotes layer normalization; and f1, f2, f3 and f4 are the pathological, transcriptome, methylation and clinical features, respectively.
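The fusion expression itself is not reproduced in this text, so its exact form is unknown. One common arrangement that is consistent with the listed symbols is to concatenate f1 to f4, apply layer normalization, and project the result with W_class. The sketch below implements that arrangement purely as an assumption; the concatenation step, the absence of learnable scale and bias in the normalization, and all names are illustrative.

```python
import math


def layer_norm(vector, eps=1e-5):
    """Layer normalization without learnable scale or bias: zero mean, unit variance."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(variance + eps) for v in vector]


def fuse(f1, f2, f3, f4, w_class):
    """Assumed fusion: W_class applied to LayerNorm(concat(f1, f2, f3, f4)).

    w_class holds one row per output dimension; each row must have as many
    entries as the concatenated feature vector.
    """
    concatenated = list(f1) + list(f2) + list(f3) + list(f4)
    normalized = layer_norm(concatenated)
    return [sum(w * x for w, x in zip(row, normalized)) for row in w_class]


if __name__ == "__main__":
    pathological = [1.0, 2.0]    # f1, hypothetical values
    transcriptome = [3.0]        # f2
    methylation = [4.0]          # f3
    clinical = [5.0, 6.0]        # f4
    w_class = [[0.1] * 6, [0.5, -0.5, 0.0, 0.0, 0.0, 0.0]]
    print(fuse(pathological, transcriptome, methylation, clinical, w_class))
```

Normalizing after concatenation keeps modalities with very different numeric ranges from dominating the projection, which is one plausible reason for the LayerNorm term in the expression.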
In one embodiment of the present application, based on the foregoing, performing the comprehensive liver cancer analysis on the target patient according to the lipid metabolism-related marker features and the liver cancer analysis features comprises:
performing liver cancer diagnosis and prognosis prediction for the target patient according to the lipid metabolism-related marker features and the liver cancer analysis features.
According to an aspect of an embodiment of the present application, there is provided a liver cancer analysis system based on multi-modal deep learning, the system including:
an input unit, configured to acquire pathology data, transcriptome data, methylation data and clinical data of a target patient;
a preprocessing unit, configured to preprocess the pathology data, the transcriptome data, the methylation data and the clinical data respectively to obtain target pathology data corresponding to the pathology data, target transcriptome data corresponding to the transcriptome data, target methylation data corresponding to the methylation data and target clinical data corresponding to the clinical data;
a hierarchical processing unit, configured to perform feature extraction on the target pathology data, the target transcriptome data, the target methylation data and the target clinical data respectively to obtain pathological features corresponding to the target pathology data, transcriptome features corresponding to the target transcriptome data, methylation features corresponding to the target methylation data and clinical features corresponding to the target clinical data;
a fusion processing unit, configured to perform feature fusion on the pathological features, the transcriptome features, the methylation features and the clinical features to obtain liver cancer analysis features of the target patient;
and an analysis unit, configured to extract lipid metabolism-related marker features of the target patient based on the methylation data and to perform a comprehensive liver cancer analysis on the target patient according to the lipid metabolism-related marker features and the liver cancer analysis features.
According to an aspect of an embodiment of the present application, there is provided a computer-readable storage medium having stored thereon a computer program comprising executable instructions which, when executed by a processor, implement a method as described in the above embodiments.
According to one aspect of an embodiment of the present application, there is provided an electronic device including one or more processors, and a memory for storing executable instructions of the processors, which when executed by the one or more processors, cause the one or more processors to implement the method as described in the above embodiments.
The beneficial effects of the application are as follows. Existing liver cancer diagnosis techniques usually rely only on image data such as CT or MRI, which easily leads to missed diagnosis or misdiagnosis. By contrast, the application integrates multiple information sources, including pathological images, transcriptome data, methylation data and clinical data, which remarkably improves diagnostic accuracy, and it achieves a more comprehensive understanding of tumors through feature fusion of multi-modal data.
According to the application, the liver cancer analysis features of the target patient are obtained by fusing the pathological features, the transcriptome features, the methylation features and the clinical features, and a comprehensive liver cancer analysis is performed on the patient by combining them with the lipid metabolism-related marker features extracted from the methylation data. The tumor characteristics of the target patient are thereby reflected more comprehensively, the accuracy of auxiliary diagnosis is improved, more comprehensive and in-depth pathological information is provided to the clinician, and an accurate prognosis prediction is provided for the target patient.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is evident that the drawings in the following description are only some embodiments of the present application and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 is a flow chart illustrating a liver cancer analysis method based on multi-modal deep learning according to an embodiment of the present application;
FIG. 2 is a logic diagram of a liver cancer analysis system based on multi-modal deep learning according to an embodiment of the present application;
FIG. 3 is a block diagram of a liver cancer analysis system based on multi-modal deep learning according to an embodiment of the present application;
FIG. 4 is a schematic diagram showing a system structure of an electronic device according to an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many different forms and should not be construed as limited to the examples set forth herein, but rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or micro-control node means.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It should be noted that the term "plurality" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" covers three cases: A alone, both A and B, and B alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship.
The following details the technical background of the examples of the present application and the directions of the lipid metabolism-related markers studied in the examples of the present application:
Liver cancer is a global health challenge, and its incidence is rising worldwide. It is estimated that by 2025 millions of people will be affected by liver cancer each year, with hepatocellular carcinoma (HCC) being the most common form, accounting for 90% of cases. Liver cancer is one of the high-incidence malignant tumors in China: more than 50% of new liver cancer cases worldwide occur in China every year, and both morbidity and mortality are high. Hepatitis B virus (HBV), hepatitis C virus (HCV), cirrhosis, metabolic syndrome and the like are widely recognized as causative factors of hepatocellular carcinoma. It is therefore particularly important to diagnose liver cancer accurately and to improve the 5-year survival rate of liver cancer patients. Unfortunately, liver cancer is not sensitive to chemotherapy, so surgery is the treatment of choice. However, most hepatocellular carcinoma patients are diagnosed at an advanced stage and can only receive systemic treatment. In recent years, systemic therapy for liver cancer has evolved from single-agent targeted therapies (sorafenib and lenvatinib) to checkpoint inhibitors combined with targeted therapy (the PD-L1 monoclonal antibody atezolizumab in combination with the VEGF antagonist bevacizumab). Despite the remarkable progress in clinical treatment of liver cancer, only a small fraction of patients obtain long-lasting clinical benefit. Given current medical conditions in China, developing a novel diagnosis system is of great significance for meeting the growing demand for liver cancer diagnosis.
The development of liver cancer is a multifactorial, multistage process. The liver is the central organ of lipid metabolism, and the development and progression of tumors may depend on energy metabolism processes, including lipid synthesis and conversion. Lipid metabolism refers to the synthesis, breakdown, transport and storage of lipids in an organism. Abnormal lipid metabolism often manifests as abnormally elevated or reduced blood lipid levels, including imbalances in total cholesterol, triglycerides, low-density lipoproteins (LDL), high-density lipoproteins (HDL), and the like. A growing number of epidemiological and clinical studies have shown that abnormal lipid metabolism is associated with an increased risk of liver cancer. For example, a high cholesterol diet may be a risk factor for promoting liver cancer progression, while a low cholesterol diet may increase the risk of liver cancer occurrence. Abnormal lipid metabolism is associated not only with the occurrence of liver cancer but may also affect its progression and metastasis.
Studies have demonstrated that tumor cells can meet the demands of rapid proliferation by altering lipid metabolism, including processes such as fatty acid synthesis, oxidation and esterification. In addition, alterations in lipid metabolism in the tumor microenvironment may promote tumor angiogenesis, immune escape and metastasis. Although studies have revealed a link between abnormal lipid metabolism and liver cancer, the specific mechanism is still not completely understood. In-depth research on the mechanism by which abnormal lipid metabolism acts in liver cancer is important for early diagnosis, risk assessment and the formulation of treatment strategies. Moreover, abnormal lipid metabolism affects the biological behavior of liver cancer and its response to clinical treatment; mining the pathological characteristics and molecular markers related to lipid metabolism in each liver cancer subtype therefore facilitates the development of new means of prevention and clinical treatment.
In recent years, deep learning techniques have made remarkable progress in the field of medical image analysis. However, pathology image analysis remains challenging, mainly because of the complex tissue and texture features of pathology images and the small differences between pathology types. Although convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have seen preliminary application in CT image feature extraction and tumor prognosis prediction, these approaches focus mainly on single-modality data and do not fully consider or jointly analyze the multiple types of data available for human tumors. The scalability, reliability and accuracy of tumor identification and prognosis therefore still need to be improved. A multi-modal deep learning model that effectively integrates information from different modalities can better simulate human perception and cognition and thereby achieve a more comprehensive and deeper understanding. Multi-modal deep learning is accordingly a key interdisciplinary research field in medicine. In addition to medical imaging and genomic data, pathology and transcriptome data can reveal information about the internal microstructure of a tumor and its gene expression, further improving diagnosis and prognosis. However, deep learning models that combine clinical pathology images with multi-modal learning have not yet yielded solutions for practical clinical application. As the cost of molecular analyses falls and deep learning technology advances, such methods may become more cost-effective in the future, which will help bring the related technology into clinical use.
The application focuses on integrating liver cancer data of different modalities and takes into account the multi-factor influence of lipid metabolism on liver cancer. Using multi-modal deep learning, it establishes a pathology-image-based system for liver cancer diagnostic classification, prognosis prediction and prediction of the expression of lipid metabolism-related markers, thereby providing a practical diagnostic model for clinical use. The system is expected to be integrated into clinical workflows to relieve the workload of pathologists and clinicians and to support the learning and training of inexperienced pathologists, in support of the diagnosis and treatment of liver cancer.
As deep learning techniques progress in the medical field, artificial intelligence (AI) techniques, including radiomics and deep learning methods, offer a unique opportunity to improve clinical care for hepatocellular carcinoma by predicting biological characteristics and prognosis from medical imaging.
In the field of radiomics, diagnostic techniques for liver cancer have matured. Diagnostic methods based on medical imaging techniques such as CT and MRI are now widely used. Tietz E et al constructed a multivariate logistic regression model based on CT images to predict newly developed liver cancer in high-risk patients with cirrhosis; the model showed good performance, further confirming the important role of radiomics in liver cancer diagnosis. However, although radiomics can provide objective and quantitative diagnostic information, the limitations of imaging examinations may lead to misdiagnosis or missed diagnosis because some benign solid lesions exist. To overcome these limitations, researchers have actively explored incorporating features extracted by deep learning into radiomics models. Gan Fuwen et al achieved preoperative prediction of liver cancer by combining radiomic features extracted at high throughput from T2-weighted imaging with a deep-learning signature built with a fine-tuned EfficientNet-B7 model and LightGBM. This research not only broadens the scope of radiomics but also offers a new approach to the accurate diagnosis of liver cancer.
Histopathological image examination is the gold standard for oncological diagnosis. The images show the cells and tissue structures of tissue sections and can be used to determine cell types, tissue structure, and the number and morphology of abnormal cells, and to evaluate the specific condition of liver cancer. In recent years, with the rapid development of artificial intelligence, it has become possible to perform histopathological classification using digital whole slide images (WSI), which has greatly improved the efficiency and accuracy of liver cancer diagnosis. Shao Runhua et al used convolutional neural networks to study pathological images of liver cancer in depth, covering diagnostic algorithms for liver tumor detection, image segmentation and preoperative prediction. In May 2024, the team of Nakatsuka T proposed a deep learning model that predicts liver cancer progression from HE-stained HCC pathology images. The model is reported to reach an accuracy of up to 81.0% and an AUC of up to 0.80. Although this result is considerable, there is still room for improvement. Fusing multiple data modalities to construct a multi-modal deep learning model is therefore considered, in order to further improve the accuracy and reliability of pathological diagnosis of liver cancer. This is not only the main direction of the present study but also an important future trend in the field of liver cancer diagnosis.
At present, the application of artificial intelligence in tumor pathological tissue marker analysis has achieved remarkable results. Saha et al propose a deep learning based Her2Net model that enables automated assessment of breast cancer Her-2 staining. The model not only realizes high accuracy, but also has false positive rate of only 6.84%. This result fully demonstrates the great potential of AI in tumor pathology diagnosis. The AI-assisted tumor pathological diagnosis has the advantages of automation, high efficiency, repeatability and the like, particularly has great advantages in the aspect of quantitative interpretation of tissue pathological markers, and is beneficial to improving the objectivity and accuracy of tumor concomitant diagnosis. This also provides a new idea for the present study, namely by integrating multiple models and multiple sets of mathematical information, a deeper analysis and diagnosis of tumors.
Serum markers are the most convenient and noninvasive diagnostic methods in clinical applications. Among the numerous tumor markers, alpha Fetoprotein (AFP), alpha fetoprotein heteroplastin (AFP-L3%), and abnormal prothrombin (DCP) are referred to as "liver cancer three". At present, international liver cancer triple detection is a common serum marker combination for early diagnosis of liver cancer, and research shows that DCP, AFP and AFP-L3% combined detection can improve the detection rate of liver cancer to 85.9% -94.57%, and can basically meet the requirement of liver cancer diagnosis. However, AFP-L3% detection is still a difficult problem in three tests of liver cancer due to methodological limitations. The main problems are that the operation process is complicated, the time consumption is long, and the detection result needs to be multiplied by the dilution factor and the like, so that a large amount of labor is consumed, the detection cost is increased, and the wide application of the kit in clinic is limited. Therefore Moldogazieva NT et al propose that a high-throughput proteomic analysis technique can be combined with an artificial intelligence algorithm and a predictive model to further explore new HCC candidate biomarkers, improve the sensitivity and specificity of HCC detection, and provide powerful support for predicting therapeutic response.
With advances in medical imaging technology and genomics, diagnosis and prognosis prediction of liver tumors is increasingly dependent on multidimensional data analysis. Although medical images such as CT and MRI provide important information on tumor morphology, it is still challenging to distinguish benign and malignant tumors and predict prognosis only by image data, and liver biopsy is still used clinically as a gold standard for diagnosis of benign and malignant tumors in the liver, but has a limit in clinical application as an invasive examination. In recent years, deep learning has made a significant breakthrough in the field of medical image analysis, but in the field of pathological images, classification research on benign and malignant tumors of liver is still insufficient, mainly due to complex tissue structures and texture features of pathological images and subtle differences among different pathological types. Therefore, comprehensive analysis combined with data of multiple modes becomes a key research direction. In addition to medical imaging histology, transcriptomics and other histology data can also reveal microstructure and gene expression information inside tumors, so that diagnosis and prognosis prediction effects are improved.
At present, deep learning has been primarily applied in the diagnosis and prognosis prediction of other tumors, such as Convolutional Neural Networks (CNNs) for CT image feature extraction and classification, and cyclic neural networks (RNNs) for modeling of time series medical data to predict tumor prognosis, and we will also use deep learning techniques to study in liver tumors. However, the current tumor deep learning prediction scheme mainly focuses on single-mode data, and cannot fully utilize the correlation and complementarity between multi-mode data, so that the method still needs to be improved in the precise identification and prognosis prediction of benign and malignant tumors.
The application aims to establish a system, based on pathological images, for liver cancer diagnosis and classification, prognosis prediction, and prediction of lipid metabolism-related marker expression. The system comprises an input unit, a preprocessing unit, a layering processing unit and an analysis unit. The input unit is used for acquiring pathology data, transcriptome data, methylation data and clinical data of a patient. The preprocessing unit is used for preprocessing the pathology data, the transcriptome data, the methylation data and the clinical data respectively. The layering processing unit is used for inputting the preprocessed data into a multi-modal prediction model (multi-modal fusion model) that fuses a pathology model, a transcriptome model, a methylation model and a clinical model, and predicting a pathology result (pathological feature), a transcriptome result (transcriptome feature), a methylation result (methylation feature) and a clinical result (clinical feature). The analysis unit is used for obtaining the final tumor classification, prognosis prediction and lipid metabolism marker expression prediction results from the pathology result (pathological feature), the transcriptome result (transcriptome feature), the methylation result (methylation feature) and the clinical result (clinical feature).
The application uses multi-modal deep learning to fuse multiple kinds of tumor data effectively, reflects the characteristics of tumors more comprehensively, assists clinical diagnosis more effectively, provides doctors with more comprehensive and in-depth pathological information, and can be widely applied in clinical work. In later research, by suitably adjusting the model structure and the data processing flow, the method can be applied to other medical tasks and scenarios, such as disease diagnosis, drug development and gene editing.
The following is a detailed explanation of terms used in the embodiments of the present application:
1. Multi-modal fusion model: a deep learning model that combines different types of data (such as pathology data, transcriptome data, methylation data and clinical data). By fusing data from multiple modalities, it can comprehensively model the biological characteristics of liver cancer and provide more accurate information for clinical diagnosis and prognosis.
2. Pathology data (whole slide image): high-resolution images obtained from pathological sections in medical diagnosis, which reflect microscopic features of tissue and cell structures. The whole slide image (Whole Slide Image, WSI) is a standard format for pathological image analysis and contains complete information on the tumor tissue.
3. Transcriptome data: gene expression data obtained by RNA sequencing, used to analyze the transcriptional level of genes in cells or tissues and to reveal the biological properties of liver cancer and its response to treatment.
4. Methylation data: DNA methylation is a gene regulation mechanism that affects gene expression by changing the methylation state of DNA. Methylation data are typically obtained by methylation microarray technology and are used to analyze the genomic modification information of tumors, reflecting epigenetic characteristics in the development of liver cancer.
5. Clinical data: patient information recorded during the diagnosis and treatment of liver cancer, including the patient's age, sex, disease history and liver cirrhosis status, which provides a personalized basis for tumor risk assessment and treatment planning.
6. Lipid metabolism: the synthesis, breakdown and transport of lipids in living organisms, which are closely related to the occurrence of liver cancer.
7. Feature weighting: calculating weights for the data of the different modalities through the Softmax algorithm and applying the weights to the features, so that the importance of each modality is reasonably considered during multi-modal data fusion.
8. Layer normalization (Layer Normalization): a normalization technique that adjusts the distribution of features within a neural network layer to prevent excessive differences between the features of different modalities, thereby stabilizing model training.
9. Data cleaning: handling noise, missing values and outliers in the dataset during data processing, so as to improve data quality and the training effect of the model.
10. Data augmentation: generating additional training data by random cropping, flipping, color jittering and other methods, so as to enhance the generalization ability of the model and reduce overfitting.
11. Convolutional neural network (Convolutional Neural Network, CNN): a deep learning model for image processing that extracts spatial features of images through convolution operations and is widely used in medical image analysis.
12. Cross-entropy loss function (Cross-Entropy Loss Function): a commonly used loss function that measures the difference between the predicted probability distribution output by the model and the true label, and helps optimize the classification performance of the model.
13. Pre-training (Pretraining): training the model in advance on a large dataset so that it learns generic features and can then be trained more efficiently for a specific task in small-sample application scenarios.
14. Hyperparameter optimization: adjusting parameters such as the learning rate and the regularization coefficient during model training so as to achieve the best model performance.
15. UMAP (unified manifold approximation and projection, Uniform Manifold Approximation and Projection): a nonlinear dimension reduction technique that preserves the topology of high-dimensional data by optimizing the data layout. It makes the data distribution observable in a low-dimensional space and is used to analyze high-dimensional molecular data such as transcriptome and methylation data.
16. Fully connected layer (Fully Connected Layer): a layer in a neural network that connects each input node to each output node. It is usually used as the last layer of a classification task to produce the classification output of the model.
17. Softmax: a mathematical function for multi-class classification tasks, commonly used at the output layer of neural networks. The Softmax function converts a set of real numbers into a probability distribution in which the probability of each category lies between 0 and 1 and the probabilities of all categories sum to 1.
The implementation details of the technical scheme of the embodiment of the application are described in detail below:
According to an aspect of the present application, a liver cancer analysis method based on multi-modal deep learning is provided. Fig. 1 is a flowchart of the liver cancer analysis method based on multi-modal deep learning according to an embodiment of the present application. The method includes at least steps 110 to 150, described in detail as follows:
In step 110, pathology data, transcriptome data, methylation data, and clinical data of the target patient are acquired.
Specifically, the multi-modal data in the embodiment of the application comprise at least pathology data, transcriptome data, methylation data and clinical data. Combining these information sources improves diagnostic accuracy, and fusing the features of the multi-modal data yields a more comprehensive understanding of the tumor.
In step 120, the pathology data, the transcriptome data, the methylation data and the clinical data are preprocessed respectively to obtain target pathology data corresponding to the pathology data, target transcriptome data corresponding to the transcriptome data, target methylation data corresponding to the methylation data and target clinical data corresponding to the clinical data.
In one embodiment of the present application, the pathology data is a whole slide image, and the target pathology data is obtained by:
cutting the whole slide image of the pathology data to obtain a plurality of image blocks;
screening out, from the image blocks, target image blocks whose tissue area ratio is higher than a preset threshold;
adjusting the image parameters of each target image block to preset reference values;
generating the target pathology data from the target image blocks whose image parameters have been adjusted to the reference values;
wherein the image parameters include brightness, contrast, gamma correction and saturation.
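The cutting and screening steps above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the tile size of 256, the gray-level cutoff of 220 used as a proxy for "tissue", and the threshold of 0.5 are all assumptions introduced for the example, and a real whole slide image would be read region by region instead of being held in one array.

```python
import numpy as np

def tile_image(image, tile_size):
    """Split an H x W x 3 array into non-overlapping square tiles (ragged edges are dropped)."""
    h, w = image.shape[:2]
    tiles = []
    for top in range(0, h - tile_size + 1, tile_size):
        for left in range(0, w - tile_size + 1, tile_size):
            tiles.append(image[top:top + tile_size, left:left + tile_size])
    return tiles

def tissue_fraction(tile, white_cutoff=220):
    """Fraction of pixels darker than white_cutoff; a crude, assumed proxy for stained tissue."""
    gray = tile.mean(axis=2)
    return float((gray < white_cutoff).mean())

def select_tiles(image, tile_size=256, min_tissue=0.5):
    """Keep only tiles whose tissue fraction is higher than the preset threshold."""
    return [t for t in tile_image(image, tile_size) if tissue_fraction(t) > min_tissue]
```

A threshold on mean gray level is only one possible tissue detector; the text does not say how the tissue area ratio is computed.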
The application provides a pathology model, a transcriptome model, a methylation model and a clinical model, and further provides a multi-mode prediction model obtained by fusing the pathology model, the transcriptome model, the methylation model and the clinical model. The following are detailed explanations of pathology models, transcriptome models, methylation models, and clinical models:
Specifically, pathology data, including whole slide images (Whole Slide Images, WSI), play a critical role in disease diagnosis, research and treatment. Because of their huge data volume and high resolution, WSIs place great demands on computing resources for deep learning. To use these images more efficiently, the original image is cut into image blocks (patches) of smaller pixel size, which reduces the complexity of data processing. After the initial cut, the patches are filtered and used for data augmentation: through various image processing operations, the patches are finally randomly cropped into image blocks of smaller pixel size. Regions with a low tissue area ratio are then removed; in this step an experienced pathologist also removes low-quality regions caused by technical factors of HE staining. Inputting pathology data processed in this way into the pathology model enhances the generalization of the pathology model, reduces its overfitting to the training data, and improves its adaptability and robustness to variation in real images.
For pathology data, the image data also require preprocessing. The parameters of each image block, such as brightness, contrast, gamma correction and saturation, are adjusted individually so that their distributions approach preset reference values, that is, preset average values. The next stage matches the input image to the input requirements of the deep learning model (multi-modal fusion model); in particular, each channel of the image is normalized.
The preprocessing of the transcriptome data, the methylation data and the clinical data is described in detail below:
The transcriptome data undergo comprehensive data cleaning and preprocessing. The methylation data are cleaned, organized and quality-controlled on the same platform. For model training, the data are normalized to eliminate the effects of inter-sample differences and gene length, and are converted into a format suitable for model training.
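One common normalization that removes the effects of gene length and of differences in sequencing depth between samples is transcripts per million (TPM) followed by a log transform. The text does not name the normalization it uses, so the sketch below is an assumed example; the function name and input layout are hypothetical.

```python
import numpy as np

def counts_to_log_tpm(counts, gene_lengths_bp):
    """Convert raw read counts (samples x genes) to log2(TPM + 1).

    Dividing by gene length (in kilobases) removes the gene-length effect;
    rescaling each sample to one million removes the sequencing-depth effect.
    Assumes every sample has at least one non-zero count.
    """
    counts = np.asarray(counts, dtype=np.float64)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=np.float64) / 1_000.0
    rate = counts / lengths_kb
    tpm = rate / rate.sum(axis=1, keepdims=True) * 1_000_000.0
    return np.log2(tpm + 1.0)
```

For example, two samples with the same relative expression but a tenfold difference in sequencing depth map to identical rows after this transform.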
Clinical data are statistical indicators used to evaluate the likely course of the disease, the patient's treatment response and survival prospects. They are presented as text descriptions or charts and generally include survival time, disease recurrence, recovery status and the like. They can assist doctors in making more personalized treatment plans, help in understanding the natural course of the disease and its influencing factors, and provide a basis for new treatment methods and drug development. In addition, clinical prognosis data include patient characteristics such as age, sex and race, which also affect the patient's tumor prognosis to some extent. In practical application, the clinical prognosis data can be a doctor's text description of the patient's prognosis or related diagnosis and treatment data recorded in a form. In preprocessing the clinical prognosis data, both missing value processing and outlier processing are performed to ensure the integrity and accuracy of the data.
In step 130, feature extraction is performed on the target pathology data, the target transcriptome data, the target methylation data, and the target clinical data, respectively, to obtain a pathology feature corresponding to the target pathology data, a transcriptome feature corresponding to the target transcriptome data, a methylation feature corresponding to the target methylation data, and a clinical feature corresponding to the target clinical data.
In one embodiment of the application, the pathological features are obtained by:
extracting features of the target pathology data through a ResNet model to obtain the pathological features of the target patient;
wherein the model weights of the ResNet model are obtained through the following steps:
carrying out model prediction on the target pathology data, measuring the difference between the predicted probability distribution and the true label with a cross-entropy loss function, and calculating the gradient of the loss with respect to the ResNet model parameters through the back-propagation algorithm;
and calculating the model weights from the gradient with a stochastic gradient descent (SGD) optimizer.
Specifically, regarding the construction of the pathology model, a deep learning network such as a deep convolutional neural network (CNN), a recurrent neural network (RNN) or a graph neural network (GNN) may be used as the base image model to suit different data structures and features. The application selects ResNet152 as the base model of the pathology model. ResNet152, a convolutional neural network pre-trained on a large-scale image dataset, can effectively extract key image features, which have important reference value for tumor diagnosis.
The image model is constructed and trained as follows. A pre-trained ResNet152 model is used as a feature extractor. For the specific pathological image classification task, the last fully connected layer of the model is replaced with a new fully connected layer with n output nodes, where n is chosen according to the actual task, so that the model output matches the classification requirements.
During training, the parameters of the model are fine-tuned for the tumor diagnosis task to improve the adaptability of the model to the task, and the model outputs the tumor label and its corresponding probability. The difference between the probability distribution predicted by the model and the true labels is measured with a cross-entropy loss function, and the gradient of the loss with respect to the model parameters is calculated by the back-propagation algorithm. The model weights are updated from the calculated gradients with an SGD optimizer, where the learning rate is initially set to 0.001 and is multiplied by 0.1 after every a epochs. Over the b epochs of training, the learning rate is adjusted in this way to optimize model performance. An epoch is one full traversal of the training dataset by the model, that is, one complete forward pass and backward pass over all training samples.
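The learning-rate schedule, the loss and the weight update described above can be written out in a few lines of plain Python. This is a sketch under stated assumptions: the value `step_size=10` stands in for the unspecified "a", and the helper names are hypothetical. In PyTorch the same pieces correspond to `torch.optim.SGD`, `torch.optim.lr_scheduler.StepLR` and `torch.nn.CrossEntropyLoss`.

```python
import math

def step_decay_lr(epoch, base_lr=0.001, step_size=10, gamma=0.1):
    """Learning rate that starts at base_lr and is multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

def cross_entropy(probs, true_class):
    """Cross-entropy loss for one sample: negative log of the probability given to the true class."""
    return -math.log(probs[true_class])

def sgd_step(weights, grads, lr):
    """One plain SGD update: w <- w - lr * gradient."""
    return [w - lr * g for w, g in zip(weights, grads)]
```

With these defaults the learning rate is 0.001 for epochs 0 to 9, 0.0001 for epochs 10 to 19, and so on.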
In one embodiment of the application, the transcriptome feature and the methylation feature are obtained by:
Respectively performing feature mapping on the target transcriptome data and the target methylation data in a high-dimensional space by adopting a unified manifold approximation and projection method to obtain transcriptome mapping data and methylation mapping data in a low-dimensional space;
determining the transcriptome signature from the transcriptome mapping data and determining the methylation signature from the methylation mapping data;
the unified manifold approximation and projection method specifically comprises the following steps:
S1, carrying out the adjacent probability calculation of data points on a high-dimensional space;
wherein S1 can calculate the proximity probability of any two data points in the target transcriptome data or the target methylation data by the following formula, given here in the standard UMAP form consistent with the symbols defined below:
$$p_{ij} = \exp\left(-\frac{\mathrm{dist}(x_i, x_j) - \rho_i}{\sigma_i}\right)$$
where dist(x_i, x_j) represents the Euclidean distance between data point i and data point j, ρ_i is the nearest-neighbor distance of data point i, σ_i is a preset scale parameter, and p_ij is the proximity probability of the two data points;
S2, carrying out similarity calculation on data points in a low-dimensional space;
wherein S2 can calculate the similarity of any two data points in the transcriptome mapping data or the methylation mapping data by the following formula:
$$q_{ij} = \frac{1}{1 + a \lVert y_i - y_j \rVert^{2b}}$$
where y_i and y_j are the positions of data points i and j in the low-dimensional space, a and b are hyperparameters, and q_ij is the similarity of the two data points;
S3, determining a loss function for the mapping between the high-dimensional space and the low-dimensional space by the following formula:
$$L = \sum_{i \neq j} \left[ p_{ij} \log\frac{p_{ij}}{q_{ij}} + (1 - p_{ij}) \log\frac{1 - p_{ij}}{1 - q_{ij}} \right]$$
where L is the loss function.
In the present application, transcriptome data and methylation microarray data (methylation data) are analyzed together to monitor tumor progression and evaluate treatment efficacy. Biomarkers of diagnostic value can be selected by comparing RNA transcriptome data from tumor tissue and normal tissue.
To ensure the accuracy and consistency of the data, the transcriptome data undergo comprehensive data cleaning and preprocessing, and the methylation microarray data are cleaned, organized and quality-controlled on the same platform. For model training, the data are normalized to eliminate the effects of inter-sample differences and gene length, and are converted into a format suitable for model training.
After data preprocessing, the key genes for liver tumor classification are identified through statistical methods and bioinformatics analysis, taking the characteristics of lipid metabolism-related genes into account. Changes in the expression levels of these genes may be closely related to specific biological processes or signaling pathways in tumor development, which is critical for improving the prediction accuracy and interpretability of the model. These genes and their copy numbers are extracted and placed in a new list for subsequent training.
Prior to model construction, the goal is to evaluate the independent impact of a single data modality on classification performance. For this purpose UMAP (unified manifold approximation and projection) is chosen, a recently developed nonlinear dimension reduction technique that uses local manifold approximations and local fuzzy simplicial set representations to construct the topology of the high-dimensional data. The UMAP algorithm aims to preserve the topological properties of the dataset as the data dimension is reduced, minimizing the cross entropy between the high-dimensional and low-dimensional topological representations by optimizing the data layout. A UMAP package is used to reduce the dimensions of the transcriptome data and the methylation microarray data, and the data are then mapped to a low-dimensional space to observe their distribution, giving a fuller picture of the molecular characteristics and therapeutic response of tumors. The main steps and formulas are as follows:
① Proximity probability computation in the high-dimensional space
For any two data points i and j, UMAP first calculates their proximity probability in the high-dimensional space. The probability p_ij is calculated by the following formula, given here in the standard UMAP form consistent with the symbols defined below:
$$p_{ij} = \exp\left(-\frac{\mathrm{dist}(x_i, x_j) - \rho_i}{\sigma_i}\right)$$
where dist(x_i, x_j) represents the Euclidean distance between data point i and data point j, ρ_i is the nearest-neighbor distance of data point i, σ_i is a preset scale parameter, and p_ij is the proximity probability of the two data points.
② Similarity calculation in the low-dimensional space
In the low-dimensional space, UMAP uses a formula similar to that of t-SNE to calculate the similarity q_ij of two data points i and j:
$$q_{ij} = \frac{1}{1 + a \lVert y_i - y_j \rVert^{2b}}$$
where y_i and y_j are the positions of data points i and j in the low-dimensional space, a and b are hyperparameters, and q_ij is the similarity of the two data points.
③ Loss function
UMAP optimizes the low-dimensional representation by minimizing the difference between the similarities in the high-dimensional and low-dimensional spaces. The goal is to minimize the cross-entropy loss:
$$L = \sum_{i \neq j} \left[ p_{ij} \log\frac{p_{ij}}{q_{ij}} + (1 - p_{ij}) \log\frac{1 - p_{ij}}{1 - q_{ij}} \right]$$
This loss function ensures that points that are neighbors in the high-dimensional space remain neighbors in the low-dimensional space, preserving the local structure and topology of the data.
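The three quantities in steps ① to ③ can be computed directly for a small dataset. The sketch below assumes the standard UMAP forms of the proximity probability, the low-dimensional similarity and the cross-entropy loss; it evaluates them for all pairs of points and leaves out the nearest-neighbor graph, the symmetrization of p and the optimization loop that a UMAP package performs. Function names and default values are illustrative only.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def high_dim_p(X, sigma):
    """p_ij = exp(-(dist(x_i, x_j) - rho_i) / sigma_i), with p_ii set to 0."""
    d = pairwise_dist(X)
    n = len(X)
    off_diag = ~np.eye(n, dtype=bool)
    rho = np.where(off_diag, d, np.inf).min(axis=1)  # nearest-neighbor distance per point
    sigma = np.broadcast_to(np.asarray(sigma, dtype=np.float64), (n,))
    p = np.exp(-(d - rho[:, None]) / sigma[:, None])
    p[~off_diag] = 0.0
    return p

def low_dim_q(Y, a=1.0, b=1.0):
    """q_ij = 1 / (1 + a * ||y_i - y_j||^(2b)), with q_ii set to 0."""
    d = pairwise_dist(Y)
    q = 1.0 / (1.0 + a * d ** (2 * b))
    np.fill_diagonal(q, 0.0)
    return q

def umap_cross_entropy(p, q, eps=1e-12):
    """Sum over i != j of p*log(p/q) + (1-p)*log((1-p)/(1-q))."""
    off_diag = ~np.eye(len(p), dtype=bool)
    p = np.clip(p[off_diag], eps, 1 - eps)
    q = np.clip(q[off_diag], eps, 1 - eps)
    return float(np.sum(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))))
```

The loss is zero when q equals p for every pair and positive otherwise, which is what the optimization of the low-dimensional layout drives toward.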
The transcriptome model and the methylation model are each composed of multiple fully connected layers, which capture the patterns in gene expression data and the nonlinear relationships between variables.
One embodiment of the application concerns the model for clinical data:
In preprocessing the clinical prognosis data, missing value processing and outlier processing are performed to ensure the integrity and accuracy of the data. Key information in the clinical prognosis data is then extracted; because this information has a large influence on tumor prediction, it is handled separately.
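Missing value processing and outlier processing can take many forms, and the text does not specify which are used. As one assumed example, the sketch below fills missing numeric values with the median and clips values that lie outside the interquartile-range fences; the function names and the factor k=1.5 are illustrative choices.

```python
import numpy as np

def impute_median(values):
    """Replace NaN entries with the median of the observed values."""
    x = np.asarray(values, dtype=np.float64)
    median = np.nanmedian(x)
    return np.where(np.isnan(x), median, x)

def clip_outliers_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(values, dtype=np.float64)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return np.clip(x, q1 - k * iqr, q3 + k * iqr)
```

For instance, an age column with one missing entry and one implausible value of 400 would have the gap filled with the median and the implausible value pulled back to the upper fence.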
The clinical input data include numerical and categorical features. In the clinical model, an embedding layer encodes the categorical features, which are then concatenated with the numerical features to form a single input vector. Feature extraction and dimension reduction are then performed on the input vector, introducing nonlinearity and reducing overfitting. The clinical model is likewise a deep learning model that takes the clinical input data as input and outputs tumor labels and their corresponding probabilities. During model construction, the network structure and parameters are also adjusted to improve the performance and robustness of the model.
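Forming a single input vector from numerical and categorical clinical features can be sketched as an embedding lookup followed by concatenation. The feature names, embedding sizes and table values below are hypothetical; in a trained model the embedding tables would be learned parameters.

```python
import numpy as np

def build_clinical_vector(numeric, categorical, embedding_tables):
    """Concatenate numeric features with the embedding row of each categorical feature.

    numeric: list of floats, e.g. [age]
    categorical: dict mapping feature name to category index
    embedding_tables: dict mapping feature name to a (num_categories x dim) array
    """
    parts = [np.asarray(numeric, dtype=np.float64)]
    for name, index in categorical.items():
        parts.append(embedding_tables[name][index])
    return np.concatenate(parts)

# Hypothetical embedding tables: 2-dimensional for sex, 1-dimensional for cirrhosis status.
tables = {
    "sex": np.array([[0.1, 0.2], [0.3, 0.4]]),
    "cirrhosis": np.array([[1.0], [2.0]]),
}
vector = build_clinical_vector([63.0], {"sex": 1, "cirrhosis": 0}, tables)
```

The resulting vector would then be passed through the fully connected layers of the clinical model.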
In step 140, feature fusion is performed on the pathological feature, the transcriptome feature, the methylation feature and the clinical feature, so as to obtain a liver cancer analysis feature of the target patient.
In one embodiment of the present application, the liver cancer analysis characteristic of the target patient is obtained by:
Inputting the pathological features, the transcriptome features, the methylation features and the clinical features into a preset multi-modal fusion model to obtain the liver cancer analysis features output by the multi-modal fusion model;
The mathematical expression of the multi-modal fusion model is as follows:
in the formula, the left-hand side is the liver cancer analysis feature output by the multi-modal fusion model; W_class is the weight matrix corresponding to the target pathology data, the target transcriptome data, the target methylation data and the target clinical data; LayerNorm represents layer normalization; and f1, f2, f3 and f4 are the pathological feature, the transcriptome feature, the methylation feature and the clinical feature, respectively.
Specifically, in order to construct a deep learning model (multi-modal prediction model) capable of accurately diagnosing tumors, the application adopts a multi-information fusion strategy, and solves the heterogeneity of multi-modal data.
The initial step involves creating a feature fusion layer for fusing feature vectors (128 dimensions per source) from four different sources (pathology data, transcriptome data, methylation data, clinical data) into a low dimensional space. To facilitate efficient integration of these features, a feature weighting layer is introduced. The softmax algorithm ensures that the sum of weights is 1. In the forward propagation phase of the model, weights are first assigned to each feature, and then these weights are multiplied by the corresponding features to achieve feature weighting. The weighted features are then combined into a single feature tensor, which provides a comprehensive feature representation for subsequent fusion and classification. In addition, a layer normalization layer is added to normalize the characteristics, so that the training process is further stabilized. Finally, the fusion features are mapped to final classification output through a classifier layer (linear layer), and the number of the classifier output nodes is matched with the number of task categories.
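The forward pass described above (softmax-weighted features, combination into one tensor, layer normalization, linear classifier) can be sketched in NumPy. Several details are assumptions made for the example: the weighted features are combined by concatenation, layer normalization is applied without learned scale and shift, and the names `modality_logits`, `W_class` and `b_class` are illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def fuse_and_classify(features, modality_logits, W_class, b_class):
    """Weight each modality, concatenate, layer-normalize, and classify.

    features: list of four 1-D arrays (pathology, transcriptome, methylation, clinical)
    modality_logits: one learnable score per modality; softmax makes the weights sum to 1
    W_class, b_class: parameters of the final linear classifier
    """
    weights = softmax(modality_logits)
    weighted = [w * f for w, f in zip(weights, features)]
    fused = layer_norm(np.concatenate(weighted))
    return softmax(W_class @ fused + b_class)
```

With four 128-dimensional inputs the fused vector has 512 entries, so `W_class` has shape (number of classes, 512) under the concatenation assumption.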
The mathematical expression of the multi-modal prediction model is as follows.
In the expression, the left-hand side represents the output of the multi-modal prediction model. W_class represents the weight matrix applied to the outputs of the pathology model, the transcriptome model, the methylation model and the clinical model. LayerNorm denotes layer normalization. f1, f2, f3 and f4 represent the pathological features, the RNA sequencing (transcriptome) features, the methylation microarray (methylation) features and the clinical features, respectively.
The class probabilities are obtained with the Softmax function,
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
where z_i represents the raw output of the i-th class. Through the Softmax function, the model derives the probability of each category and thereby makes a classification decision.
In the training stage, the fusion model constructed as a whole is trained, and the performance of the fusion model is continuously improved. The deep learning fusion model after full training and optimization has higher diagnosis capability, and the trained model can be used for predicting patient samples in practical application.
In the prediction stage, relevant data corresponding to each patient (sample) is input into a multi-mode tumor prediction model, and the prediction model generates corresponding tumor labels according to learned knowledge and rules, so that a powerful auxiliary tool is provided for doctors.
In step 150, lipid metabolism-related marker features of the target patient are extracted based on the methylation data, and a comprehensive liver cancer analysis of the target patient is performed according to the lipid metabolism-related marker features and the liver cancer analysis features.
In one embodiment of the present application, the liver cancer integrated analysis of the target patient according to the lipid metabolism-related marker feature and the liver cancer analysis feature comprises:
And diagnosing and prognosis predicting the liver cancer of the target patient according to the lipid metabolism related marker characteristics and the liver cancer analysis characteristics.
In particular, the multimodal prediction model of the application also comprises a lipid metabolism characteristic prediction module,
The application introduces a lipid metabolism feature prediction module into the multimodal prediction model to reveal the relationship between abnormal lipid metabolism and the occurrence and development of disease in liver cancer patients. Based on the lipid metabolism-related genes in the transcriptome data, the module predicts the expression level of lipid metabolism-related markers through deep feature extraction and analysis, and provides auxiliary decision support for personalized treatment. The module is implemented as follows:
Identification and feature extraction of lipid metabolism related genes:
In the transcriptome data preprocessing stage, bioinformatics methods are used to identify lipid metabolism genes related to liver cancer development. These genes include key genes in metabolic pathways such as lipid synthesis, breakdown and transport, for example cholesterol synthesis genes and fatty acid synthases. The screened lipid metabolism gene data are further cleaned and standardized to ensure the consistency and reliability of the input data. In addition, a dimension reduction technique (such as UMAP) is used to extract the most representative lipid metabolism features, which reduces noise interference and improves the stability of the model.
Design of the feature prediction model:
The lipid metabolism feature prediction module adopts a multi-layer neural network architecture in which the input layer receives the processed lipid metabolism gene expression data. The hidden layer transforms the features through nonlinear activation functions to extract latent metabolic feature patterns. A Softmax layer then predicts the expression of the various lipid metabolism markers, so that the model outputs the expression probability of the lipid metabolism-related markers. The resulting probability distribution reflects the degree of abnormality of the patient's lipid metabolism and gives clinicians an intuitive reference for the metabolic state.
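The architecture described above (input layer, nonlinear hidden layer, Softmax output) can be sketched as a small forward pass. This is only an illustration under stated assumptions: one hidden layer with ReLU activation, 16 reduced input features, three hypothetical expression levels and untrained random parameters. None of these values comes from the application.

```python
import math
import random

def linear(weights, bias, x):
    """Affine map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(values):
    """Nonlinear activation used in the hidden layer (assumed choice)."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Numerically stable softmax; the outputs sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def init_params(n_in, n_hidden, n_out, seed=0):
    """Random, untrained parameters for illustration only."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"w1": matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "w2": matrix(n_out, n_hidden), "b2": [0.0] * n_out}

def predict_marker_expression(gene_features, params):
    """Input layer -> hidden layer (ReLU) -> Softmax over expression levels."""
    hidden = relu(linear(params["w1"], params["b1"], gene_features))
    return softmax(linear(params["w2"], params["b2"], hidden))

LEVELS = ("low", "medium", "high")            # hypothetical marker expression classes
params = init_params(n_in=16, n_hidden=8, n_out=len(LEVELS))
gene_features = [0.1 * k for k in range(16)]  # stand-in for reduced gene expression data
probabilities = predict_marker_expression(gene_features, params)
expression = dict(zip(LEVELS, probabilities))
```

With trained parameters, `expression` would hold the predicted probability of each expression level for one marker; predicting several markers would need one such output head per marker.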
Abnormal lipid metabolism state prediction and personalized decision support:
The module combines the expression characteristics of lipid metabolism genes, and can accurately identify liver cancer patients with abnormal lipid metabolism. Such patients may have different prognosis and treatment needs, and the predictive outcome of the model may help doctors identify high risk patients and formulate more accurate treatment regimens for them. For example, in patients with abnormal lipid metabolism, specific lipid metabolism inhibitors or nutritional regulation may become potential therapeutic strategies. The predictive outcome of the model provides a basis for selection of such treatment regimens and enables more personalized therapeutic interventions clinically.
(1) The application provides a multimodal deep learning fusion model that achieves accurate diagnosis and prognosis prediction of liver cancer by fusing pathology data, transcriptome data, methylation data and clinical data. The model further optimizes the fusion effect through feature weighting and layer normalization, providing higher accuracy and reliability for liver cancer diagnosis.
(2) UMAP dimension reduction mechanism: The application uses the UMAP (uniform manifold approximation and projection) algorithm to reduce the dimensionality of high-dimensional data while preserving the local topological structure of the multimodal data. After UMAP dimension reduction, the data of each modality still retain the tight distribution of their features, so the model can efficiently capture the latent feature relationships of the data in the low-dimensional space. This mechanism reduces data redundancy and computational complexity, and provides a more concise and information-rich feature representation for subsequent feature fusion and classification.
(3) Feature weighting mechanism: A feature weighting mechanism is introduced in which the weight of each modality's features is calculated using the Softmax algorithm, so that the relative importance of the features is taken into account when data from different modalities are fused, which improves the overall performance of the model.
(4) The application predicts the expression of lipid metabolism-related markers by taking into account the multi-factor influence of lipid metabolism on liver cancer, which provides a new research perspective on the role of abnormal lipid metabolism in liver cancer and benefits precise clinical treatment.
(5) Model structure flexibility: The system has a flexible structure and can adjust the modal inputs of the model according to actual needs, which makes it suitable for the diagnosis of various tumors and for other medical tasks, and improves the generality and extensibility of the model.
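The UMAP mechanism in item (2) relies on three quantities: a proximity probability between points in the high-dimensional space, a similarity between their positions in the low-dimensional space, and a loss that compares the two. The sketch below uses the commonly published UMAP formulation, which is an assumption here: proximity exp(-max(0, d - rho) / sigma), similarity 1 / (1 + a * distance^(2b)), and a cross-entropy loss. It simplifies the usual formulation by using a fixed scale parameter and omitting symmetrization of the proximities. The values of `a`, `b` and `sigma` and all function names are illustrative.

```python
import math

def proximity_probability(dist, rho, sigma):
    """High-dimensional proximity: exp(-max(0, d - rho) / sigma)."""
    return math.exp(-max(0.0, dist - rho) / sigma)

def low_dim_similarity(y_i, y_j, a=1.0, b=1.0):
    """Low-dimensional similarity: 1 / (1 + a * ||y_i - y_j||^(2b))."""
    squared = sum((u - v) ** 2 for u, v in zip(y_i, y_j))
    return 1.0 / (1.0 + a * squared ** b)

def pair_cross_entropy(p, q, eps=1e-12):
    """Cross-entropy term comparing one proximity p with one similarity q."""
    p = min(max(p, eps), 1.0 - eps)   # clamp to keep the logarithms finite
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def mapping_loss(high, low, sigma=1.0):
    """Sum the cross-entropy over all ordered pairs of distinct points."""
    loss = 0.0
    for i, x_i in enumerate(high):
        # rho: distance from point i to its nearest neighbor
        rho = min(math.dist(x_i, x_j) for j, x_j in enumerate(high) if j != i)
        for j, x_j in enumerate(high):
            if j == i:
                continue
            p = proximity_probability(math.dist(x_i, x_j), rho, sigma)
            q = low_dim_similarity(low[i], low[j])
            loss += pair_cross_entropy(p, q)
    return loss

# Three points in 3-D and two candidate 2-D embeddings.
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
faithful_low = [(0.0, 0.0), (0.5, 0.0), (0.0, 4.0)]   # keeps points 0 and 1 close
distorted_low = [(0.0, 0.0), (4.0, 0.0), (0.0, 0.5)]  # separates points 0 and 1
faithful_loss = mapping_loss(high, faithful_low)
distorted_loss = mapping_loss(high, distorted_low)
```

An embedding that keeps high-dimensional neighbors close yields a lower loss than one that pulls them apart, which is what the optimization exploits. In practice a library implementation would be used instead of this hand-written loop.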
In summary, the multimodal predictive model can provide diagnosis suggestions and prognosis prediction results to the physician in real time, helping the physician to better make treatment plans and decisions. The doctor can combine the prediction result of the multi-mode prediction model and the clinical information of the patient to formulate a more accurate treatment scheme. Meanwhile, the multi-mode prediction model can also be used for early screening and prevention of tumors, and the life quality and the health level of people are improved. The multi-mode prediction model is combined with the clinical workflow, so that a convenient, quick and accurate diagnosis support system is provided for doctors. The integration mode can improve the working efficiency of doctors, and the multi-mode prediction model can be better adapted to actual clinical demands.
The application establishes a system, based on pathological images, for liver cancer diagnosis classification, prognosis prediction and prediction of lipid metabolism-related marker expression. To address the reliance of traditional diagnostic methods on a single type of data, deep learning is used to fuse pathomics, transcriptomics, methylation and clinical prognosis data effectively, so that tumor characteristics are reflected more comprehensively, the accuracy of auxiliary diagnosis is expected to improve, and clinicians receive more comprehensive and in-depth pathological information. To address heterogeneity among the multimodal data, the method eliminates differences and conflicts between modalities, ensures the validity and stability of the model, and adaptively adjusts the data of the different modalities. Model convergence is accelerated and the training data set is expanded, which improves the accuracy and generalization capability of the multimodal prediction model; the model can process data of multiple modalities simultaneously and has stronger feature extraction and classification capability. The multimodal deep learning model is combined with the clinical workflow to provide doctors with real-time diagnosis suggestions and prognosis prediction results, and can meet the requirements of different clinical scenarios.
The multi-modal deep learning model (multi-modal prediction model) has important application value in clinical liver cancer management. By combining pathology data, transcriptome data, methylation data and clinical data, the model can provide more accurate and comprehensive liver cancer diagnosis information for doctors. Particularly in the aspect of early diagnosis of liver cancer, the model utilizes the complementarity of multi-mode data, improves the sensitivity and specificity of detection, and reduces the possibility of misdiagnosis and missed diagnosis. For prognosis prediction, the model can provide detailed prognosis information through multimodal data analysis, assisting doctors in formulating more personalized treatment regimens.
In addition, the application also comprises a prediction function of lipid metabolism characteristics, and provides a new tool for researching the relationship between lipid metabolism abnormality and liver cancer development. This function not only helps identify potentially high-risk patients, but also provides a reference for therapeutic regimens targeting lipid metabolism. The multi-mode deep learning model has strong clinical popularization potential, can be embedded into clinical workflow of a hospital, and helps doctors to diagnose and manage liver cancer patients more accurately and efficiently in actual work. By providing high-quality auxiliary diagnosis information, the application not only reduces the workload of doctors, but also supports the fine development of clinical liver cancer management and improves the survival rate and life quality of patients.
Fig. 2 is a logic diagram of a liver cancer analysis system based on multi-modal deep learning according to an embodiment of the present application, and fig. 3 is a block diagram of a liver cancer analysis system 300 based on multi-modal deep learning according to an embodiment of the present application, wherein the system 300 includes an input unit 301, a preprocessing unit 302, a layering processing unit 303, a fusion processing unit 304, and an analysis unit 305.
An input unit 301 for acquiring pathology data, transcriptome data, methylation data, and clinical data of a target patient;
A preprocessing unit 302, configured to perform preprocessing on the pathology data, the transcriptome data, the methylation data, and the clinical data, respectively, to obtain target pathology data corresponding to the pathology data, target transcriptome data corresponding to the transcriptome data, target methylation data corresponding to the methylation data, and target clinical data corresponding to the clinical data;
A hierarchical processing unit 303, configured to perform feature extraction on the target pathology data, the target transcriptome data, the target methylation data, and the target clinical data, respectively, to obtain a pathology feature corresponding to the target pathology data, a transcriptome feature corresponding to the target transcriptome data, a methylation feature corresponding to the target methylation data, and a clinical feature corresponding to the target clinical data;
A fusion processing unit 304, configured to perform feature fusion on the pathological feature, the transcriptome feature, the methylation feature and the clinical feature to obtain the liver cancer analysis feature of the target patient;
An analysis unit 305, configured to extract lipid metabolism-related marker characteristics of the target patient based on the methylation data, and to perform liver cancer comprehensive analysis on the target patient according to the lipid metabolism-related marker characteristics and the liver cancer analysis characteristics.
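The data flow between units 301 to 305 can be sketched as a simple pipeline. This is an assumed structure for illustration only: each unit is modeled as a plain callable, the class and field names are hypothetical, and the stand-in functions in the usage lines exist only to show how values move through the pipeline.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

MODALITIES = ("pathology", "transcriptome", "methylation", "clinical")

@dataclass
class LiverCancerAnalysisSystem:
    preprocess: Dict[str, Callable[[Any], Any]]   # preprocessing unit 302
    extract: Dict[str, Callable[[Any], Any]]      # hierarchical processing unit 303
    fuse: Callable[[Dict[str, Any]], Any]         # fusion processing unit 304
    lipid_markers: Callable[[Any], Any]           # analysis unit 305, marker extraction
    analyze: Callable[[Any, Any], Any]            # analysis unit 305, comprehensive analysis

    def run(self, raw: Dict[str, Any]) -> Any:
        # Input unit 301: all four modalities must be present.
        missing = [m for m in MODALITIES if m not in raw]
        if missing:
            raise ValueError(f"missing modalities: {missing}")
        target = {m: self.preprocess[m](raw[m]) for m in MODALITIES}
        features = {m: self.extract[m](target[m]) for m in MODALITIES}
        fused = self.fuse(features)
        # Marker features are extracted from the methylation data, as described above.
        markers = self.lipid_markers(raw["methylation"])
        return self.analyze(markers, fused)

# Trivial stand-ins that only demonstrate the flow of data.
system = LiverCancerAnalysisSystem(
    preprocess={m: (lambda data: data) for m in MODALITIES},
    extract={m: len for m in MODALITIES},
    fuse=lambda feats: sum(feats.values()),
    lipid_markers=len,
    analyze=lambda markers, fused: {"lipid_markers": markers, "fused_features": fused},
)
result = system.run({
    "pathology": [1, 2, 3],
    "transcriptome": [4, 5],
    "methylation": [6],
    "clinical": [7, 8, 9, 10],
})
```

Keeping each unit as an injected callable mirrors the separation of units in the block diagram and lets any single stage be replaced without touching the others.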
As another aspect, the present application also provides a computer readable storage medium having stored thereon a program product capable of implementing the method provided in the present specification. In some possible implementations, the various aspects of the application may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the application as described in the "example methods" section of this specification, when the program product is run on the terminal device.
A program product for implementing the above method according to an embodiment of the present application may employ a portable compact disc read-only memory (CD-ROM) and comprise program code and may be run on a terminal device, such as a personal computer. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of a readable storage medium include an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
On the other hand, the application also provides electronic equipment capable of realizing the method.
Those skilled in the art will appreciate that the various aspects of the application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.) or an embodiment combining hardware and software aspects, which may be referred to herein collectively as a "circuit," "module" or "system."
An electronic device 400 according to such an embodiment of the application is described below with reference to fig. 4. The electronic device 400 shown in fig. 4 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present application.
As shown in fig. 4, the electronic device 400 is embodied in the form of a general purpose computing device. The components of the electronic device 400 may include, but are not limited to, at least one processing unit 410, at least one storage unit 420, and a bus 430 that connects the various system components, including the storage unit 420 and the processing unit 410.
The storage unit 420 stores program code that is executable by the processing unit 410, such that the processing unit 410 performs the steps according to various exemplary embodiments of the present application described in the "example methods" section of this specification.
The storage unit 420 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 421 and/or cache memory 422, and may further include Read Only Memory (ROM) 423.
The storage unit 420 may also include a program/utility 424 having a set (at least one) of program modules 425, such program modules 425 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 430 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 400 may also communicate with one or more external devices 1200 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 400, and/or any device (e.g., router, modem, etc.) that enables the electronic device 400 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 450. Also, electronic device 400 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 460. As shown, the network adapter 460 communicates with other modules of the electronic device 400 over the bus 430. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 400, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Thus, the technical solutions according to embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device or a network device) to perform the method according to embodiments of the present application.
Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present application, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (8)

1. A liver cancer analysis method based on multimodal deep learning, characterized in that the method comprises:
obtaining pathological data, transcriptome data, methylation data and clinical data of a target patient;
preprocessing the pathological data, the transcriptome data, the methylation data and the clinical data respectively, to obtain target pathological data corresponding to the pathological data, target transcriptome data corresponding to the transcriptome data, target methylation data corresponding to the methylation data, and target clinical data corresponding to the clinical data;
performing feature extraction on the target pathological data, the target transcriptome data, the target methylation data and the target clinical data respectively, to obtain pathological features corresponding to the target pathological data, transcriptome features corresponding to the target transcriptome data, methylation features corresponding to the target methylation data, and clinical features corresponding to the target clinical data;
performing feature fusion on the pathological features, the transcriptome features, the methylation features and the clinical features, to obtain liver cancer analysis features of the target patient;
extracting lipid metabolism-related marker features of the target patient based on the methylation data, and performing a comprehensive liver cancer analysis on the target patient according to the lipid metabolism-related marker features and the liver cancer analysis features;
wherein the transcriptome features and the methylation features are obtained by the following steps:
using a uniform manifold approximation and projection method to perform feature mapping on the target transcriptome data and the target methylation data in a high-dimensional space respectively, to obtain transcriptome mapping data and methylation mapping data in a low-dimensional space;
determining the transcriptome features according to the transcriptome mapping data, and determining the methylation features according to the methylation mapping data;
wherein the uniform manifold approximation and projection method specifically comprises the following steps:
S1, calculating the proximity probability of data points in the high-dimensional space;
wherein S1 calculates the proximity probability of any two data points in the target transcriptome data or the target methylation data by a formula whose terms are: the Euclidean distance between the two data points; the nearest-neighbor distance of the first data point; a preset scale parameter; and, as the result, the proximity probability of the two data points;
S2, calculating the similarity of data points in the low-dimensional space;
wherein S2 calculates the similarity of any two data points in the transcriptome mapping data or the methylation mapping data by a formula whose terms are: the positions of the two data points in the low-dimensional space; hyperparameters; and, as the result, the similarity of the two data points;
S3, determining, by a formula, the loss function of the mapping between the high-dimensional space and the low-dimensional space.

2. The liver cancer analysis method based on multimodal deep learning according to claim 1, characterized in that the pathological data is a whole-slide image, and the target pathological data is obtained by the following steps:
cutting the whole-slide image of the pathological data into a plurality of image blocks;
screening out, from the image blocks, target image blocks whose cell tissue area ratio is higher than a preset threshold;
adjusting the image parameters of each target image block so that the image parameters are adjusted to preset reference values;
after the image parameters of each target image block have been adjusted to the reference values, generating the target pathological data according to the target image blocks;
wherein the image parameters include brightness, contrast, gamma correction and saturation.

3. The liver cancer analysis method based on multimodal deep learning according to claim 2, characterized in that the pathological features are obtained by the following steps:
performing feature extraction on the target pathological data through a ResNet152 model, to obtain the pathological features of the target patient;
wherein the model weights of the ResNet152 model are obtained by the following steps:
using a cross-entropy loss function to evaluate the probability distribution difference of the model predictions on the target pathological data, and calculating, by a backpropagation algorithm, the gradient of the loss of the probability distribution difference with respect to the parameters of the ResNet152 model;
using an SGD optimizer to compute the model weights according to the gradient.

4. The liver cancer analysis method based on multimodal deep learning according to claim 3, characterized in that the liver cancer analysis features of the target patient are obtained by the following steps:
inputting the pathological features, the transcriptome features, the methylation features and the clinical features into a preset multimodal fusion model, to obtain the liver cancer analysis features output by the multimodal fusion model;
wherein, in the mathematical expression of the multimodal fusion model, the output term is the liver cancer analysis features output by the multimodal fusion model; W_class is the weight matrix corresponding to the target pathological data, the target transcriptome data, the target methylation data and the target clinical data; LayerNorm denotes layer normalization; and f1, f2, f3 and f4 are the pathological features, the transcriptome features, the methylation features and the clinical features, respectively.

5. The liver cancer analysis method based on multimodal deep learning according to claim 4, characterized in that performing the comprehensive liver cancer analysis on the target patient according to the lipid metabolism-related marker features and the liver cancer analysis features comprises:
performing liver cancer diagnosis and prognosis prediction for the target patient according to the lipid metabolism-related marker features and the liver cancer analysis features.

6. A liver cancer analysis system based on multimodal deep learning, characterized in that the system comprises:
an input unit, configured to obtain pathological data, transcriptome data, methylation data and clinical data of a target patient;
a preprocessing unit, configured to preprocess the pathological data, the transcriptome data, the methylation data and the clinical data respectively, to obtain target pathological data corresponding to the pathological data, target transcriptome data corresponding to the transcriptome data, target methylation data corresponding to the methylation data, and target clinical data corresponding to the clinical data;
a hierarchical processing unit, configured to perform feature extraction on the target pathological data, the target transcriptome data, the target methylation data and the target clinical data respectively, to obtain pathological features corresponding to the target pathological data, transcriptome features corresponding to the target transcriptome data, methylation features corresponding to the target methylation data, and clinical features corresponding to the target clinical data;
a fusion processing unit, configured to perform feature fusion on the pathological features, the transcriptome features, the methylation features and the clinical features, to obtain liver cancer analysis features of the target patient;
an analysis unit, configured to extract lipid metabolism-related marker features of the target patient based on the methylation data, and to perform a comprehensive liver cancer analysis on the target patient according to the lipid metabolism-related marker features and the liver cancer analysis features;
wherein the transcriptome features and the methylation features are obtained by the following steps:
using a uniform manifold approximation and projection method to perform feature mapping on the target transcriptome data and the target methylation data in a high-dimensional space respectively, to obtain transcriptome mapping data and methylation mapping data in a low-dimensional space;
determining the transcriptome features according to the transcriptome mapping data, and determining the methylation features according to the methylation mapping data;
wherein the uniform manifold approximation and projection method specifically comprises the following steps:
S1, calculating the proximity probability of data points in the high-dimensional space;
wherein S1 can calculate the proximity probability of any two data points in the target transcriptome data or the target methylation data by a formula whose terms are: the Euclidean distance between the two data points; the nearest-neighbor distance of the first data point; a preset scale parameter; and, as the result, the proximity probability of the two data points;
S2, calculating the similarity of data points in the low-dimensional space;
wherein S2 can calculate the similarity of any two data points in the transcriptome mapping data or the methylation mapping data by a formula whose terms are: the positions of the two data points in the low-dimensional space; hyperparameters; and, as the result, the similarity of the two data points;
S3, determining, by a formula, the loss function of the mapping between the high-dimensional space and the low-dimensional space.

7. A computer-readable storage medium, characterized in that at least one program code is stored in the computer-readable storage medium, and the at least one program code is loaded and executed by a processor to implement the operations performed by the method according to any one of claims 1 to 5.

8. An electronic device, characterized in that the electronic device comprises one or more processors and one or more memories, at least one program code is stored in the one or more memories, and the at least one program code is loaded and executed by the one or more processors to implement the operations performed by the method according to any one of claims 1 to 5.
CN202411709912.6A 2024-11-27 2024-11-27 Liver cancer analysis method, system, equipment and medium based on multimodal deep learning Active CN119626571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411709912.6A CN119626571B (en) 2024-11-27 2024-11-27 Liver cancer analysis method, system, equipment and medium based on multimodal deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411709912.6A CN119626571B (en) 2024-11-27 2024-11-27 Liver cancer analysis method, system, equipment and medium based on multimodal deep learning

Publications (2)

Publication Number Publication Date
CN119626571A (en) 2025-03-14
CN119626571B (en) 2025-09-23

Family

ID=94901248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411709912.6A Active CN119626571B (en) 2024-11-27 2024-11-27 Liver cancer analysis method, system, equipment and medium based on multimodal deep learning

Country Status (1)

Country Link
CN (1) CN119626571B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120199472B (en) * 2025-05-26 2025-10-24 榆林市第一医院 Auxiliary diagnosis method and system for primary liver cancer based on imaging data

Citations (2)

Publication number Priority date Publication date Assignee Title
CN115622950A (en) * 2022-09-29 2023-01-17 西安热工研究院有限公司 Network traffic identification method, system, device and medium based on improved spectral clustering
CN118553407A (en) * 2024-05-27 2024-08-27 广东医科大学 Lung tumor diagnosis and prediction system based on multi-mode deep learning

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
JP2018068752A (en) * 2016-10-31 2018-05-10 株式会社Preferred Networks Machine learning device, machine learning method and program
WO2019079166A1 (en) * 2017-10-16 2019-04-25 Illumina, Inc. Deep learning-based techniques for training deep convolutional neural networks
WO2021062366A1 (en) * 2019-09-27 2021-04-01 The Brigham And Women's Hospital, Inc. Multimodal fusion for diagnosis, prognosis, and therapeutic response prediction
CN115394445A (en) * 2022-05-25 2022-11-25 郑州金域临床检验中心有限公司 Colon cancer prognosis marker gene and screening, prognosis prediction and model construction method thereof
CN117198536A (en) * 2023-10-18 2023-12-08 宁波市临床病理诊断中心 Multi-mode group low-level glioma auxiliary treatment decision-making system based on machine learning
CN118863302B (en) * 2024-09-25 2025-04-08 浙江大学 Multi-scale shared bicycle carbon emission reduction estimation and key factor identification method, equipment and medium thereof


Also Published As

Publication number Publication date
CN119626571A (en) 2025-03-14

Similar Documents

Publication Publication Date Title
JP7542578B2 (en) Methods and systems for utilizing quantitative imaging - Patents.com
Tran et al. Personalized breast cancer treatments using artificial intelligence in radiomics and pathomics
CN112768072A (en) Cancer clinical index evaluation system constructed based on imaging omics qualitative algorithm
CN107492090A (en) Analyzed according to generated data using the tumor phenotypes based on image of machine learning
CN116740435A (en) Breast cancer ultrasound image classification method based on multi-modal deep learning radiomics
CN113421652A (en) Method for analyzing medical data, method for training model and analyzer
Kumar et al. Deep-learning-enabled multimodal data fusion for lung disease classification
Singh et al. Survey of AI-driven techniques for ovarian cancer detection: state-of-the-art methods and open challenges
KR20240068638A (en) discovery platform
CN113239993A (en) Pathological image classification method, pathological image classification system, terminal and computer-readable storage medium
Tian et al. Radiomics and its clinical application: artificial intelligence and medical big data
Koul et al. A study on bladder cancer detection using AI-based learning techniques
CN119626571B (en) Liver cancer analysis method, system, equipment and medium based on multimodal deep learning
Zhang et al. An evolutionary deep learning method based on improved heap-based optimization for medical image classification and diagnosis
Tang et al. Explainable survival analysis with uncertainty using convolution-involved vision transformer
Sui et al. Imaging biomarkers and gene expression data correlation framework for lung cancer radiogenomics analysis based on deep learning
Tenali et al. Oral cancer detection using deep learning techniques
Godbin et al. Leveraging radiomics and genetic algorithms to improve lung infection diagnosis in X-ray images using machine learning
CN120412969A (en) A breast cancer treatment response prediction system based on multi-time series imaging
CN116933135B (en) Cancer staging prediction modeling system and method based on cross-modal fusion cascade
Zhang et al. Prediction of cancer recurrence based on compact graphs of whole slide images
Yan et al. LRCTNet: A lightweight rectal cancer T-staging network based on knowledge distillation via a pretrained swin transformer
Oniga et al. Applications of ai and hpc in the health domain
Owais et al. Volumetric Model Genesis in Medical Domain for the Analysis of Multimodality 2-D/3-D Data Based on the Aggregation of Multilevel Features
Altal et al. Hybrid Attention-Enhanced MobileNetV2 with Particle Swarm Optimization for Endometrial Cancer Classification in CT Images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant