Disclosure of Invention
In view of the above, the invention provides a method for constructing a prognosis survival prediction model and a prediction method using the model, so as to provide a reference for clinically assessing the prognostic survival of tumor patients.
In a first aspect, the present invention provides a method for constructing a prognostic survival prediction model, the method comprising:
obtaining, before treatment, fluorescent sample images in one-to-one correspondence with sample tissues, wherein the fluorescent sample images are obtained after staining with marker antibodies for myeloid cells;
obtaining prognosis information of the patient corresponding to each sample tissue, wherein the prognosis information comprises the survival condition of the patient after receiving different preset treatments;
segmenting the myeloid cells in each sample tissue based on the fluorescent sample images to determine the fluorescence expression level of each myeloid cell on different markers, which comprises dividing each fluorescent sample image into regions to obtain a plurality of region images corresponding to each fluorescent sample image;
determining a myeloid cell type annotation for each myeloid cell based on the fluorescence expression level of each myeloid cell on the different markers and a cell type annotation algorithm;
determining, based on the myeloid cell type annotations, a spatial distance between each myeloid cell in each sample tissue and other myeloid cells of different types, to obtain cell spatial distance features;
determining, based on the myeloid cell type annotations, a spatial interaction relationship between each myeloid cell in each sample tissue and its surrounding cells, to obtain cell interaction features;
dividing the sample tissues into a training set and a test set according to a preset ratio and the prognosis information;
and constructing a prognosis survival prediction model based on the cell spatial distance features and the cell interaction features in the training set and the test set, wherein the prognosis survival prediction model is used for predicting the survival condition of a pathological sample after intervention.
The invention provides a method for constructing a prognosis prediction model based on myeloid cells. Through cell segmentation, cell annotation, distance calculation and interaction analysis of the fluorescent sample images, the method extracts cell-level features in the spatial domain as well as features between cells, constructs from these spatial features a prognosis survival prediction model based on the spatial position relationships of myeloid cells, and provides a strong reference for clinically judging the prognostic survival of tumor patients.
In an alternative embodiment, the marker antibody is selected from one or more of the following: CD45, CD45RA, CD45RO, CD11b, CD11c, CD14, CD15, CD16, CD33, CD68, CD86, CD163, CD206, SPP1, FOLR2, C1QC, CD1c, HLA-DR, CD141, S100A9, S100A8, CD117, CD123, MPO.
In an alternative embodiment, each region image includes a cancerous region and a paracancerous region, and determining the spatial distance between each myeloid cell in each sample tissue and other myeloid cells of different types to obtain the cell spatial distance features comprises:
determining a first Euclidean distance from each myeloid cell type to other myeloid cell types in the cancerous region included in each region image;
determining a second Euclidean distance from each myeloid cell type to other myeloid cell types in the paracancerous region included in each region image;
determining, based on all first Euclidean distances, a first spatial distance between each myeloid cell and other myeloid cells of different types in all cancerous regions of the sample tissue, the first spatial distance comprising a first minimum spatial distance, a first maximum spatial distance and a first average spatial distance;
determining, based on all second Euclidean distances, a second spatial distance between each myeloid cell and other myeloid cells of different types in all paracancerous regions of the sample tissue, the second spatial distance comprising a second minimum spatial distance, a second maximum spatial distance and a second average spatial distance;
taking the first spatial distance and the second spatial distance as the cell spatial distance features;
after the cell spatial distance features are obtained, the method further comprises:
performing data processing and feature screening on the distance features between cell types of the myeloid lineage among the cell spatial distance features to obtain processed cell spatial distance features, wherein the processed cell spatial distance features are used for constructing the prognosis survival prediction model.
In this embodiment, calculating the spatial distance between different types of myeloid cells gives a deeper understanding of the interactions and spatial distribution of cells in the tumor microenvironment. In addition, this embodiment analyzes the difference in spatial distances between myeloid cells in the cancerous region and in the paracancerous region, which takes into account cell changes during tumor development, makes the analysis more comprehensive, and improves the prediction accuracy of the constructed model.
In this embodiment, data processing and feature screening of the distance features between cell types of the myeloid lineage can improve the accuracy and robustness of the model, better capture the feature relationships between cells, and provide stronger support for prognosis survival prediction.
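As an illustration of how the first and second spatial distances might be assembled, the following minimal sketch pools the per-region Euclidean distances of one sample tissue and summarizes them as minimum, maximum and average values. The function name `pool_region_distances`, the input layout and the toy distances are hypothetical; pooling all distances before summarizing is an assumption consistent with "based on all first Euclidean distances", and the `T_`/`P_` naming follows the feature names used in this description.

```python
from statistics import mean


def pool_region_distances(region_distances):
    """Pool per-region distance lists into min/max/mean features.

    region_distances: list of dicts, one per region image, each with
      'zone'  : 'T' (cancerous) or 'P' (paracancerous)
      'pairs' : {(source_type, target_type): [distances, ...]}
    """
    pooled = {}
    for region in region_distances:
        for (src, dst), dists in region["pairs"].items():
            pooled.setdefault((region["zone"], src, dst), []).extend(dists)

    features = {}
    for (zone, src, dst), dists in pooled.items():
        if not dists:
            continue  # no such cell pair observed; leave the feature undefined
        stem = f"{zone}_{src}To{dst}"
        features[f"{stem}_min"] = min(dists)
        features[f"{stem}_max"] = max(dists)
        features[f"{stem}_mean"] = mean(dists)
    return features


# Toy input (hypothetical values): two cancerous regions, one paracancerous.
regions = [
    {"zone": "T", "pairs": {("Mono", "M2"): [12.0, 30.0]}},
    {"zone": "T", "pairs": {("Mono", "M2"): [18.0]}},
    {"zone": "P", "pairs": {("M2", "Mono"): [40.0, 60.0]}},
]
sample_features = pool_region_distances(regions)
```

Feature screening would then operate on a table of such per-sample dictionaries; the screening criterion itself is not specified here and is left out of the sketch.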
In an alternative embodiment, each region image includes a cancerous region and a paracancerous region, and determining the spatial interaction relationship between each myeloid cell in each sample tissue and its surrounding cells to obtain the cell interaction features comprises:
determining a first interaction feature between each myeloid cell in the cancerous region included in each region image and the surrounding interacting cells with which it has an interaction relationship;
determining a second interaction feature between each myeloid cell in the paracancerous region included in each region image and the surrounding interacting cells with which it has an interaction relationship, wherein the first interaction feature and the second interaction feature each comprise the proportion of each myeloid cell type among the interacting cells;
taking the first interaction feature and the second interaction feature as the cell interaction features.
In this embodiment, the spatial interaction relationship between two cells is used as an important reference index for disease prognosis, which can improve the accuracy and reliability of the prognosis survival prediction model and provide important information to support clinical decisions.
In an alternative embodiment, the features used to construct the prognosis survival prediction model include one or more of the following: T_MonoToM2_max, T_M1ToM2_mean, T_MonoToM2_mean, P_M2ToHLADRhicells_max, P_MonoToHLADRhicells_max, P_HLADRhicellsToMono_min, P_MonoToHLADRhicells_min, P_M2ToMono_mean, T_HLADRhicells_M2, T_M1_M1, T_M1_M2, T_M2_M1, T_M2_M2, T_Mono_M2, T_Neutro_M2, P_M2_Mono.
In a second aspect, the present invention provides a method for prognosis survival prediction, the method comprising:
Obtaining a fluorescent sample image of a pathological sample to be predicted;
And inputting the fluorescent sample image into a prognosis survival prediction model constructed by the prognosis survival prediction model construction method according to any embodiment, and outputting the survival condition of the pathological sample to be predicted after intervention.
In a third aspect, the present invention provides a prognostic survival prediction model construction apparatus, the apparatus comprising:
an acquisition module for obtaining, before treatment, fluorescent sample images in one-to-one correspondence with sample tissues, wherein the fluorescent sample images are obtained after staining with marker antibodies for myeloid cells;
a segmentation module for segmenting the myeloid cells in each sample tissue based on the fluorescent sample images to determine the fluorescence expression level of each myeloid cell on different markers;
an annotation module for determining a myeloid cell type annotation for each myeloid cell based on the fluorescent expression levels of each myeloid cell on different markers and a cell type annotation algorithm;
The distance feature extraction module is used for determining the space distance between each myeloid cell in each sample tissue and other myeloid cells of different types based on the myeloid cell type annotation, and obtaining cell space distance features;
an interaction feature extraction module for determining a spatial interaction relationship between each myeloid cell in each sample tissue and surrounding cells based on the myeloid cell type annotation, obtaining a cell interaction feature;
a dividing module for dividing the sample tissues into a training set and a test set according to a preset ratio and the prognosis information;
a construction module for constructing a prognosis survival prediction model based on the cell spatial distance features and the cell interaction features in the training set and the test set, wherein the prognosis survival prediction model is used for predicting the survival condition of a pathological sample after intervention.
In a fourth aspect, the present invention provides a computer device, including a memory and a processor, where the memory and the processor are communicatively connected to each other, and the memory stores computer instructions, and the processor executes the computer instructions, so as to execute the method for constructing a prognosis survival prediction model according to the first aspect or any embodiment corresponding to the first aspect.
In a fifth aspect, the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the method for constructing a prognosis survival prediction model according to the first aspect or any one of its corresponding embodiments.
In a sixth aspect, the present invention provides a computer program product comprising computer instructions for causing a computer to execute the method for constructing a prognosis survival prediction model according to the first aspect or any of the embodiments corresponding thereto.
The apparatus, the computer device and the computer-readable storage medium for constructing the prognosis survival prediction model provided by the invention correspond to the method for constructing the prognosis survival prediction model. Therefore, regarding the beneficial effects of the prognosis survival prediction model construction device, the computer device and the computer-readable storage medium, please refer to the description of the corresponding beneficial effects of the prognosis survival prediction model construction method above, and the description thereof will not be repeated here.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Existing treatment-efficacy prediction mainly targets immune checkpoints of the lymphoid lineage, such as PD-1, PD-L1 and CTLA4. However, immune checkpoint drugs targeting myeloid cells, such as CD47, LILRB1 and CD171, are now on the market, and existing PD-1 and PD-L1 testing cannot adequately predict or evaluate the efficacy of these myeloid-cell immune checkpoint drugs. There is therefore a need for a model or method for prognosis prediction based on myeloid cells.
According to an embodiment of the present invention, an embodiment of a method for constructing a prognosis survival prediction model is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
In this embodiment, a method for constructing a prognosis survival prediction model is provided, which may be executed by a server, a terminal, a mobile terminal or the like. Fig. 1 is a flowchart of the method for constructing a prognosis survival prediction model according to an embodiment of the present invention. As shown in fig. 1, the flow includes the following steps:
Step S101, obtaining, before treatment, fluorescent sample images in one-to-one correspondence with the sample tissues, wherein the fluorescent sample images are obtained after staining with marker antibodies for myeloid cells, and obtaining prognosis information of the patient corresponding to each sample tissue, wherein the prognosis information may comprise the survival condition of the patient after receiving different preset treatments. The different treatments include surgical resection, radiotherapy and chemotherapy, immunotherapy, targeted therapy and the like. The survival condition may include survival time, that is, the prognosis information may include the survival time of the patient after receiving a given treatment.
In this embodiment, 90 sample tissues are provided as an example, and the specific sample information is shown in fig. 2. T stage represents the tumor stage: "T" is followed by a number (e.g., T1, T2, T3, T4), and the larger the number, the larger the tumor or the more extensive its spread. N stage represents the lymph node stage: "N" is followed by a number (e.g., N0, N1, N2, N3), and a larger number generally represents a greater number or degree of lymph node metastases; different letters after "N" represent different sizes or metastasis to different types of lymph nodes. The survival time in this embodiment may be the survival time of the patient after treatment, for example 1 month, 6 months, 12 months, 24 months, 60 months or 80 months, and is not particularly limited herein.
The process of acquiring a fluorescence sample image for one of the sample tissues will be described in detail below.
In traditional immunohistochemical staining, only one marker (the marker antigen in this embodiment) can be stained on one section. If 6 markers are to be stained, 6 serial sections are required, and the results cannot be compared at the level of the same cell. Clinical sample tissue is especially precious, so staining 6 markers on a single section, at the level of the same cells, makes subsequent analysis more convenient and reliable. This embodiment adopts multiplex immunohistochemical fluorescence (mIHC) staining, which can stain 6 markers on the same section. It can therefore accurately depict the state of tumor cells and immune cells in the tumor microenvironment and provide important reference information for accurate tumor diagnosis. It is also significant for tumor immunotherapy: the proportion and distribution of immune cell subpopulations, and in particular the spatial relationship between immune cells and tumor cells, are important topics in tumor prognosis and prediction research. The method can help clinicians who lack pathology expertise to predict the prognostic treatment effect of myeloid-cell-targeting drugs conveniently, quickly and effectively.
By way of introduction, myeloid-lineage cells refer to a family of white blood cells produced in bone marrow, including granulocytes (Neutro), monocytes (Mono), and macrophages (Macro). These cells originate from hematopoietic stem cells, undergo a series of differentiation and maturation processes to form mature white blood cells responsible for the body's immune defenses and inflammatory regulation.
In some alternative embodiments, the markers identifying myeloid cells comprise one or more of the following: CD45, CD45RA, CD45RO, CD11b, CD11c, CD14, CD15, CD16, CD33, CD68, CD86, CD163, CD206, SPP1, FOLR2, C1QC, CD1c, HLA-DR, CD141, S100A9, S100A8, CD117, CD123, MPO.
In this embodiment, the following combination of markers for identifying myeloid cells is provided as an example: CD68, CD11b, CD14, MPO, HLA-DR and CD163. The meaning of each marker is as follows.
CD68 is commonly used to detect macrophages. Macrophages are important immune cells that play an important role in inflammation, infection, tumors and tissue repair. CD68 is one of the markers of macrophages, and detecting CD68 can help pathologists determine the location, number and activation state of macrophages in tissue, and thus provide insight into disease diagnosis and pathophysiological processes.
CD11b, an integrin, also known as Mac-1 or ITGAM, is a marker on the surface of monocytes, macrophages and neutrophils. Detection of CD11b can help identify and quantify the presence and distribution of monocytes, macrophages and neutrophils in tissue. These cells play an important role in inflammatory reactions, infections, immune reactions, and tissue repair. Thus, detection of CD11b can help pathologists understand the pathophysiological processes of the disease, guide diagnosis and treatment protocols.
CD14 is a protein playing a key role in innate immunity and is one of the important surface markers of monocytes. In pathological diagnostics, the detection of CD14 can be used to identify immune cell infiltration patterns of specific diseases, such as autoimmune diseases, infectious diseases, and tumors. By locating and counting CD14 positive cells, the physician can be helped to further understand the pathophysiological mechanisms of the disease, thereby guiding the selection of treatment regimens and assessing disease prognosis.
MPO, a granulocyte-specific enzyme, is commonly used to detect granulocytes, including neutrophils and eosinophils. MPO is an important enzyme within granulocytes, and its detection can help determine the presence and activity of granulocytes in tissue. Granulocytes play an important role in diseases such as inflammation, infection and tumors, and detecting MPO helps to evaluate the degree of inflammation in tissue, the infiltration of granulocytes and the pathophysiological mechanisms of the disease. Detecting MPO-positive cells therefore gives a better picture of the granulocytes in the tissue and provides important information for the diagnosis and treatment of disease.
HLA-DR, an antigen presenting related protein expressed primarily on the surface of immune cells, is commonly used to detect professional antigen presenting cells, primarily cells in the mature and activated state of the monocyte series, with antigen presenting and immunoregulatory functions. In particular, HLA-DR-positive cells play an important role in tissues in inflammatory, infectious, autoimmune and tumor disease states. These cells include macrophages, dendritic cells and B lymphocytes, which play a role in the immune response and antigen presentation process. By detecting HLA-DR positive cells, pathologists can assess the status of the immune cells, the extent of the immune response, and the immunopathogenic characteristics of the disease, to better understand the pathogenesis and progression of the disease.
CD163 is a type I membrane protein, also known as M130 antigen, expressed primarily in the monocyte and macrophage system. It is a marker of monocytes and macrophages, such as splenic dendritic cells and alveolar macrophages, and is mainly used for detecting monocytes and histiocytes in tumors and reactive lesions.
Taking intestinal tissue as an example, a fluorescent sample image can be obtained by the following experimental procedure. The procedure comprises, in order: section baking and dewaxing, fixing, antigen retrieval, blocking, sealing, primary antibody incubation, secondary antibody incubation, fluorescence incubation and antigen retrieval. Starting again from sealing, the steps of primary antibody incubation, secondary antibody incubation, fluorescence incubation and antigen retrieval are repeated until 5 targets have been labeled. Fluorescence incubation is then continued to complete Opal polar (a multiplex fluorescent staining technique, commonly used for tissue section analysis in biomedical research) labeling. Finally, the nuclei are stained, the section is sealed, and the section is scanned to obtain the fluorescent sample image. The scan result is shown in fig. 3.
Step S102, segmenting the myeloid cells in each sample tissue based on the fluorescent sample images, and determining the fluorescence expression level of each myeloid cell on different markers.
Specifically, the scanning may be performed by placing the processed sample tissue in a fluorescence microscope, irradiating the sample with a laser light of a specific wavelength, causing the labeled fluorescent dye to fluoresce, and capturing the emitted fluorescent signal by a camera system, thereby obtaining a fluorescent sample image in the present embodiment. The whole form of the cells can be observed through the fluorescent sample image, the fluorescent expression condition of different markers can be known, and the cell types of each myeloid cell in the sample can be determined.
Further, the cell segmentation can locate the position of each cell nucleus through the fluorescent expression quantity condition of the DAPI (4', 6-diamidino-2-phenylindole) channel, then the cell nucleus is taken as the cell center to search the cell boundary, and finally complete cell information is obtained, and the segmented fluorescent sample image is shown in FIG. 4. The average fluorescence expression level of each marker in each cell can also be used as the fluorescence expression level of the cell on each marker, and the value can be used for subsequent cell clustering and cell type judgment.
Step S103, determining a myeloid cell type annotation for each myeloid cell based on the fluorescence expression level of each myeloid cell on different markers and a cell type annotation algorithm.
After the cell types have been preliminarily determined from the fluorescence expression levels of the myeloid cells on the different markers, they can be further displayed on the image. Because the multiplex immunohistochemical fluorescence (mIHC) scheme has a relatively high background signal intensity, and this background differs to varying degrees between markers and even between samples, it is difficult to determine one overall threshold that removes the background signal and achieves an accurate display. In this embodiment, an annotation table suited to this embodiment is designed, as shown in Table 1. With this table, the cell types can be accurately annotated by the CELESTA (v0.0.0.9000) algorithm, and the cell types classified using the annotation table were verified to be consistent with the actual images. In this embodiment, the following annotations were made: Macro (macrophages), Mono (monocytes), Neutro (neutrophils), HLADR+cells (HLA-DR-positive cells), M1 (M1-type macrophages, or classically activated macrophages), M2 (M2-type macrophages, or alternatively activated macrophages) and unknown cells (cells of undetermined type). The annotated fluorescent sample image is shown in fig. 5.
TABLE 1
After annotation, this embodiment also normalizes the expression level of each cell type on the different markers and visualizes it as a heat map, which is shown in fig. 6.
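The normalization applied before drawing the heat map is not specified above. As one plausible choice, the following minimal sketch standardizes each marker column across cell types (z-score). The function name `zscore_columns` and the expression values are hypothetical, and z-scoring is an assumption, not necessarily the normalization used for fig. 6.

```python
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Standardize each marker column across cell types.

    matrix: {cell_type: {marker: mean expression level}}
    """
    markers = list(next(iter(matrix.values())).keys())
    out = {cell_type: {} for cell_type in matrix}
    for marker in markers:
        column = [matrix[ct][marker] for ct in matrix]
        mu, sigma = mean(column), pstdev(column)
        for ct in matrix:
            # A constant column carries no contrast; map it to zero.
            out[ct][marker] = 0.0 if sigma == 0 else (matrix[ct][marker] - mu) / sigma
    return out


# Hypothetical mean expression per annotated cell type.
mean_expr = {
    "M2": {"CD68": 8.0, "CD163": 9.0},
    "Mono": {"CD68": 2.0, "CD163": 1.0},
    "Neutro": {"CD68": 2.0, "CD163": 2.0},
}
heatmap_values = zscore_columns(mean_expr)
```

After this step each marker column has zero mean, so markers with very different raw intensity ranges can share one color scale.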
Step S104, determining, based on the myeloid cell type annotations, the spatial distance between each myeloid cell in each sample tissue and other myeloid cells of different types, to obtain the cell spatial distance features.
Specifically, in this embodiment the Euclidean distance from each cell to the nearest cell of each cell type can be calculated cell by cell, and the maximum, minimum and average of these distances are taken. Continuing with the sample tissues above, 216-dimensional features at the spatial distance level are finally obtained, including T_HLADR+cellsToM1_mean, T_M1ToMacro_mean, T_MonoToNeutro_max, P_MonoToM1_max, T_NeutroToMono_min, P_MacroToM2_min and so on. The prefix T of a feature name denotes the cancerous region and P denotes the paracancerous region. This completes the extraction of the cell spatial distance features.
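The nearest-cell distance calculation described in step S104 can be sketched as follows. This is a minimal brute-force illustration for a single region, assuming each cell is given as an (x, y, type) tuple of centroid coordinates and annotation; the function names and coordinates are hypothetical.

```python
import math
from statistics import mean


def nearest_distances(cells, source_type, target_type):
    """For every source-type cell, the distance to the nearest target-type cell."""
    distances = []
    for i, (x, y, cell_type) in enumerate(cells):
        if cell_type != source_type:
            continue
        candidates = [
            math.dist((x, y), (tx, ty))
            for j, (tx, ty, other_type) in enumerate(cells)
            if other_type == target_type and j != i  # skip the cell itself
        ]
        if candidates:
            distances.append(min(candidates))
    return distances


def summarize(distances):
    """Minimum, maximum and average of a non-empty list of distances."""
    return {"min": min(distances), "max": max(distances), "mean": mean(distances)}


# Toy region: (x, y, annotated type); coordinates are hypothetical.
cells = [
    (0.0, 0.0, "Mono"),
    (3.0, 4.0, "M2"),
    (10.0, 0.0, "Mono"),
    (10.0, 6.0, "M2"),
]
mono_to_m2 = summarize(nearest_distances(cells, "Mono", "M2"))
```

In a cancerous region, the three values of `mono_to_m2` would correspond to T_MonoToM2_min, T_MonoToM2_max and T_MonoToM2_mean. The pairwise loop is quadratic in the number of cells; for whole-slide data a spatial index would be the usual substitute.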
Step S105, determining, based on the myeloid cell type annotations, the spatial interaction relationship between each myeloid cell in each sample tissue and its surrounding cells, to obtain the cell interaction features.
In this embodiment, each cell is defined as having an interaction relationship with its 10 nearest surrounding cells. The proportion of each cell type among all interacting cells is calculated and taken as the spatial interaction feature of the cell. For the sample tissues above, 72-dimensional features at the spatial interaction level are finally obtained, including T_HLADR+cells_HLADR+cells, T_Macro_M1 and so on, which completes the extraction of the spatial interaction features.
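The interaction definition above can be sketched as a k-nearest-neighbor composition. In the minimal example below, `interaction_features` and the toy coordinates are hypothetical, `k` defaults to the 10 neighbors used in this embodiment, and averaging the per-cell proportions over all cells of the same center type is an assumption, since the aggregation from cell level to sample level is not spelled out above.

```python
import math
from collections import Counter, defaultdict


def interaction_features(cells, k=10):
    """Mean share of each neighbor type among the k nearest cells, per center type."""
    shares = defaultdict(list)  # (center_type, neighbor_type) -> per-cell proportions
    types = sorted({cell_type for _, _, cell_type in cells})
    for i, (x, y, center_type) in enumerate(cells):
        neighbors = sorted(
            (math.dist((x, y), (nx, ny)), j)
            for j, (nx, ny, _) in enumerate(cells)
            if j != i
        )[:k]
        if not neighbors:
            continue
        counts = Counter(cells[j][2] for _, j in neighbors)
        for neighbor_type in types:
            shares[(center_type, neighbor_type)].append(
                counts[neighbor_type] / len(neighbors)
            )
    return {f"{c}_{n}": sum(v) / len(v) for (c, n), v in shares.items()}


# Toy region with k=2 so the neighborhoods are easy to follow by hand.
cells = [
    (0.0, 0.0, "M1"),
    (1.0, 0.0, "M2"),
    (0.0, 1.0, "M2"),
    (5.0, 5.0, "Mono"),
]
features = interaction_features(cells, k=2)
```

Prefixing each key with T_ or P_ according to the region in which the cells lie gives names of the form T_M1_M2 used in this description.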
Step S106, dividing the sample tissues into a training set and a test set according to the preset ratio and the prognosis information.
In this embodiment, taking the 90 samples above as an example, the data set can be split at a ratio of 7:3, with 64 samples used as the training set and 26 samples used as the test set, and with long survival and short survival each accounting for half of both the training set and the test set. A sample whose survival time in the prognosis information is 70 months or longer may be defined as long survival and coded 1, and the other samples are defined as short survival and coded 0; in the final sample set, long survival and short survival each account for 50%. The independent test set contains 26 samples in total, including 13 long-survival and 13 short-survival samples. The test set is independent of the modeling process, takes no part in feature selection or model building, and is used only for final testing to ensure the reliability of the test.
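The labeling and class-balanced split described above can be sketched as follows. The 70-month threshold and the 13 long-survival plus 13 short-survival test samples come from this embodiment; the function names, the random seed and the synthetic survival times are hypothetical. The sketch takes the held-out count per class directly, which is an assumption about how the approximate 7:3 ratio was realized as 64 and 26 samples.

```python
import random


def label_survival(months, threshold=70):
    """Code long survival (>= threshold months) as 1 and short survival as 0."""
    return 1 if months >= threshold else 0


def stratified_split(samples, test_per_class=13, seed=0):
    """Hold out the same number of samples from each survival class."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [s for s in samples if s["label"] == label]
        rng.shuffle(group)
        test.extend(group[:test_per_class])
        train.extend(group[test_per_class:])
    return train, test


# Synthetic cohort: 45 long-survival and 45 short-survival samples.
samples = [{"id": i, "months": 80 if i % 2 == 0 else 24} for i in range(90)]
for sample in samples:
    sample["label"] = label_survival(sample["months"])

train, test = stratified_split(samples)
```

Because the split is done per class, both sets stay balanced between long and short survival, and the test samples never enter feature selection or training.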
In some alternative embodiments, the 16 features selected for modeling are as follows: T_MonoToM2_max, T_M1ToM2_mean, T_MonoToM2_mean, P_M2ToHLADRhicells_max, P_MonoToHLADRhicells_max, P_HLADRhicellsToMono_min, P_MonoToHLADRhicells_min, P_M2ToMono_mean, T_HLADRhicells_M2, T_M1_M1, T_M1_M2, T_M2_M1, T_M2_M2, T_Mono_M2, T_Neutro_M2, P_M2_Mono.
The features have the following meanings:
T_MonoToM2_max: the maximum distance from monocytes in the cancerous region to the nearest M2 macrophage.
T_M1ToM2_mean: the average distance from M1 macrophages in the cancerous region to the nearest M2 macrophage.
T_MonoToM2_mean: the average distance from monocytes in the cancerous region to the nearest M2 macrophage.
P_M2ToHLADRhicells_max: the maximum distance from M2 macrophages in the paracancerous region to the nearest HLADR-positive cell.
P_MonoToHLADRhicells_max: the maximum distance from monocytes in the paracancerous region to the nearest HLADR-positive cell.
P_HLADRhicellsToMono_min: the minimum distance from HLADR-positive cells in the paracancerous region to the nearest monocyte.
P_MonoToHLADRhicells_min: the minimum distance from monocytes in the paracancerous region to the nearest HLADR-positive cell.
P_M2ToMono_mean: the average distance from M2 macrophages in the paracancerous region to the nearest monocyte.
T_HLADRhicells_M2: the proportion of M2 macrophages among the cells interacting with HLADR-positive cells in the cancerous region.
T_M1_M1: the proportion of M1 macrophages among the cells interacting with M1 macrophages in the cancerous region.
T_M1_M2: the proportion of M2 macrophages among the cells interacting with M1 macrophages in the cancerous region.
T_M2_M1: the proportion of M1 macrophages among the cells interacting with M2 macrophages in the cancerous region.
T_M2_M2: the proportion of M2 macrophages among the cells interacting with M2 macrophages in the cancerous region.
T_Mono_M2: the proportion of M2 macrophages among the cells interacting with monocytes in the cancerous region.
T_Neutro_M2: the proportion of M2 macrophages among the cells interacting with neutrophils in the cancerous region.
P_M2_Mono: the proportion of monocytes among the cells interacting with M2 macrophages in the paracancerous region.
Constructing the prognosis survival prediction model based on these features can effectively improve the accuracy of the model.
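The feature naming convention and the stated dimensions can be cross-checked with a short enumeration. The sketch below assumes that the six annotated cell types other than unknown cells are paired in every ordered combination, including a type with itself, over two regions; this assumption is inferred from the counts of 216 and 72 rather than stated in the description, and the label spelling HLADR+cells follows the examples given under step S104.

```python
from itertools import product

CELL_TYPES = ["Macro", "Mono", "Neutro", "HLADR+cells", "M1", "M2"]
ZONES = ["T", "P"]                # T: cancerous region, P: paracancerous region
STATS = ["min", "max", "mean"]

# Distance features: <zone>_<source>To<target>_<statistic>
distance_names = [
    f"{zone}_{src}To{dst}_{stat}"
    for zone, src, dst, stat in product(ZONES, CELL_TYPES, CELL_TYPES, STATS)
]

# Interaction features: <zone>_<center type>_<neighbor type>
interaction_names = [
    f"{zone}_{center}_{neighbor}"
    for zone, center, neighbor in product(ZONES, CELL_TYPES, CELL_TYPES)
]

print(len(distance_names), len(interaction_names))
```

Under this assumption the counts are 2 x 6 x 6 x 3 = 216 distance features and 2 x 6 x 6 = 72 interaction features, matching the dimensions reported for steps S104 and S105.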
Step S107, constructing a prognosis survival prediction model based on the cell spatial distance features and the cell interaction features in the training set and the test set, wherein the prognosis survival prediction model is used for predicting the survival condition of a pathological sample after intervention.
Specifically, in this embodiment a random forest model may be used. The sample information, cell spatial distance features, cell interaction features and other information corresponding to the training set are input into the random forest model for training, and an initial survival prediction model is constructed. The AUC of the model on the training set is 0.834, which indicates good prediction performance. Fig. 7 is a schematic diagram of the ROC (Receiver Operating Characteristic) curve of the training set. In this embodiment, the other evaluation indexes of the model on the training set are also calculated and visualized; as shown in fig. 8, all of them indicate good performance. Further, the constructed prediction model is verified on the test set, where its AUC is 0.787, confirming that the model maintains good prediction performance on the independent test set. Fig. 9 is a schematic diagram of the ROC curve of the test set, and the other evaluation indexes of the model on the test set are calculated and visualized in fig. 10. The model performs well on both the training set and the test set, and the prediction model has the advantages of convenient feature extraction and high accuracy.
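For reference, the AUC used above to evaluate the model can be computed directly from predicted scores and survival labels. The following minimal sketch uses the pairwise-ranking definition of AUC: the fraction of (long-survival, short-survival) sample pairs in which the long-survival sample receives the higher score, with ties counted as one half. The function name and the toy labels and scores are hypothetical and are not the data behind the values 0.834 and 0.787.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy predictions: label 1 is long survival, label 0 is short survival.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

In the toy data, 8 of the 9 pairs are ranked correctly, so the AUC is 8/9. For a random forest, the scores would be the predicted probability of long survival for each sample.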
Current treatment-efficacy prediction mainly targets immune checkpoints of the lymphoid lineage, such as PD-1, PD-L1 and CTLA4. However, these traditional lymphoid-lineage immune checkpoints cannot be used to adequately predict or evaluate the efficacy of myeloid-cell immune checkpoint drugs. In addition, current analysis methods for multiplex immunohistochemical fluorescence mostly stop at visual inspection of the experimental images or at simple statistics such as the proportion of marker-positive cells or of double-positive cells, and lack deeper analysis.
The invention provides a method for constructing a prognosis prediction model based on myeloid cells. The method extracts cell features in the spatial domain from fluorescent sample images through cell segmentation, cell annotation, distance calculation and interaction analysis, extracts intercellular features at the cell level, and constructs from these spatial features a prognosis survival prediction model based on the spatial position relationships of myeloid cells, providing a strong reference for clinically judging the prognostic survival of tumors.
In some optional embodiments, step S102 above, namely segmenting the myeloid cells in each sample tissue based on the fluorescent sample images and determining the fluorescent expression level of each myeloid cell on different markers, includes:
Step S1021, carrying out region division on each fluorescent sample image to obtain a plurality of region images corresponding to each fluorescent sample image.
Step S1022, determining the nucleus position of each myeloid cell according to the fluorescence expression level corresponding to the region image.
Step S1023, segmenting the complete myeloid cell corresponding to each nucleus based on the nucleus position.
Step S1024, determining all fluorescence expression amounts corresponding to each marker in the whole myeloid cell, and calculating the average fluorescence expression amount corresponding to each marker one by one.
Step S1025, the average fluorescence expression level is used as the fluorescence expression level on the corresponding marker.
Region division may use QuPath software to frame a suitable number of suitably positioned region images (hereinafter referred to as ROIs) in each fluorescent sample image. Specifically, the qptiff image of each tissue is opened in QuPath, and a suitable region is selected on the fluorescent sample image as an ROI for biological analysis according to the staining of the sample tissue. In experimental analysis using tissue chips, each chip point can be taken as an ROI. QuPath can then be used to perform cell segmentation on the framed ROIs, extract the fluorescent expression level of each segmented cell on the different markers, and export the result as csv data. From the exported data, a SpatialExperiment-type data structure object spe (SpatialExperiment v1.10.0, SingleCellExperiment v1.22.0, imcRtools v1.6.5) may be constructed to store the cell information, expression level information, prognosis information and the like for each ROI. For each ROI, the position of each cell nucleus can be located according to the fluorescent expression level of the DAPI channel; the cell boundary is then searched with the nucleus as the cell center, finally yielding a complete myeloid cell. In this embodiment, dividing the fluorescent sample image into regions and segmenting cells on each region image allows the myeloid cells to be segmented more accurately.
Further, the average of all expression values of each marker within the complete myeloid cell is taken as the fluorescence expression level of that cell on the marker. Using the average to represent the overall expression level of the cell on each marker facilitates subsequent cell classification and analysis.
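The averaging in steps S1024 and S1025 can be sketched as follows. This is a minimal illustration that assumes the per-pixel intensities inside one segmented cell are already available as lists; the function name `mean_marker_expression`, the marker names and the intensity values are hypothetical.

```python
from statistics import mean


def mean_marker_expression(cell_pixels):
    """Collapse per-pixel intensities into one mean value per marker.

    cell_pixels: dict mapping marker name -> list of pixel intensities
                 measured inside one segmented cell.
    """
    return {marker: mean(values) for marker, values in cell_pixels.items()}


# Hypothetical intensities for one cell; marker names are illustrative.
cell = {
    "DAPI": [10.0, 12.0, 14.0],
    "CD68": [2.0, 4.0],
    "HLA-DR": [0.0, 1.0, 2.0, 5.0],
}
expression = mean_marker_expression(cell)
```

The resulting dictionary holds one value per marker for the cell, which is the form a downstream cell type annotation step would consume.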
In some optional embodiments, each region image includes a cancerous region and a paracancerous region, and step S104 above, namely determining a spatial distance between each myeloid cell in each sample tissue and other myeloid cells of different types to obtain the cell spatial distance features, includes:
Step S1041, determining a first Euclidean distance from each myeloid cell type to other different types of myeloid cells in the cancerous region included in each region image.
Step S1042, determining a second Euclidean distance from each myeloid cell type to other different types of myeloid cells in the paracancerous region included in each region image.
Step S1043, determining a first spatial distance between each myeloid cell to other different types of myeloid cells in all cancerous regions of the sample tissue based on all first Euclidean distances, the first spatial distance comprising a first minimum spatial distance, a first maximum spatial distance, a first average spatial distance.
Step S1044, determining a second spatial distance between each myeloid cell to other different types of myeloid cells in all paracancerous regions of the sample tissue based on all second Euclidean distances, the second spatial distance comprising a second minimum spatial distance, a second maximum spatial distance, a second average spatial distance.
Step S1045, taking the first spatial distance and the second spatial distance as the cell spatial distance features.
Specifically, in this embodiment, the Euclidean distance between two cells of different types may be calculated. For example, if one cell has coordinates $(x_1, y_1)$ in the region image and the other has coordinates $(x_2, y_2)$, the Euclidean distance $d$ between them satisfies $d^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2$. In this way, the distances from each cell type to the different cell types are obtained in each ROI, and the maximum, minimum and average of these distances within each sample tissue are calculated to obtain comprehensive distance features. Since each sample tissue includes both a cancerous region and a paracancerous region, the cell spatial distance features can be calculated and extracted separately for the two regions and then combined; still taking the above example, 216-dimensional features at the spatial distance level are obtained.
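A minimal sketch of the distance feature extraction is given below. It rests on assumptions that the text does not fix: each source cell is taken to contribute its distance to the nearest cell of the destination type, the region prefix ("T" for cancerous, "P" for paracancerous) is inferred from the feature names that appear elsewhere in this description, and same-type pairs are skipped rather than computed and removed later. The function name `distance_features` and the coordinates are hypothetical.

```python
import math
from collections import defaultdict


def distance_features(cells, prefix):
    """Min/max/mean nearest-neighbour distance for each ordered type pair.

    cells: list of (x, y, cell_type) tuples from one region.
    prefix: region tag, e.g. "T" (cancerous) or "P" (paracancerous).
    """
    by_type = defaultdict(list)
    for x, y, cell_type in cells:
        by_type[cell_type].append((x, y))

    features = {}
    for src, src_pts in by_type.items():
        for dst, dst_pts in by_type.items():
            if src == dst:
                continue  # this sketch skips same-type pairs
            # Assumption: each source cell contributes its Euclidean
            # distance to the nearest cell of the destination type.
            nearest = [
                min(math.hypot(sx - dx, sy - dy) for dx, dy in dst_pts)
                for sx, sy in src_pts
            ]
            name = f"{prefix}_{src}To{dst}"
            features[f"{name}_min"] = min(nearest)
            features[f"{name}_max"] = max(nearest)
            features[f"{name}_mean"] = sum(nearest) / len(nearest)
    return features


# Hypothetical coordinates for one cancerous-region ROI.
roi = [(0.0, 0.0, "Mono"), (9.0, 12.0, "Mono"), (3.0, 4.0, "M2")]
feats = distance_features(roi, "T")
```

Running the same function on the paracancerous cells with the prefix "P" and merging the two dictionaries would give the combined feature vector for one sample tissue.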
In this embodiment, calculating the spatial distances between different types of myeloid cells gives a deeper understanding of the interactions and spatial distribution of cells in the tumor microenvironment. In addition, analyzing the differences in spatial distance between myeloid cells in the cancerous region and in the paracancerous region takes into account cell changes during tumor development, which makes the analysis more comprehensive and improves the prediction accuracy of the constructed model.
In some alternative embodiments, after the cell spatial distance features are obtained, the method further includes:
performing data processing and feature screening on the distance features between cells of the same myeloid cell type in the cell spatial distance features to obtain processed cell spatial distance features, wherein the processed cell spatial distance features are used for constructing the prognosis survival prediction model.
In this embodiment, after the cell spatial distance features and the cell interaction features are obtained, they are fused; still taking the above example, 288-dimensional features corresponding to each sample tissue are obtained. Notably, the cell type Unknow is excluded when these features are calculated, so none of the final features involves it. Further, in this embodiment, the distance features between cells of the same type are filtered out of the cell spatial distance features and excluded from subsequent processing. All features are then standardized to eliminate the influence of differences in absolute magnitude between features on model construction. Still taking the above example, 250 features finally enter the feature selection process.
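The standardization step can be illustrated with a z-score transform. This is a sketch under assumptions: the text does not state which standardization is used, so a per-feature z-score with the population standard deviation is assumed here, and the function name `standardize_columns` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each feature column across samples.

    rows: list of dicts, one per sample tissue, sharing the same keys.
    Columns with zero variance are set to 0.0 to avoid division by zero.
    """
    names = list(rows[0])
    stats = {}
    for name in names:
        column = [row[name] for row in rows]
        stats[name] = (mean(column), pstdev(column))
    return [
        {
            name: (row[name] - stats[name][0]) / stats[name][1]
            if stats[name][1]
            else 0.0
            for name in names
        }
        for row in rows
    ]


# Hypothetical feature values for three sample tissues.
samples = [
    {"a": 1.0, "b": 5.0},
    {"a": 2.0, "b": 5.0},
    {"a": 3.0, "b": 5.0},
]
z = standardize_columns(samples)
```

After this transform every non-constant feature has zero mean and unit variance, so features measured on very different scales contribute comparably during model construction.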
Feature selection is then performed. The Boruta algorithm may be applied in the training set to the extracted 216+72-dimensional features. Boruta is a random-forest-based feature selection method whose main objective is to find the truly important features in a given feature set and distinguish them from irrelevant ones. Fig. 11 is a schematic diagram of the importance of the features in the model, and fig. 12 is a line diagram of the feature importance scores, in which green denotes confirmed important features and yellow denotes tentative features. The final result is 16 features that the Boruta algorithm classifies as either important or tentative, comprising 10 confirmed important features and 6 tentative features; the remaining features are classified as unimportant. The confirmed important features are: T_MonoToM2_mean, P_MonoToHLADRhicells_max, P_HLADRhicellsToMono_min, P_MonoToHLADRhicells_min, P_M2ToMono_mean, T_HLADRhicells_M2, T_M2_M1, T_Mono_M2, T_Neutro_M2, P_M2_Mono. The tentative features are: T_ MonoToM _max, T_M1ToM2_mean P_M25248_max p_m2ToHLADRhicells max _. Fig. 13 is a schematic illustration of the impact of these 16 features on model performance.
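The core idea behind Boruta, comparing each real feature against shuffled "shadow" copies, can be sketched as below. This is a simplified illustration and not the Boruta algorithm itself: absolute Pearson correlation with the label is substituted for random-forest importance, and the statistical test that Boruta applies to the hit counts is omitted. The function names and the toy data are hypothetical.

```python
import random


def pearson_abs(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(sxy) / (sxx * syy) ** 0.5


def shadow_hits(columns, labels, n_iter=20, seed=0):
    """Count how often each feature beats the best shuffled shadow copy.

    columns: dict of feature name -> list of values (one per sample).
    Importance here is |Pearson r| with the label, a stand-in for the
    random-forest importance that Boruta itself uses.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        shadow_best = 0.0
        for values in columns.values():
            shadow = values[:]
            rng.shuffle(shadow)  # break any link between feature and label
            shadow_best = max(shadow_best, pearson_abs(shadow, labels))
        for name, values in columns.items():
            if pearson_abs(values, labels) > shadow_best:
                hits[name] += 1
    return hits


# Hypothetical data: "signal" tracks the label exactly, "flat" is constant.
labels = [0, 1] * 20
columns = {"signal": [float(y) for y in labels], "flat": [1.0] * 40}
hits = shadow_hits(columns, labels)
```

A feature that beats the best shadow in most iterations would correspond to a confirmed feature, one that rarely does to a rejected feature, and the undecided middle ground to the tentative class.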
In this embodiment, performing data processing and feature screening on the distance features between cells of the same myeloid cell type improves the accuracy and robustness of the model, captures the feature relationships between cells better, and provides stronger support for prognosis survival prediction.
In some optional embodiments, each region image includes a cancerous region and a paracancerous region, and step S105 above, namely determining a spatial interaction relationship between each myeloid cell in each sample tissue and surrounding cells to obtain the cell interaction features, includes:
Step S1051, determining a first interaction feature between each myeloid cell in the cancerous region included in each region image and the surrounding interacting cells with which it has an interaction relationship. An interacting cell is a cell that has an interaction relationship with the given cell.
Step S1052, determining a second interaction feature between each myeloid cell in the paracancerous region included in each region image and the surrounding interacting cells with which it has an interaction relationship, wherein the first interaction feature and the second interaction feature each comprise the proportion of each myeloid cell type among the interacting cells.
Step S1053, the first interaction feature and the second interaction feature are used as the cell interaction feature.
In this embodiment, each cell is defined as having an interaction relationship with its 10 nearest cells. The cell types of the 10 cells surrounding each cell are obtained, and the proportion of each cell type among these interacting cells gives the spatial interaction features from the central cell to the different cell types. These are aggregated and averaged for each sample tissue to obtain the spatial interaction features between each cell type and the other cell types in each region image. The cell spatial interaction features are calculated separately for the cancerous and paracancerous regions of the same sample tissue and then combined; still taking the above example, 72-dimensional features at the cell interaction level are obtained.
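The neighbourhood computation described above can be sketched as follows. It is a minimal illustration: a brute-force nearest-neighbour search is used for clarity, pairs that never occur are simply absent from the result, and the function name `interaction_features`, the keying of the result by (central type, neighbour type) and the toy coordinates are hypothetical.

```python
import math
from collections import Counter, defaultdict


def interaction_features(cells, k=10):
    """Mean neighbour-type proportions for each central cell type.

    cells: list of (x, y, cell_type) tuples from one region.
    k: number of nearest cells treated as interacting (10 in the text).
    """
    per_type = defaultdict(list)
    for i, (x, y, cell_type) in enumerate(cells):
        others = [c for j, c in enumerate(cells) if j != i]
        others.sort(key=lambda c: math.hypot(c[0] - x, c[1] - y))
        neighbours = others[:k]
        counts = Counter(c[2] for c in neighbours)
        # Proportion of each cell type among this cell's interacting cells.
        per_type[cell_type].append(
            {t: n / len(neighbours) for t, n in counts.items()}
        )

    features = {}
    for src, fractions in per_type.items():
        seen = {t for frac in fractions for t in frac}
        for dst in seen:
            # Average the proportions over all central cells of type src.
            total = sum(frac.get(dst, 0.0) for frac in fractions)
            features[(src, dst)] = total / len(fractions)
    return features


# Hypothetical cells on a line; k=2 keeps the example small.
toy = [(0.0, 0.0, "Mono"), (1.0, 0.0, "M2"), (2.0, 0.0, "M2"), (10.0, 0.0, "M1")]
inter = interaction_features(toy, k=2)
```

With k left at its default of 10 and real coordinates, the same function would be run once on the cancerous cells and once on the paracancerous cells of a sample tissue, and the two results combined.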
In the embodiment, the interaction relation of two cells in space is used as an important reference index for disease prognosis, so that the accuracy and the reliability of a prognosis survival prediction model can be improved, and important information support can be provided for clinical decision.
In this embodiment, a prognosis survival prediction method is provided, which may be executed by a server, a terminal, a mobile terminal, and the like, and includes the following steps:
step S201, obtaining a fluorescent sample image of a pathological sample to be predicted;
Step S202, inputting the fluorescent sample image into the prognosis survival prediction model constructed by the prognosis survival prediction model construction method according to any of the embodiments, and outputting the predicted survival time of the pathological sample to be predicted after intervention.
By constructing the prognosis survival prediction model based on the spatial features of myeloid cells, the invention can predict prognostic survival better and can provide an important reference for clinically judging the prognostic survival of tumors.
For a specific description of the prognosis survival prediction model, refer to the above embodiments; it is not repeated here.
In this embodiment, a device for constructing a prognosis survival prediction model is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, and is not described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The present embodiment provides a prognosis survival prediction model construction device, as shown in fig. 14, the device includes:
The acquisition module 301 is used for acquiring fluorescent sample images corresponding one-to-one to the sample tissues before treatment, wherein the fluorescent sample images are obtained after staining with marker antibodies for myeloid cells; the acquisition module 301 is also used for acquiring prognosis information of the patients corresponding to the sample tissues, wherein the prognosis information comprises the survival conditions of the patients after receiving different preset treatment means;
The segmentation module 302 is used for segmenting the myeloid cells in each sample tissue based on the fluorescent sample images and determining the fluorescent expression level of each myeloid cell on different markers, including dividing each fluorescent sample image into regions to obtain a plurality of region images corresponding to each fluorescent sample image;
An annotation module 303 for determining a myeloid cell type annotation for each myeloid cell based on the fluorescent expression levels of each myeloid cell on different markers and a cell type annotation algorithm;
A distance feature extraction module 304 for determining a spatial distance between each myeloid cell in each sample tissue to other myeloid cells of different types based on the myeloid cell type annotation, obtaining a cell spatial distance feature;
an interaction feature extraction module 305 for determining a spatial interaction relationship between each myeloid cell in each sample tissue and surrounding cells based on the myeloid cell type annotation, obtaining a cell interaction feature;
The dividing module 306 is configured to divide the sample tissues into a training set and a test set according to a preset proportion and the prognosis information;
The construction module 307 is configured to construct a prognosis survival prediction model based on the cell space distance features and the cell interaction features in the training set and the test set, where the prognosis survival prediction model is used to predict the survival condition of the pathological sample after the intervention.
In an alternative embodiment, the segmentation module 302 includes:
The segmentation unit is used for dividing each fluorescent sample image into regions to obtain a plurality of region images corresponding to each fluorescent sample image; determining the nucleus position of each myeloid cell according to the fluorescence expression level corresponding to the region image; segmenting the complete myeloid cell corresponding to each nucleus based on the nucleus position; determining all fluorescence expression values corresponding to each marker in the complete myeloid cell and calculating the average fluorescence expression level corresponding to each marker one by one; and taking the average fluorescence expression level as the fluorescence expression level on the corresponding marker.
In an alternative embodiment, each region image includes a cancerous region and a paracancerous region, and the distance feature extraction module 304 includes:
The distance feature extraction unit is used for determining a first Euclidean distance from each myeloid cell type to other different types of myeloid cells in the cancerous region included in each region image; determining a second Euclidean distance from each myeloid cell type to other different types of myeloid cells in the paracancerous region included in each region image; determining, based on all the first Euclidean distances, a first spatial distance between each myeloid cell and other different types of myeloid cells in all cancerous regions of the sample tissue, wherein the first spatial distance comprises a first minimum spatial distance, a first maximum spatial distance and a first average spatial distance; determining, based on all the second Euclidean distances, a second spatial distance between each myeloid cell and other different types of myeloid cells in all paracancerous regions of the sample tissue, wherein the second spatial distance comprises a second minimum spatial distance, a second maximum spatial distance and a second average spatial distance; and taking the first spatial distance and the second spatial distance as the cell spatial distance features.
In an alternative embodiment, each region image includes a cancerous region and a paracancerous region, and the interaction feature extraction module 305 includes:
An interaction feature extraction unit, used for determining a first interaction feature between each myeloid cell in the cancerous region included in each region image and the surrounding interacting cells with which it has an interaction relationship; determining a second interaction feature between each myeloid cell in the paracancerous region included in each region image and the surrounding interacting cells with which it has an interaction relationship, wherein the first interaction feature and the second interaction feature each comprise the proportion of each myeloid cell type among the interacting cells; and taking the first interaction feature and the second interaction feature as the cell interaction features.
In an alternative embodiment, the apparatus further comprises:
The processing module is used for performing data processing and feature screening on the distance features between cells of the same myeloid cell type in the cell spatial distance features to obtain processed cell spatial distance features, and the processed cell spatial distance features are used for constructing the prognosis survival prediction model.
The prognosis survival prediction model constructing means in this embodiment is presented in the form of functional units, where the units refer to ASIC circuits, processors and memories executing one or more software or fixed programs, and/or other devices that can provide the above functions.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The embodiment of the invention also provides a computer device which is provided with the prognosis survival prediction model construction device shown in the figure 14.
Referring to fig. 15, fig. 15 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in fig. 15, the computer device includes one or more processors 10, a memory 20, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 15.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a generic array logic, or any combination thereof.
The memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform the methods shown in the above embodiments.
The memory 20 may include a storage program area that may store an operating system, application programs required for at least one function, and a storage data area that may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The memory 20 may comprise volatile memory, such as random access memory, or nonvolatile memory, such as flash memory, hard disk or solid state disk, or the memory 20 may comprise a combination of the above types of memory.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code that is originally stored on a remote storage medium or a non-transitory machine-readable storage medium, downloaded through a network, and stored on a local storage medium, so that the method described herein can be processed by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random-access memory, a flash memory, a hard disk, a solid state disk, or the like; further, the storage medium may also include a combination of the above types of memory. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage component that can store or receive software or computer code which, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
Portions of the present invention may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or aspects in accordance with the present invention by way of operation of the computer. Those skilled in the art will appreciate that the existence of computer program instructions in a computer-readable medium includes, but is not limited to, source files, executable files, installation package files, and the like, and accordingly, the manner in which computer program instructions are executed by a computer includes, but is not limited to, the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled programs, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed programs. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.