CN119007827B - Method, system, storage medium and electronic equipment for expanding a pathway database
- Publication number
- CN119007827B (application CN202411488692.9A)
- Authority
- CN
- China
- Prior art keywords
- lipid
- pathway
- obtaining
- path
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The application provides a method, a system, a storage medium and electronic equipment for expanding a pathway database. The pathway database expansion method comprises: obtaining a lipid-pathway annotation relation of a lipid pathway database; obtaining lipid level information according to a lipid classification system; and expanding the lipid-pathway annotation relation according to the lipid level information, wherein for any parent-level lipid, if the parent-level lipid is annotated to a first pathway, all child-level lipids of the parent-level lipid are annotated to the first pathway. The pathway database expansion method can improve the accuracy of lipid identification and mapping.
Description
Technical Field
The application belongs to the technical field of lipidomics, relates to a database expansion method, and in particular relates to a pathway database expansion method, a system, a storage medium and electronic equipment.
Background
The core analysis of lipidomics consists in linking the detected lipids with their biological functions, in order to gain insight into the role and influence of these lipids in biological systems. A common strategy is to use various statistical algorithms to map the lipid list of interest onto a metabolic pathway database, such as KEGG (Kyoto Encyclopedia of Genes and Genomes) or Reactome, or onto a genome-scale metabolic network (GSMN). By this means, researchers can recognize and interpret the synergistic effects of lipids in biological contexts, revealing changes in lipid metabolism under different physiological and pathological conditions. However, the prior art lacks methods for expanding lipid pathway databases.
Disclosure of Invention
The application aims to provide a method, a system, a storage medium and electronic equipment for expanding a lipid pathway database.
In a first aspect, the application provides a pathway database expansion method based on lipid level information, which comprises: obtaining lipid-pathway annotation relations of a lipid pathway database; obtaining lipid level information according to a lipid classification system; and expanding the lipid-pathway annotation relations according to the lipid level information, wherein for any parent-level lipid, if the parent-level lipid is annotated to a first pathway, all child-level lipids of the parent-level lipid are annotated to the first pathway.
In one implementation of the first aspect, for any second pathway in the lipid pathway database, the method further comprises obtaining a node importance factor for the second pathway corresponding to a detected lipid, obtaining a coverage adjustment factor and a topology adjustment factor for the second pathway corresponding to the detected lipid, and obtaining a pathway score for the second pathway based on the node importance factor, the coverage adjustment factor and the topology adjustment factor.
In one implementation of the first aspect, obtaining the node importance factor of the second pathway corresponding to the detected lipids includes obtaining an importance adjustment factor of the nodes in the second pathway, obtaining the detected lipids matched thereto for each node in the second pathway, obtaining an individual influence factor for each matched lipid, and obtaining the node importance factor based on the individual influence factors and the importance adjustment factors.
In one implementation of the first aspect, obtaining the coverage adjustment factor for the second pathway corresponding to the detected lipid includes obtaining a quotient between the number of detected lipids and the number of nodes in the second pathway as an initial coverage, and obtaining the coverage adjustment factor based on the initial coverage.
In one implementation of the first aspect, obtaining the topology adjustment factor of the second pathway corresponding to the detected lipid includes identifying linear paths and branch nodes in the second pathway, and configuring the topology adjustment factor according to the change of the nodes in the linear paths and/or the change of the branch nodes.
In one implementation of the first aspect, the method further includes performing a permutation test to obtain permutation scores, calculating an original p-value from the pathway score and the permutation scores, and correcting the original p-value to obtain a corrected p-value.
In one implementation of the first aspect, the method further includes performing enrichment analysis based on the expanded pathway database.
In a second aspect, the embodiment of the application provides a pathway database expansion system based on lipid level information, which comprises an annotation relation acquisition module, a level information acquisition module and an expansion module. The annotation relation acquisition module is used for acquiring lipid-pathway annotation relations of a lipid pathway database. The level information acquisition module is used for acquiring lipid level information according to a lipid classification system. The expansion module is used for expanding the lipid-pathway annotation relations according to the lipid level information, wherein for any parent-level lipid, if the parent-level lipid is annotated to a first pathway, all child-level lipids of the parent-level lipid are annotated to the first pathway.
In a third aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any of the first aspects of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a memory storing a computer program, and a processor, communicatively connected to the memory, and executing the method according to any one of the first aspect of the embodiments of the present application when the computer program is invoked.
As described above, the method, the system, the storage medium and the electronic device for expanding the pathway database provided by the embodiment of the application have the following beneficial effects:
The embodiment of the application can expand the lipid-pathway annotation relationship according to the lipid level information. Wherein for any parent lipid, if the parent lipid is annotated to a pathway, all child lipids of the parent lipid are annotated to the pathway. The expanded lipid pathway database can improve the accuracy of lipid identification and mapping. In addition, the topological structure of the metabolic network is considered in the expansion process, so that the effect of the lipid in the biological system can be estimated more comprehensively.
Drawings
Fig. 1 shows a flowchart of a method for expanding a pathway database according to an embodiment of the present application.
Fig. 2 is a flowchart of obtaining a pathway score for a second pathway in an embodiment of the application.
Fig. 3 is a flowchart of obtaining node importance factors according to an embodiment of the present application.
Fig. 4 is a flowchart of obtaining coverage adjustment factors according to an embodiment of the present application.
Fig. 5 is a flowchart of obtaining topology adjustment factors according to an embodiment of the present application.
Fig. 6 is a flowchart of obtaining and correcting p-values in an embodiment of the present application.
Fig. 7 shows a flowchart of enrichment analysis in an embodiment of the application.
Fig. 8 is a schematic structural diagram of a pathway database expansion system according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Description of element reference numerals
| 800 | Pathway database expansion system |
| 810 | Annotation relationship acquisition module |
| 820 | Hierarchical information acquisition module |
| 830 | Expansion module |
| 900 | Electronic equipment |
| 901 | Processor |
| 902 | Memory |
| 9021 | Operating system |
| 9022 | Application program |
| 903 | Network interface |
| 904 | System bus |
| 905 | User interface |
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. The application may also be practiced or applied through other specific embodiments, and the details of the present description may be modified or varied without departing from the spirit and scope of the present application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other without conflict.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present application by way of illustration, and only the components related to the present application are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
In the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the present application, unless explicitly specified and limited otherwise, the terms "connected," "coupled," and the like are to be construed broadly, and may be mechanically coupled or electrically coupled, may be directly coupled or indirectly coupled via an intermediate medium, and may be in communication with each other or in an interaction relationship between two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances.
One of the key steps in pathway analysis is to match lipid molecules detected in the laboratory with metabolites recorded in the pathway database. However, lipid mapping is much more complex than the mapping of other types of metabolites. On the one hand, databases often use generic nodes to represent entire lipid classes, so high-precision experimental data rarely find exact matches in the database. On the other hand, a generic node in the database, such as phosphatidylcholine or triacylglycerol, may correspond to many specific lipid molecules detected in the laboratory. For example, when analyzing the PC (phosphatidylcholine) family, mass spectrometry detects species such as PC(34:1) and PC(34:2). If all these molecules correspond to a single PC node in the pathway database, exact matching fails, so part of the pathway cannot be enriched, which seriously affects the downstream interpretation of the experimental data.
Furthermore, when performing lipid analysis, the related art focuses on simple lipid-pathway mapping and ignores the overall structure of the metabolic network. This simplified approach fails to capture the true roles and interrelationships of lipids in complex biological systems. Moreover, most enrichment analysis algorithms assume that metabolites are independent of each other, whereas metabolic pathways in fact form an interconnected network, so this assumption is clearly inconsistent with biological reality.
At least in view of the above problems, embodiments of the present application provide a method for expanding a pathway database based on lipid level information. The pathway database expansion method can expand the lipid-pathway annotation relationship according to the lipid level information, wherein for any parent lipid, if the parent lipid is annotated to a pathway, all child lipids of the parent lipid are annotated to the pathway. The expanded lipid pathway database can improve the accuracy of lipid identification and mapping. In addition, the topological structure of the metabolic network is considered in the expansion process, so that the effect of the lipid in the biological system can be estimated more comprehensively.
The following describes the technical solution in the embodiment of the present application in detail with reference to the drawings in the embodiment of the present application.
Fig. 1 shows a flowchart of a method for expanding a pathway database according to an embodiment of the present application. As shown in fig. 1, the pathway database expansion method includes the following steps S11 to S13.
S11, obtaining lipid-pathway annotation relations of a lipid pathway database. A lipid pathway database is an information resource on metabolic pathways, functions and interactions, intended to help researchers understand the biological roles and mechanisms of lipids inside and outside cells. Lipid pathway databases include, for example, KEGG, Reactome, LIPID MAPS, SABIO-RK, MetaCyc and BioCyc.
Metabolic pathways refer to a series of interrelated biochemical reactions and processes involving synthesis, transformation, breakdown, regulation, etc. of lipids. Metabolic pathways play an important role in the normal function of cells and organisms, including the construction of cell membranes, energy storage, signaling, and regulation of cellular functions.
Lipid-pathway annotation relationship refers to the correspondence between lipid molecules and metabolic pathways. The role of a particular lipid in the metabolic pathway and its impact on biological processes can be understood by this annotated relationship.
S12, acquiring lipid level information according to a lipid classification system. Lipid level information includes the parent-child relationships of lipids.
In some implementations, the LIPID MAPS data may be downloaded, and information such as the main class, subclass, specific molecular class and molecular structure of each lipid molecule may be extracted from it to construct the lipid level information.
S13, expanding the lipid-pathway annotation relation according to the lipid level information. In the expansion, for any parent lipid, if the parent lipid is annotated to a certain pathway (hereinafter referred to as a first pathway), all child lipids of the parent lipid are annotated to the first pathway. The child-level lipids of a parent-level lipid include both direct and indirect child-level lipids.
Illustratively, suppose lipid a is annotated to pathway b, the direct child lipids of lipid a are a1 and a2, and the direct child lipids of lipid a1 are a11 and a12. When a lipid detected in the laboratory (simply referred to as a detected lipid) corresponds to a11, the detected lipid cannot be matched in the original lipid pathway database. After the database expansion method provided by the embodiment of the application is applied, the lipids a1, a2, a11 and a12 are all annotated to pathway b, so the detected lipid can be matched in the expanded lipid pathway database, which improves the accuracy of lipid identification and mapping.
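The propagation in step S13 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function name `expand_annotations` and the dictionary layout (`annotations` mapping a lipid to its set of pathways, `children` mapping a lipid to its direct child lipids) are assumptions chosen for the example, which reuses the lipids a, a1, a2, a11, a12 and pathway b from the text.

```python
from collections import defaultdict


def expand_annotations(annotations, children):
    """Copy every pathway of a parent lipid to all of its direct and
    indirect child lipids (hypothetical helper, assumed data layout)."""
    expanded = defaultdict(set)
    for lipid, pathways in annotations.items():
        expanded[lipid] |= pathways
        # Walk down the hierarchy iteratively; `seen` guards against
        # visiting the same child twice.
        stack = list(children.get(lipid, []))
        seen = set()
        while stack:
            child = stack.pop()
            if child in seen:
                continue
            seen.add(child)
            expanded[child] |= pathways
            stack.extend(children.get(child, []))
    return dict(expanded)


# Example from the text: lipid a is annotated to pathway b.
children = {"a": ["a1", "a2"], "a1": ["a11", "a12"]}
annotations = {"a": {"b"}}
expanded = expand_annotations(annotations, children)
```

After expansion, a detected lipid identified as a11 can be looked up directly in `expanded` and is found to be annotated to pathway b.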
In some implementations, the pathway database expansion method may further include locating its position in the ontology for each detected lipid, expanding each detected lipid to all its parent lipids including direct parent lipids and indirect parent lipids according to lipid level information, and calculating weights of detected lipids and their parent lipids of each level, the weights being used to evaluate the distance of detected lipids and parent lipids.
Illustratively, the weight may be calculated as W = 1 - 0.2 × d, where d is the hierarchical distance between a given parent level and the immediate parent of the detected lipid (d = 0 for the detected lipid itself and for its immediate parent). For example, for the detected lipid and its immediate parent, W = 1; for the second-level parent of the detected lipid, W = 0.8; for the third-level parent, W = 0.6; and so on.
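As a hedged sketch of this weighting, the snippet below walks from a detected lipid up through its parents and assigns W = 1 - 0.2 × d, with d = 0 for the detected lipid and its immediate parent. The names `ancestor_weights` and `parent_of` are hypothetical, a single parent per lipid is assumed, and clamping the weight at 0 for very deep hierarchies is an added assumption not stated in the text.

```python
def ancestor_weights(lipid, parent_of, step=0.2):
    """Return {node: weight} for a detected lipid and all its parents.

    parent_of maps each lipid to its single direct parent (assumed tree).
    """
    weights = {lipid: 1.0}
    node = lipid
    level = 0  # 0 = detected lipid, 1 = immediate parent, 2 = next level up
    # Stop at the root, or if a malformed hierarchy loops back on itself.
    while node in parent_of and parent_of[node] not in weights:
        node = parent_of[node]
        level += 1
        d = level - 1  # the immediate parent has distance 0
        # Clamp at 0 (assumption) and round away floating-point noise.
        weights[node] = max(0.0, round(1.0 - step * d, 10))
    return weights


parent_of = {"a11": "a1", "a1": "a"}
weights = ancestor_weights("a11", parent_of)
```

For the hierarchy a11 -> a1 -> a, the detected lipid a11 and its immediate parent a1 receive weight 1, and the second-level parent a receives weight 0.8.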
Referring to fig. 2, in some implementations, for any pathway in the lipid pathway database (hereinafter referred to as a second pathway), the pathway database expansion method further includes the following steps S21 to S23.
S21, acquiring node importance factors of the second path corresponding to the detected lipid.
S22, obtaining coverage rate adjustment factors and topology adjustment factors of the second path corresponding to the detected lipid.
S23, obtaining the path score of the second path according to the node importance factor, the coverage rate adjustment factor and the topology adjustment factor.
Referring to fig. 3, in some implementations, acquiring the node importance factor of the second pathway corresponding to the detected lipids includes the following steps S31 to S34.
S31, obtaining importance adjustment factors of nodes in the second pathway.
S32, for each node in the second pathway, acquiring the detected lipids matched with the node. For a given node, the matched detected lipids include detected lipids that directly match the node and detected lipids whose parent class (direct or indirect) matches the node.
S33, obtaining the individual influence factor of each matched lipid. A matched lipid is a detected lipid that matches a node in the second pathway, as obtained in step S32.
Illustratively, for a matched lipid i, its individual influence factor IF_i can be derived by:
IF_i = log2FC_i*(-log10(p_value_i))*W_i;
where log2FC_i measures the degree of change in the level of matched lipid i between experimental conditions; p_value_i is the p-value of matched lipid i, which evaluates the statistical significance of the observed change and can be obtained by performing differential expression analysis on matched lipid i; and W_i is the matching weight of matched lipid i: if matched lipid i directly matches a node, W_i = 1, otherwise W_i = 0.8.
S34, acquiring node importance factors according to the individual influence factors and the importance adjustment factors. Specifically, for any node c in the second pathway, the node importance factor of node c is IF_node_c = (ΣIF_i)/N × A_c, where the sum runs over the detected lipids matched with node c, N is the number of those lipids, and A_c is the importance adjustment factor of node c. The node importance factor of the second pathway corresponding to the detected lipids is the sum of the node importance factors of all nodes in the second pathway.
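Steps S33 and S34 can be sketched in a few lines. The function names and the tuple layout are assumptions made for illustration. Multiplying the mean individual influence factor by the node's importance adjustment factor follows the worked example given later in the description.

```python
import math


def individual_influence(log2fc, p_value, weight):
    """IF_i = log2FC_i * (-log10(p_value_i)) * W_i."""
    return log2fc * (-math.log10(p_value)) * weight


def node_importance(matched, adjustment=1.0):
    """Mean individual influence factor of the lipids matched to one node,
    scaled by that node's importance adjustment factor.

    matched: list of (log2fc, p_value, weight) tuples (assumed layout).
    """
    if not matched:
        return 0.0  # a node with no matched lipid contributes nothing
    mean_if = sum(individual_influence(*m) for m in matched) / len(matched)
    return mean_if * adjustment


# Two hypothetical matched lipids: one direct match, one matched via a parent.
example_node = node_importance([(1.0, 0.01, 1.0), (0.5, 0.1, 0.8)], adjustment=1.0)
```

The node importance factor of a pathway would then be the sum of `node_importance` over all of its nodes.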
In some implementations, obtaining the importance adjustment factor for the nodes in the second path includes S311 through S312.
S311, obtaining the normalized degree centrality of the nodes in the second pathway. The degree centrality of a node is the number of edges directly connected to the node, and comprises the in-degree centrality and the out-degree centrality. The in-degree centrality is the number of edges pointing to the node, the out-degree centrality is the number of edges starting from the node, and total degree centrality = in-degree centrality + out-degree centrality. To allow comparison between metabolic networks of different sizes, the total degree centrality may be normalized to obtain the normalized degree centrality. Illustratively, normalized degree centrality = total degree centrality/(N-1), where N is the total number of nodes in the metabolic network.
S312, mapping the normalized degree centrality to a target interval to obtain the importance adjustment factor.
Illustratively, the target interval is, for example, [0.8, 1.2], and in embodiments of the present application the mapping can use the following equation:
importance adjustment factor = 0.8 + 0.4 × (S - S_min)/(S_max - S_min);
where S is the normalized degree centrality of the node, and S_min and S_max are the minimum and maximum values of S over all nodes, respectively.
The importance adjustment factor can be obtained through the above steps S311 and S312. Wherein the least important node has an importance adjustment factor of 0.8, the impact of which is slightly reduced. The importance adjustment factor of the most important node is 1.2, the impact of which is slightly increased. The importance adjustment factor of most nodes is about 1, and the original influence is kept unchanged.
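A minimal sketch of steps S311 and S312 follows, assuming the pathway is given as a list of directed edges. The function name, the edge-list representation, and the fallback to the interval midpoint when all nodes have the same centrality (where the min-max formula would divide by zero) are assumptions, not part of the described method. Nodes without any edge do not appear in an edge list and are therefore not scored here.

```python
def importance_adjustment(edges, lo=0.8, hi=1.2):
    """Map normalized degree centrality onto [lo, hi] by min-max scaling.

    edges: iterable of (source, target) pairs of one pathway graph.
    """
    degree = {}
    for src, dst in edges:
        degree[src] = degree.get(src, 0) + 1  # out-degree contribution
        degree[dst] = degree.get(dst, 0) + 1  # in-degree contribution
    n = len(degree)
    if n < 2:
        return {node: (lo + hi) / 2 for node in degree}
    # Normalized degree centrality: total degree / (N - 1).
    norm = {node: deg / (n - 1) for node, deg in degree.items()}
    s_min, s_max = min(norm.values()), max(norm.values())
    if s_max == s_min:
        # All nodes equally central: use the midpoint (assumption).
        return {node: (lo + hi) / 2 for node in norm}
    return {
        node: lo + (hi - lo) * (s - s_min) / (s_max - s_min)
        for node, s in norm.items()
    }


# Hypothetical pathway: B is a hub connected to A, C and D.
factors = importance_adjustment([("A", "B"), ("B", "C"), ("B", "D")])
```

In this toy graph the hub B receives a factor of about 1.2, and the three peripheral nodes receive 0.8.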
Referring to fig. 4, in some implementations, obtaining the coverage adjustment factor of the second pathway corresponding to the detected lipids includes S41 and S42.
S41, obtaining the quotient between the quantity of the detected lipid and the quantity of the nodes in the second path as the initial coverage rate.
S42, acquiring a coverage rate adjustment factor according to the initial coverage rate. Illustratively, the coverage adjustment factor may be the square root of the initial coverage, but the application is not limited thereto.
Referring to fig. 5, in some implementations, acquiring the topology adjustment factor of the second pathway corresponding to the detected lipids includes S51 and S52.
S51, identifying a linear path and a branch node in the second path.
S52, the topology adjustment factors are configured according to the change condition of the nodes in the linear path and/or the change condition of the branch nodes.
Illustratively, an initial value, for example, 1, may be configured for the topology adjustment factor, and the initial value is adjusted according to the change condition of the node in the linear path and/or the change condition of the branch node to obtain a final topology adjustment factor.
Illustratively, in the experimental data the lipid at a node may rise or fall compared with the reference group. For a linear path, if three or more consecutive nodes change consistently, e.g., all up or all down, a bonus is applied, e.g., the factor is multiplied by 1.5. For a branch node, if the branch node changes consistently with its upstream and downstream nodes, a bonus is applied, e.g., the factor is multiplied by 1.2.
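The bonus rules above leave several details open, so the following is only one possible reading. It assumes each node's change is encoded as +1 (up), -1 (down) or 0 (unchanged or not detected), that each bonus is applied at most once per pathway, and that the two bonuses combine by multiplication. All names are hypothetical.

```python
def has_consistent_run(directions, min_len=3):
    """True if at least min_len consecutive nodes change in the same
    non-zero direction."""
    run, prev = 0, 0
    for d in directions:
        if d != 0 and d == prev:
            run += 1
        else:
            run = 1 if d != 0 else 0
        prev = d
        if run >= min_len:
            return True
    return False


def topology_factor(linear_paths, branches, linear_bonus=1.5, branch_bonus=1.2):
    """Start from 1.0 and apply the linear-path and branch-node bonuses.

    linear_paths: list of direction lists, one per linear path.
    branches: list of (branch_direction, [neighbor_directions]) tuples.
    """
    factor = 1.0
    if any(has_consistent_run(path) for path in linear_paths):
        factor *= linear_bonus
    if any(b != 0 and nbrs and all(n == b for n in nbrs) for b, nbrs in branches):
        factor *= branch_bonus
    return factor


linear_only = topology_factor([[1, 1, 1, -1]], [])
branch_only = topology_factor([[1, -1, 1]], [(1, [1, 1])])
```

Other combination rules (for example, one bonus per qualifying path, or additive bonuses) would be equally compatible with the text.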
In some implementations, obtaining the pathway score for the second pathway from the node importance factor, the coverage adjustment factor, and the topology adjustment factor includes obtaining a product of the node importance factor, the coverage adjustment factor, and the topology adjustment factor for the second pathway corresponding to the detected lipid as the pathway score for the second pathway. The pathway score may be used to determine whether the second pathway is an important pathway under the corresponding experimental conditions (e.g., under a certain disease state). Based on the pathway scores, researchers can identify which biological pathways are of significant importance under specific experimental conditions, providing directions for subsequent studies. Furthermore, in disease studies, pathway scores may help reveal the underlying pathological mechanisms of disease, and by comparing pathway scores in disease groups and control groups, researchers may discover which pathways play an important role in disease occurrence and progression. In drug development, potential drug targets can be discovered by identifying second pathways active under specific conditions, which helps design small molecule drugs or biologics that can interfere with these pathways.
Referring to fig. 6, in some implementations, the pathway database expansion method may further include the following steps S61 to S63.
S61, performing a permutation test to obtain permutation scores. For example, 1000 permutations may be performed, each randomly shuffling the expression value labels of the original lipids. For each permutation, the pathway score is recalculated, and these recalculated pathway scores constitute the permutation scores.
S62, calculating an original p-value according to the pathway score of the second pathway and the permutation scores. For example, the proportion of permutation scores that are greater than or equal to the pathway score of the second pathway may be taken as the original p-value.
S63, correcting the original p value to obtain a corrected p value. For example, FDR correction can be performed by using the Benjamini-Hochberg method to obtain corrected p-value, but the application is not limited thereto.
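Steps S61 to S63 can be sketched with the standard library alone. The scoring function is passed in as a parameter because the pathway score depends on the pathway. Taking the p-value as the fraction of permutation scores at least as large as the observed score is the usual convention and is assumed here, and the function names are hypothetical. The second function is the standard Benjamini-Hochberg step-up adjustment.

```python
import random


def permutation_p_value(score_fn, values, n_perm=1000, seed=0):
    """Fraction of label-shuffled scores that reach the observed score.

    score_fn: maps a list of per-lipid values (in a fixed label order)
              to a pathway score.
    """
    rng = random.Random(seed)
    observed = score_fn(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly reassign values to lipid labels
        if score_fn(shuffled) >= observed:
            hits += 1
    return hits / n_perm


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A common variant reports (hits + 1)/(n_perm + 1) so that the estimated p-value is never exactly zero.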
In some implementations, the pathway database expansion method may further include generating a table containing names of the second pathway, pathway scores, original p-values, corrected p-values.
The process of obtaining the pathway score will be described below by way of one specific example. In this example, the experimentally detected data include:
| Lipid | log2FC | p-value |
| PC(16:0/18:1) | 1.5 | 0.001 |
| PC(18:0/20:4) | -0.8 | 0.01 |
| LPC(16:0) | 0.5 | 0.05 |
| PA(16:0/18:1) | 1.2 | 0.008 |
Taking the glycerophospholipid metabolic pathway as the second pathway, the result of lipid level expansion of the detected lipids is as follows:
PC(16:0/18:1) -> Phosphatidylcholine (PC) -> Glycerophospholipid;
PC(18:0/20:4) -> Phosphatidylcholine (PC) -> Glycerophospholipid;
LPC(16:0) -> Lysophosphatidylcholine (LPC) -> Phosphatidylcholine (PC);
PA(16:0/18:1) -> Phosphatidic acid (PA) -> Glycerophospholipid.
Based on the extended pathway database, PC (16:0/18:1), PC (18:0/20:4) and LPC (16:0) are detection lipids matched with the PC nodes, and PA (16:0/18:1) is detection lipids matched with the PA nodes.
For an originally detected lipid and its immediate parent, the weight is configured to be 1.0; for higher-level parents, such as PC for LPC(16:0), the weight is configured to be 0.8.
Assuming that there are 100 nodes in the glycerophospholipid metabolic pathway and that the nodes matched with the detected lipids are the PC node and the PA node, acquiring the node importance factor of the glycerophospholipid metabolic pathway corresponding to the detected lipids comprises:
For the PC node, the individual influence factors of the detected lipids matched with it are:
PC(16:0/18:1): IF_1 = 1.5*(-log10(0.001))*1 = 4.5;
PC(18:0/20:4): IF_2 = -0.8*(-log10(0.01))*1 = -1.6;
LPC(16:0): IF_3 = 0.5*(-log10(0.05))*0.8 = 0.5*1.3*0.8 = 0.52.
If the importance adjustment factor of the PC node is 1.1, the node importance factor of the PC node is IF_PC = (4.5 + (-1.6) + 0.52)/3 × 1.1 = 1.254.
For the PA node, the individual influence factor of the detected lipid matched with it is PA(16:0/18:1): IF_4 = 1.2*(-log10(0.008))*1 = 2.5164.
If the importance adjustment factor of the PA node is 1.05, the node importance factor of the PA node is IF_PA = 2.5164 × 1.05 = 2.64222.
The node importance factor (or initial score) of the glycerophospholipid metabolic pathway relative to the detected lipids is IF_PC + IF_PA = 1.254 + 2.64222 = 3.89622.
Initial coverage = 4/100 = 0.04, so the coverage adjustment factor is sqrt(0.04) = 0.2. PC to PA forms a path that changes consistently, which gives a 1.2-fold bonus, so the topology adjustment factor = 1.2. The pathway score of the glycerophospholipid metabolic pathway is 3.89622 × 0.2 × 1.2 = 0.9350928. The p-value obtained by the permutation test is 0.003, and the p-value after FDR correction is 0.01.
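The arithmetic of this worked example can be checked with a short script. The variable names and the data layout are illustrative only. The inputs are the values stated in the example, and the topology adjustment factor of 1.2 is taken as given rather than computed. Because the script does not round intermediate results, its last digits differ slightly from the hand-rounded figures in the text.

```python
import math

# (log2FC, p-value, matching weight, matched node) for each detected lipid.
detected = {
    "PC(16:0/18:1)": (1.5, 0.001, 1.0, "PC"),
    "PC(18:0/20:4)": (-0.8, 0.01, 1.0, "PC"),
    "LPC(16:0)": (0.5, 0.05, 0.8, "PC"),
    "PA(16:0/18:1)": (1.2, 0.008, 1.0, "PA"),
}
adjustment = {"PC": 1.1, "PA": 1.05}  # importance adjustment factors
n_nodes = 100                         # nodes in the pathway

# Individual influence factors, grouped by matched node.
by_node = {}
for log2fc, p, w, node in detected.values():
    by_node.setdefault(node, []).append(log2fc * -math.log10(p) * w)

# Node importance factor: mean influence factor times adjustment factor.
node_scores = {
    node: sum(ifs) / len(ifs) * adjustment[node] for node, ifs in by_node.items()
}
initial_score = sum(node_scores.values())

coverage_factor = math.sqrt(len(detected) / n_nodes)  # sqrt(4/100) = 0.2
topology_factor = 1.2                                 # stated in the example
pathway_score = initial_score * coverage_factor * topology_factor

print(round(pathway_score, 3))
```

The script yields a pathway score of roughly 0.935, in agreement with the value in the example.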
In some implementations, the pathway database expansion method further includes performing enrichment analysis based on the expanded pathway database.
For example, referring to fig. 7, the enrichment analysis may include the following steps S71 to S73.
S71, obtaining a list of lipids of interest. Illustratively, the process may include:
i. Collecting lipidomic data using liquid chromatography-tandem mass spectrometry (LC-MS/MS).
ii. Performing data normalization and/or missing value processing on the acquired data. The data normalization method is, for example, total ion current intensity normalization or internal standard normalization. The missing value processing method is, for example, minimum value substitution or multiple imputation.
iii. Performing lipid identification by matching against a database, e.g., LIPID MAPS.
iv. Identifying significantly altered lipids using statistical methods, such as the t-test or ANOVA. Screening criteria are set, for example p-value < 0.05 and fold change > 1.5 or > 2.
v. Summarizing the lipids that satisfy statistical and biological significance into a lipid list.
S72, intersecting the list of lipids of interest with the compounds of a given pathway in the expanded pathway database, and finding and counting the common compounds. Illustratively, the process may include:
i. Obtaining metabolic pathway information from the expanded pathway database: parsing the pathway files, extracting the compound list of each pathway, and establishing a mapping between compound IDs and pathways.
ii. Converting the experimentally detected lipid IDs to the standard IDs used in the pathway database.
iii. For each pathway, calculating the intersection of its compound set with the list of lipids of interest, and recording the intersection size (i.e., the number of common compounds).
iv. Defining a background set, which can be all lipids detected in the experiment or all lipids in a database, and calculating the intersection size of the background set with each pathway.
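A minimal Python sketch of the ID conversion and intersection counting described in this step is given below. The function name `pathway_overlaps`, the placeholder compound IDs (`L001` and so on), and the pathway contents are hypothetical and do not come from any real database.

```python
def pathway_overlaps(pathway_compounds, lipid_list, id_map=None):
    """Return {pathway: sorted compounds shared with lipid_list}."""
    id_map = id_map or {}
    # Convert experimental identifiers to the IDs used in the pathway database;
    # identifiers without a mapping are kept unchanged.
    standard_ids = {id_map.get(lipid, lipid) for lipid in lipid_list}
    return {
        pathway: sorted(standard_ids & set(compounds))
        for pathway, compounds in pathway_compounds.items()
    }

# Hypothetical pathway contents and ID mapping.
pathways = {
    "glycerophospholipid metabolism": ["L001", "L002", "L003"],
    "sphingolipid metabolism": ["L004", "L005"],
}
id_map = {"PC": "L001", "PA": "L002", "Cer": "L004"}
interest = ["PC", "PA"]                 # list of lipids of interest
background = ["PC", "PA", "Cer", "TG"]  # all lipids detected in the experiment

hits = pathway_overlaps(pathways, interest, id_map)
background_hits = pathway_overlaps(pathways, background, id_map)
counts = {name: len(shared) for name, shared in hits.items()}
print(counts)
```

The same function is reused for the background set, so the two intersection sizes per pathway are computed in exactly the same way.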
S73, using a statistical test to evaluate whether the observed count is higher than expected by chance, so as to judge whether the pathway is significantly enriched. The process includes, for example:
i. Selecting a statistical model, for example a hypergeometric distribution model, Fisher's exact test, or a chi-square test.
ii. For each pathway, calculating the p-value from four numbers: the number of lipids in the list of interest that are associated with the pathway (n), the total size of the list of interest (N), the number of lipids of the pathway in the background set (M), and the total size of the background set (T):
$$P(X \ge n) = \sum_{x=n}^{\min(N, M)} \frac{\binom{M}{x}\binom{T-M}{N-x}}{\binom{T}{N}}$$
iii. Setting a significance threshold; for example, a corrected p-value < 0.05 can be selected as the significance threshold.
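The hypergeometric p-value and a subsequent multiple-testing correction can be sketched in Python as follows. The function names are hypothetical, and the Benjamini-Hochberg procedure is used here only as one common way to obtain corrected p-values; the text does not say which correction is applied in this step.

```python
from math import comb

def hypergeom_pvalue(n, N, M, T):
    """Upper-tail probability P(X >= n): n pathway hits in a list of size N,
    with M pathway lipids among T background lipids."""
    upper = min(N, M)
    total = comb(T, N)
    favourable = sum(comb(M, x) * comb(T - M, N - x) for x in range(n, upper + 1))
    return favourable / total

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical counts for three pathways: (n, N, M, T).
tests = [(4, 20, 10, 500), (1, 20, 30, 500), (2, 20, 5, 500)]
raw = [hypergeom_pvalue(*t) for t in tests]
corrected = benjamini_hochberg(raw)
significant = [p < 0.05 for p in corrected]
print(raw, corrected, significant)
```

Exact integer binomials from `math.comb` avoid rounding error in the sum; for very large background sets a statistics library would be the more practical choice.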
Because the pathway database has been expanded, the child-level lipids of each parent-level lipid are annotated to the pathways of that parent-level lipid. Therefore, even if an exact matching method is adopted, detected lipids can be matched to pathways accurately in the embodiment of the application. In addition, the metabolic network topology is fully considered in the embodiment of the application, so that the role of lipids in the biological system can be evaluated more comprehensively.
The protection scope of the pathway database expansion method provided by the embodiment of the application is not limited to the execution order of the steps listed in the embodiment; all schemes obtained by adding, removing, or replacing steps with prior-art means according to the principles of the application are included in the protection scope of the application.
The embodiment of the application also provides a pathway database expansion system that can implement the pathway database expansion method provided by the embodiment of the application. However, devices for implementing the method include, but are not limited to, the structure of the pathway database expansion system listed in the embodiment; all structural variations and substitutions made with prior-art means according to the principles of the embodiment are included in the protection scope of the application.
Fig. 8 is a schematic structural diagram of a pathway database expansion system according to an embodiment of the present application. As shown in Fig. 8, the pathway database expansion system 800 includes an annotation relationship acquisition module 810, a hierarchy information acquisition module 820, and an expansion module 830. The annotation relationship acquisition module 810 is configured to acquire the lipid-pathway annotation relationship of a lipid pathway database. The hierarchy information acquisition module 820 is configured to acquire lipid hierarchy information according to a lipid classification system. The expansion module 830 is configured to expand the lipid-pathway annotation relationship according to the lipid hierarchy information, wherein for any parent lipid, if the parent lipid is annotated to a first pathway, all child lipids of the parent lipid are annotated to the first pathway.
It should be noted that the annotation relationship acquisition module 810, the hierarchy information acquisition module 820, and the expansion module 830 in the pathway database expansion system 800 correspond one-to-one to steps S11 to S13 of the pathway database expansion method shown in Fig. 1, and are not described in detail here.
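The behaviour of the expansion module can be sketched in Python as follows. This is a minimal illustration, not the implementation of the system: the function name `expand_annotations`, the dictionary layout, and the lipid names are hypothetical, and propagating annotations through every level of descendants (not only direct children) is an assumption.

```python
def expand_annotations(annotations, children):
    """Annotate every descendant of a parent lipid to the parent's pathways.

    annotations: {lipid: set of pathways}
    children:    {parent lipid: list of child lipids}
    """
    expanded = {lipid: set(paths) for lipid, paths in annotations.items()}
    for parent, pathways in annotations.items():
        stack = list(children.get(parent, []))
        seen = set()
        while stack:
            child = stack.pop()
            if child in seen:
                continue  # guard against repeated or cyclic hierarchy entries
            seen.add(child)
            expanded.setdefault(child, set()).update(pathways)
            stack.extend(children.get(child, []))
    return expanded

# Hypothetical annotation relationship and hierarchy.
annotations = {"PC": {"glycerophospholipid metabolism"}}
children = {"PC": ["PC 34:1"], "PC 34:1": ["PC 16:0/18:1"]}
expanded = expand_annotations(annotations, children)
print(expanded)
```

The input annotations are copied rather than modified, so the original lipid-pathway relationship remains available for comparison with the expanded one.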
The embodiment of the application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the pathway database expansion method provided by the embodiment of the application. Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing a processor, where the program may be stored in a computer-readable storage medium. The storage medium is a non-transitory medium, such as a random access memory, a read-only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape, a floppy disk, an optical disk, or any combination thereof. The storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state drive (SSD)), or the like.
The embodiment of the application also provides an electronic device. Fig. 9 is a schematic block diagram of an electronic device provided by an embodiment of the present application. As shown in Fig. 9, the electronic device 900 includes at least one processor 901, a memory 902, at least one network interface 903, and a user interface 905. The various components in the electronic device 900 are coupled together by a bus system 904. It is to be appreciated that the bus system 904 is used to enable connection and communication between these components. In addition to a data bus, the bus system 904 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 904 in Fig. 9.
The user interface 905 may include, among other things, a display, keyboard, mouse, trackball, click gun, keys, buttons, touch pad, or touch screen, etc.
It is to be appreciated that the memory 902 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be, among others, a read-only memory (ROM) or a programmable read-only memory (PROM). The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static random access memory (SRAM) and synchronous static random access memory (SSRAM). The memory described in the embodiments of the present application is intended to comprise, without being limited to, these and any other suitable types of memory.
The memory 902 in the embodiments of the application is used to store various types of data to support the operation of the electronic device 900. Examples of such data include any executable program that runs on the electronic device 900, such as the operating system 9021 and the application programs 9022. The operating system 9021 contains various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various underlying services and processing hardware-based tasks. The application programs 9022 may include various applications, such as a media player or a browser, for implementing various application services. The pathway database expansion method provided by the embodiment of the application can be contained in the application programs 9022.
The method disclosed in the above embodiments of the present application may be applied to the processor 901 or implemented by the processor 901. The processor 901 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 901 or by instructions in the form of software. The processor 901 may be a general purpose processor, a digital signal processor (DSP), or another programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. The processor 901 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the pathway database expansion method provided by the embodiment of the application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium, the storage medium is located in the memory, and the processor reads information from the memory and performs the steps of the method in combination with its hardware.
In an exemplary embodiment, the electronic device 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, programmable logic devices (PLDs, programmable Logic Device), complex programmable logic devices (CPLDs, complex Programmable Logic Device) for performing the aforementioned methods.
Embodiments of the present application may also provide a computer program product comprising one or more computer instructions. When the computer instructions are loaded and executed on a computing device, the processes or functions in accordance with the embodiments of the present application are produced in whole or in part. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, or data center to another website, computer, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner.
The computer program product is executed by a computer, which performs the method according to the preceding method embodiment. The computer program product may be a software installation package, which may be downloaded and executed on a computer in case the aforementioned method is required.
As used in this specification, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between 2 or more computers. Furthermore, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from two components interacting with one another in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks and steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
In the above-described embodiments, the functions of the respective functional units may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the functions may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions (programs). When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
In summary, the embodiment of the application provides a method, a system, a storage medium and electronic equipment for expanding a path database. The embodiment of the application can expand the lipid-pathway annotation relationship according to the lipid level information. Wherein for any parent lipid, if the parent lipid is annotated to a pathway, all child lipids of the parent lipid are annotated to the pathway. The expanded lipid pathway database can improve the accuracy of lipid identification and mapping. In addition, the topological structure of the metabolic network is considered in the expansion process, so that the effect of the lipid in the biological system can be estimated more comprehensively. Therefore, the application effectively overcomes various defects in the prior art and has higher industrial value.
The above embodiments merely illustrate the principles of the present application and its effects, and are not intended to limit the application. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the application shall still be covered by the claims of the application.
Claims (6)
1. A method for expanding a pathway database based on lipid level information, comprising:
obtaining a lipid-pathway annotation relationship of a lipid pathway database;
obtaining lipid hierarchy information according to a lipid classification system;
expanding the lipid-pathway annotation relationship according to the lipid hierarchy information, wherein for any parent lipid, if the parent lipid is annotated to a first pathway, all child lipids of the parent lipid are annotated to the first pathway;
for any second pathway in the lipid pathway database, obtaining a node importance factor of the second pathway corresponding to the detected lipid;
obtaining a coverage adjustment factor and a topology adjustment factor of the second pathway corresponding to the detected lipid;
obtaining a pathway score of the second pathway according to the node importance factor, the coverage adjustment factor and the topology adjustment factor;
wherein obtaining the node importance factor of the second pathway corresponding to the detected lipid comprises: obtaining importance adjustment factors of the nodes in the second pathway, obtaining the detected lipids matched with each node in the second pathway, obtaining an individual influence factor of each matched lipid, and obtaining the node importance factor according to the individual influence factors and the importance adjustment factors;
obtaining the coverage adjustment factor of the second pathway corresponding to the detected lipid comprises: obtaining the quotient of the number of detected lipids and the number of nodes in the second pathway as an initial coverage;
obtaining the topology adjustment factor of the second pathway corresponding to the detected lipid comprises: identifying linear paths and branch nodes in the second pathway, and configuring the topology adjustment factor according to the changes of the nodes in the linear paths and/or the changes of the branch nodes;
obtaining the importance adjustment factors of the nodes in the second pathway comprises: obtaining the normalized degree centrality of the nodes in the second pathway, and mapping the normalized degree centrality to a target interval to obtain the importance adjustment factors;
for a matched lipid i, its individual influence factor IF_i is obtained by:
IF_i = log2FC_i * (-log10(p_value_i)) * W_i;
wherein log2FC_i measures the degree of change in the expression level of the matched lipid i under different experimental conditions; p_value_i is the p-value of the matched lipid i, used to evaluate the significance level between the observed experimental data and the statistical hypothesis; and W_i is the matching weight of the matched lipid i, where W_i = 1 if the matched lipid i is directly matched with a node, and W_i = 0.8 otherwise.
2. The pathway database expansion method of claim 1, further comprising:
performing a permutation test to obtain permutation scores;
calculating an original p-value from the pathway score and the permutation scores;
and correcting the original p-value to obtain a corrected p-value.
3. The pathway database expansion method of claim 1, further comprising performing enrichment analysis based on the expanded pathway database.
4. A pathway database expansion system based on lipid hierarchy information for implementing the pathway database expansion method of any one of claims 1 to 3, the system comprising:
an annotation relationship acquisition module, configured to acquire the lipid-pathway annotation relationship of the lipid pathway database;
a hierarchy information acquisition module, configured to acquire lipid hierarchy information according to the lipid classification system;
and an expansion module, configured to expand the lipid-pathway annotation relationship according to the lipid hierarchy information, wherein for any parent lipid, if the parent lipid is annotated to a first pathway, all child lipids of the parent lipid are annotated to the first pathway.
5. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method according to any one of claims 1 to 3.
6. An electronic device, the electronic device comprising:
a memory storing a computer program;
a processor in communication with the memory, the processor performing the method of any one of claims 1 to 3 when invoking the computer program.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411488692.9A CN119007827B (en) | 2024-10-24 | 2024-10-24 | Method, system, storage medium and electronic equipment for expanding access database |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119007827A CN119007827A (en) | 2024-11-22 |
| CN119007827B (en) | 2025-05-23 |
Family
ID=93474895
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411488692.9A Active CN119007827B (en) | 2024-10-24 | 2024-10-24 | Method, system, storage medium and electronic equipment for expanding access database |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119007827B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109712669A (en) * | 2018-12-05 | 2019-05-03 | 上海美吉生物医药科技有限公司 | A kind of protein function annotation method and system |
| CN116313155A (en) * | 2023-03-27 | 2023-06-23 | 北京中寰宸政科技有限公司 | A system and method for disease association evolution based on lipidomics method |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2005085795A2 (en) * | 2004-03-02 | 2005-09-15 | Vanderbilt University | Lipids analysis |
| CA2977020A1 (en) * | 2015-02-06 | 2016-08-11 | Dh Technologies Development Pte. Ltd. | Lipid screening platform allowing a complete solution for lipidomics research |
| WO2017027559A1 (en) * | 2015-08-10 | 2017-02-16 | Massachusetts Institute Of Technology | Systems, apparatus, and methods for analyzing and predicting cellular pathways |
| CN117637032A (en) * | 2023-12-01 | 2024-03-01 | 国家卫生健康委科学技术研究所 | Disease pathogenic gene extraction annotation method, device, storage medium and terminal |
| CN118588169B (en) * | 2024-05-22 | 2025-07-25 | 青岛可立生物医药有限公司 | Method and system for constructing database of special disease group |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119007827A (en) | 2024-11-22 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |