CN114927168A - Method for constructing biomechanically regulated bone reconstruction text mining interactive website - Google Patents
- Publication number
- CN114927168A (application number CN202210606098.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- gene
- database
- biomechanically
- classical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Medical Informatics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Biotechnology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biomedical Technology (AREA)
- Bioethics (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Epidemiology (AREA)
- Public Health (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The invention relates to the technical field of biomechanics website construction, and in particular to a method for constructing an interactive text-mining website on the biomechanical regulation of bone remodeling.
Background
Congenital hypoplasia, dysplasia, and defects or loss of bone tissue are common clinical problems that strongly affect patients' facial appearance, mental health, and quality of life. Treatments based on biomechanical principles, such as mechanical stimulation and stress-induced distraction, are currently among the safer, more reliable, more efficient, and more economical responses. Clarifying the biomolecular mechanisms of bone remodeling under mechanical stimulation is therefore a prerequisite for more precise and efficient treatment. The field of biomechanically regulated bone remodeling has accumulated a large body of research data, but the information is scattered and hard to integrate. A knowledge-network platform that retrieves the important information efficiently would be an important means of advancing research in this field.
Elucidating how bone-related cells respond to biomechanical stimuli is a basic premise of research on bone physiology and pathology. Open, shared knowledge platforms have greatly advanced modern science, but the growing number of publications and the sheer volume of information make it increasingly difficult for researchers to sort and mine the literature by hand. In the era of big data, using machine language processing with natural language processing (NLP) tools to integrate and organize the biomedical literature is an efficient and reliable approach with great potential.
At present, computational language tools such as Tagger, iTextMine, and Geneshot can distinguish technical terms and domain-specific expressions in biomedical text, which makes computational language processing of biomedical text possible. In recent years, tools such as LION LBD and GLAD4U have used NLP to mine biological text, integrate and organize the data, and provide research-related information.
In bone-related biomechanics research, however, these text-mining tools are of limited use, mainly for the following reasons:
1. Programming requirements: most existing text-processing tools, such as Tagger, iTextMine, and Geneshot, target users with some programming ability and require some knowledge of natural language processing, which puts them out of reach of most biomedical researchers.
2. Redundant background databases: biological processes are precise and condition-dependent. Existing NLP tools can extract and store large amounts of structured data, but most rely on unfiltered background databases, which admit irrelevant information and produce false positives. For a specific biological field, especially a relatively niche one such as biomechanics, it is hard to obtain good search results from a pan-medical background corpus. Researchers therefore need a targeted NLP tool better suited to bone-related biomechanics research.
3. Lack of visualization: for a network of complex interactions, plain text cannot provide as clear and logical a framework as a graphical display. A visualization mode is therefore needed that maps the connections and interactions between molecules, so that researchers can quickly understand pathway information and locate the targets they need.
Summary of the Invention
To solve the above technical problems, the present application provides a method for constructing an interactive text-mining website on the biomechanical regulation of bone remodeling.
The application is implemented through the following technical solution:
A method for constructing an interactive text-mining website on the biomechanical regulation of bone remodeling, the method comprising:
S1: according to relevant terms, screen the gene-information text words in the literature, obtain gene-molecule interaction pairs, and build a literature database;
S2: based on the gene-molecule interaction pairs in the literature database, use a weighting algorithm to calculate the correlation between the target search factor and the classical mechanosensitive pathways;
S3: display the correlation between the target search factor and the classical mechanosensitive pathways visually, showing the gene molecules in the classical mechanosensitive pathways as interconnected nodes; clicking the line between two nodes links to the corresponding article in the literature database.
Further, between step S1 and step S2, the method also includes training a deep neural network on the PMC database, screening text keywords that carry biological information, and building a corpus.
Further, the biological information includes the force type, the species studied, and the cell type.
Preferably, the relevant terms in step S1 include biomechanics-related and bone-related terms.
Further, step S1 includes computational language normalization and preprocessing of the gene-information text words.
Further, step S1 also includes identifying the gene-information text words with PubTator and converting them to official names by calling the API of the NCBI Gene database.
Further, the weighting algorithm in step S2 is given by the following formula:
[Formula image not reproduced in this text.]
where r(g,p) is the correlation coefficient between gene g and classical mechanosensitive pathway p; N_i is the total number of related entities in the literature database for the i-th gene in pathway p; N_p is the total number of related entities in the literature database for all genes in pathway p; and Ω_g and Ω_p denote the sets of gene g and of pathway p, respectively.
Preferably, the classical mechanosensitive pathways include at least one of Hippo, BMP, TGFβ, Wnt, Notch, PI3K/Akt, MAPK, and Ras.
Further, step S3 also includes visually displaying the pathway information of the target search factor in the KEGG database.
Further, step S3 also includes visually displaying the gene-molecule interaction pairs of the target search factor in the STRING database.
Compared with the prior art, the present application has the following beneficial effects:
1. A web tool provides an open search interface, so users can define their own search scope without needing complex programming skills.
2. Strict inclusion criteria for the literature database keep the content specific to bone-related biomechanics. For the complex bone-related biomechanical regulatory network, this filters out a large share of false positives and makes the results more credible and useful.
3. An interactive visual network diagram puts computational literature mining in researchers' hands and promotes the dissemination and understanding of information in a more user-friendly way.
4. Classical mechanosensitive pathways are combined with text-mining results, so users can locate target genes or gene sets through biological pathways. Based on the literature database and the weighting algorithm, the degree of association between the target search factor and each classical mechanosensitive pathway is calculated, and interactive search of pathways and genes is provided, which makes gene navigation more convincing and meaningful.
Brief Description of the Drawings
The drawings described here provide a further understanding of the embodiments of the present application and form part of the application; they do not limit the embodiments of the invention.
Fig. 1 is a flow chart of the invention;
Fig. 2 is a schematic diagram of the deep neural network training for the corpus;
Fig. 3 shows the search window interface of the invention;
Fig. 4 shows the search results interface of the invention;
Fig. 5 is a schematic diagram of panels 1-3 in Fig. 4;
Fig. 6 is a schematic diagram of panels 4-6 in Fig. 4;
Fig. 7 is a schematic diagram of panel 7 in Fig. 4.
Detailed Description
To make the purpose, technical solutions, and beneficial effects of the present application clearer, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the invention. The components of the embodiments described and illustrated in the drawings can generally be arranged and designed in a variety of configurations.
1. Database construction and text annotation
The website of the invention centers on research related to "biomechanics" and "bone" and uses a keyword set as the inclusion criterion for literature. Screening admitted 34,937 articles published between January 1, 2010 and December 31, 2020. After computational language normalization and preprocessing of the text words, the gene information in each article is first identified by PubTator and then converted to official names by calling the API of the NCBI Gene database.
NCBI provides the E-utilities API for the Entrez system, which gives access to all Entrez databases, including PubMed, PMC, Gene, and Protein, and supports batch processing and retrieval of large numbers of text words (https://www.ncbi.nlm.nih.gov/home/develop/api/). The text data consist of the title and abstract of each article. They are first tokenized, parsed, and normalized with the Natural Language Toolkit text-processing library (NLTK, http://www.nltk.org/) to avoid ambiguous descriptions and to ensure that later steps can recognize them. Named entity recognition (NER) is then performed to extract the required details from each paper. On the one hand, PubTator (https://www.ncbi.nlm.nih.gov/research/pubtator/), a mature biomedical term recognition tool that performs well on ambiguous and complex biomedical term names, is used to annotate the genes and proteins that appear in the text database. The gene IDs are then converted to standard names with Biopython (https://biopython.org/) by querying the NCBI Gene database.
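The ID-to-name conversion described above can be sketched with the Python standard library alone. This is a minimal illustration, not the patent's implementation (which uses Biopython): it builds an ESummary request URL for the Gene database and reads the official symbol from a JSON response. The function names and the trimmed sample payload are assumptions made for illustration, and the `result`/`uids`/`name` response layout should be checked against the current E-utilities documentation.

```python
import json
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_summary_url(gene_ids):
    """Build an ESummary request URL for one or more NCBI Gene IDs."""
    query = urlencode({
        "db": "gene",
        "id": ",".join(str(g) for g in gene_ids),
        "retmode": "json",
    })
    return f"{EUTILS_BASE}/esummary.fcgi?{query}"


def official_symbols(esummary_json):
    """Map each Gene ID in an ESummary JSON payload to its symbol."""
    result = json.loads(esummary_json)["result"]
    return {uid: result[uid]["name"] for uid in result.get("uids", [])}


# Illustrative payload only: assumed shape, trimmed to the fields used here.
SAMPLE = json.dumps({
    "result": {
        "uids": ["10413"],
        "10413": {"name": "YAP1"},
    }
})

if __name__ == "__main__":
    print(gene_summary_url([10413]))
    print(official_symbols(SAMPLE))
```

In a real pipeline the URL would be fetched in batches, within NCBI's published rate limits, and the response passed to `official_symbols`.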
On the other hand, a self-built corpus was created for other special terms, such as force type, cell type, and species. It is used to extract information on the force type, the species studied, and the cell type: named entities are identified by comparing the normalized text with the corpus. Because classification and data retrieval are based on this self-built corpus, users can change the search scope in the web page options to specify a given force condition or cell line, which yields more specific results.
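A minimal sketch of matching normalized text against such a corpus follows. The three-category lexicon and its terms are hypothetical placeholders, since the source does not list the corpus contents, and plain whole-word matching stands in for the NLTK-based normalization the patent describes.

```python
import re

# Hypothetical mini-lexicon; the real self-built corpus is not given in the source.
LEXICON = {
    "force_type": ["fluid shear stress", "compression", "tensile strain"],
    "cell_type": ["osteoblast", "osteocyte", "mc3t3-e1"],
    "species": ["mouse", "rat", "human"],
}


def normalize(text):
    """Lower-case and collapse whitespace so matching is case-insensitive."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def tag_entities(text, lexicon=LEXICON):
    """Return {category: [terms found]} using whole-word matching."""
    norm = normalize(text)
    found = {}
    for category, terms in lexicon.items():
        hits = [
            term for term in terms
            if re.search(rf"(?<!\w){re.escape(term)}(?!\w)", norm)
        ]
        if hits:
            found[category] = hits
    return found
```

Whole-word matching does not handle plurals or other inflections; a lemmatization step would be needed before lookup to catch them.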
The self-built corpus is produced as follows. As shown in Fig. 2, a deep neural network based on the pre-trained language model BERT was designed, and its parameters were tuned. The main parameters are: batch size: 32; epochs: 4; learning rate: 5e-5; hidden_size: 128. Training on 13.5 million words from the full-text English corpus PMC and 4.5 million words from the biomedical literature corpus PubMed produced a model for extracting text keywords that carry biological information.
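The listed hyperparameters can be kept together in a small configuration record. The sketch below only restates the four values given above; the class name, the helper, and the example count are illustrative assumptions, and no model or training loop is implied.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as stated in the text; names are illustrative."""
    batch_size: int = 32
    epochs: int = 4
    learning_rate: float = 5e-5
    hidden_size: int = 128


def total_steps(num_examples, cfg):
    """Number of optimizer steps if every epoch covers all examples once."""
    return math.ceil(num_examples / cfg.batch_size) * cfg.epochs
```

For example, a hypothetical set of 1,000 training examples gives 32 batches per epoch and 128 steps over 4 epochs.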
2. Interactions between mechanobiological pathways
How cells and tissues sense and transmit mechanical information depends on interactions between genes, and cascades of such interactions make up biological signaling pathways. As shown in Fig. 2, in the pathway navigation section, this embodiment first presents the typical pathways that regulate this process and the interactions between them.
As shown in Fig. 3, in the pathway navigation section, the website of this embodiment displays classical mechanosensitive pathways, such as the Hippo, BMP, TGFβ, Wnt, Notch, PI3K/Akt, MAPK, and Ras signaling pathways, and explores their interactions in mechanotransduction, giving users background information on mechanobiology.
In this mode, the embodiment lays out the trusted pathways and their interactions to give users general background information. Combining trusted pathways related to Hippo, BMP, Wnt, GPCR, TGF-β, IGF, integrins, and cell junctions makes the navigation of genes in mechanosensing and mechanotransduction more convincing and meaningful.
Second, an understanding of a single molecule is usually partial and limited; linking molecules to pathways helps researchers understand and further explore their mechanisms of action. In one possible design, the canonical pathways are therefore combined with the text-mining results, so that users can locate their target genes or gene sets through biological processes. The submitted genes are matched against the annotated gene set of each pathway, the correlation between the genes and the mechanics-related pathways is scored, and possible connections are suggested based on text mining.
To obtain a reasonable scoring system, the present application calculates the correlation between the target search factor and each classical mechanosensitive pathway from the molecular interaction pairs in the literature database, which helps researchers quickly locate the relevant biological signaling patterns. The score is calculated as follows:
[Formula image not reproduced in this text.]
In the above formula, r(g,p) is the correlation coefficient between gene g and classical mechanosensitive pathway p; N_i is the total number of related entities in the literature database for the i-th gene in pathway p; N_p is the total number of related entities in the literature database for all genes in pathway p; and Ω_g and Ω_p denote the sets of gene g and of pathway p, respectively.
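Because the formula itself is not reproduced in the text, the following sketch shows only one plausible reading of the definitions, and that reading is an assumption: the score is taken as the sum of N_i over the pathway genes that also belong to the gene's set, divided by N_p. Here the gene's set Ω_g is assumed to hold the entities that co-occur with gene g in the literature database; the function name and the toy counts are illustrative.

```python
def pathway_score(omega_g, omega_p, entity_counts):
    """Assumed reading: r(g, p) = sum of N_i over (omega_g & omega_p), divided by N_p.

    omega_g: set of genes associated with the query gene g (assumed meaning).
    omega_p: set of genes annotated to pathway p.
    entity_counts: gene -> number of related entities in the literature database (N_i).
    """
    n_p = sum(entity_counts.get(gene, 0) for gene in omega_p)
    if n_p == 0:
        return 0.0  # pathway has no literature support, so no score
    shared = omega_g & omega_p
    return sum(entity_counts.get(gene, 0) for gene in shared) / n_p


# Toy numbers for illustration only.
counts = {"YAP1": 60, "WWTR1": 30, "LATS1": 10}
hippo = {"YAP1", "WWTR1", "LATS1"}
query_neighbors = {"YAP1", "RUNX2"}
score = pathway_score(query_neighbors, hippo, counts)
```

Under this reading, a shared gene with many related entities (a heavily studied pathway member) contributes more to the score than a rarely mentioned one.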
The weighting algorithm highlights the importance of a pathway's star molecules, which is consistent with the logic of text mining: when the target search molecule co-occurs with a pathway's star molecules, it can be considered more likely to be associated with that pathway.
3. Visualization website architecture
To support cross-platform visualization, the web architecture of the site is based on the Django framework, the back-end database is implemented in MySQL, and Semantic UI is used for the front end.
As an NLP web tool, the website of the invention combines presentation and prediction strategies, offering an effective and credible way to map the connections and crosstalk between molecules involved in mechanosensing and mechanotransduction in bone.
The website uses a graph network to display the molecules in all mechanical pathways as interconnected nodes. Clicking the line between two nodes links to the corresponding original article. This function is implemented through interaction between the web front end and the server database, which is prior art and is not described further here.
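One way to picture the data behind such a view is a node-link structure in which every edge carries the identifiers of its supporting articles, so that a click on an edge can be resolved to literature. The sketch below is an assumption about a possible payload shape, not the site's actual format; the function name and the placeholder article identifiers are hypothetical.

```python
def build_network(pairs):
    """Build a node-link structure from (gene_a, gene_b, article_id) records."""
    nodes, edges = set(), {}
    for gene_a, gene_b, article_id in pairs:
        nodes.update((gene_a, gene_b))
        key = tuple(sorted((gene_a, gene_b)))  # treat edges as undirected
        edges.setdefault(key, []).append(article_id)
    return {
        "nodes": [{"id": name} for name in sorted(nodes)],
        "edges": [
            {"source": src, "target": dst, "articles": sorted(set(ids))}
            for (src, dst), ids in sorted(edges.items())
        ],
    }


# Placeholder article identifiers, for illustration only.
network = build_network([
    ("YAP1", "RUNX2", "ARTICLE_A"),
    ("RUNX2", "YAP1", "ARTICLE_B"),
])
```

Serialized as JSON, a structure like this could be handed to a front-end graph library, with the `articles` list driving the link shown when an edge is clicked.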
The self-built corpus described above allows the entities retrieved from the literature database to be sub-classified, so that users can focus on a specific force type or cell line, which supports more precise and targeted literature-based discovery. The website also adopts a pathway-fitting method: based on the weighting algorithm, the system displays the correlation score between the target search molecule and each classical mechanical pathway according to the NLP results, linking the user's target molecule with the components of the classical pathways and making the tool better suited to biomedical research.
4. Correlation identification and visualization
Within the user-defined scope, the website automatically retrieves and visualizes the entities related to the target search molecule and their correlations with the pathways. The graph is interactive: users can arrange the layout as they prefer and view the details of each entity. Clicking the edge between two entities opens a pop-up window showing the supporting information and the source article, with the corresponding sentence highlighted in red. Because the original text is collected, users can judge for themselves the importance and reliability of the connections found by the AI, which can be both efficient and accurate. Layered search extracts second-layer and third-layer relations to enlarge the network, which helps in discovering new molecules.
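The layered search can be read as a breadth-first expansion of the co-occurrence network to a fixed depth. The sketch below is a minimal illustration under that assumption; the adjacency-list representation and the function name are hypothetical, not taken from the source.

```python
from collections import deque


def expand_layers(graph, start, max_depth):
    """Return {entity: layer} for every entity within max_depth hops of start."""
    layers = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if layers[node] == max_depth:
            continue  # do not expand beyond the requested layer
        for neighbor in graph.get(node, ()):
            if neighbor not in layers:
                layers[neighbor] = layers[node] + 1
                queue.append(neighbor)
    return layers


# Toy adjacency list: A - B - C - D
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
```

With `max_depth=1` only direct partners are returned; raising it to 2 or 3 corresponds to the second-layer and third-layer searches described above.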
Curated pathway map. Bone mechanobiology depends largely on sequential reactions and on interactions among several pathways, as described above. With this in mind, this embodiment identifies the classical pathways involved in mechanosensitivity and mechanotransduction for which credible evidence exists. An overview of the pathways and their interactions, together with the supporting evidence, is visualized with charts and svg.js, a lightweight library for manipulating and animating SVG files. The elements of each pathway are looked up in KEGG (Kyoto Encyclopedia of Genes and Genomes) and then compared with the dataset of this embodiment to produce the list of items in each pathway. The correlations between target genes and pathways are ranked by correlation coefficient and displayed visually.
In addition to scoring, the website offers an interactive option to add all or selected components of a target pathway to the NLP network, forming a molecule-to-pathway network that reveals more indirect connections.
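The content of the results interface is described in detail below.
With reference to Figs. 4-7, the left side of the results interface shows the pathway association results. As shown in Fig. 5, panel 1 shows the degree of association between the target search molecule and each classical mechanical pathway. In panel 2, users can select pathways of interest and add pathway molecules to the network for a combined search. In panel 3, users can quickly view the pathway information of the target search molecule in the KEGG database to understand its routes of action more fully. As shown in Fig. 6, the middle of the results page visualizes the molecular interaction information and offers several "String" button options. STRING is a database of protein interaction information based on research evidence and algorithmic prediction; integrating STRING results with the original NLP network can suggest ideas for research on emerging molecules. With the layered search function, users can expand the relationship network to the second and third layers, widening the search and helping to discover new molecules in a pathway. Icons on the site change the display mode of the molecular association graph, download and save the current graph, and restore the previous version of the graph. To retrieve the source literature, users click the connection between two nodes; as shown in Fig. 7, a pop-up window displays the correlation and the corresponding articles, with the relevant sentences highlighted in red.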
This embodiment defines correlation by co-occurrence rather than using machine learning to parse grammatical data for relation extraction. To keep the predictions credible, it lets the user, rather than the machine, judge the relationships embedded in the corpus. Relations are retrieved and visualized sentence by sentence: entities in each normalized sentence are associated through co-occurrence, and the corresponding sentence is then marked. The co-occurrence score records the number of items that carry the co-occurrence label. The sentences and their entities are stored in a relational database implemented with SQLite (https://www.sqlite.org/index.html). STRING (https://string-db.org/) is used for a comprehensive search on the target; this embodiment provides second-layer and/or STRING search options that can yield more results.
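Sentence-level co-occurrence with storage in SQLite can be sketched with Python's built-in `sqlite3` module. The table layout, column names, and function names below are assumptions made for illustration; the source does not describe its schema.

```python
import sqlite3
from itertools import combinations


def store_cooccurrences(conn, sentences):
    """Store one row per entity pair per sentence.

    sentences: iterable of (article_id, sentence_text, entities_in_sentence).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cooccurrence ("
        "entity_a TEXT, entity_b TEXT, article_id TEXT, sentence TEXT)"
    )
    for article_id, text, entities in sentences:
        # Sort so each unordered pair is always stored in the same order.
        for a, b in combinations(sorted(set(entities)), 2):
            conn.execute(
                "INSERT INTO cooccurrence VALUES (?, ?, ?, ?)",
                (a, b, article_id, text),
            )
    conn.commit()


def cooccurrence_score(conn, a, b):
    """Count the stored sentences in which both entities appear."""
    a, b = sorted((a, b))
    row = conn.execute(
        "SELECT COUNT(*) FROM cooccurrence WHERE entity_a = ? AND entity_b = ?",
        (a, b),
    ).fetchone()
    return row[0]
```

Keeping the sentence text in each row is what lets the interface show the supporting sentence when a user clicks an edge.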
Using the website: the user enters the target search factor in the search window interface. The search window also displays some of the classical mechanosensitive pathways, giving users background information on mechanobiology and helping them identify the relevant pathways quickly. Once the input is complete, the site switches to the search results interface, which is divided into seven panels: panels 1-3 on the left, panels 4-6 in the middle, and panel 7 on the right.
Panel 1 uses a bar chart to show the correlation between the target search molecule and each classical mechanosensitive pathway. Panel 2 lists the pathways of interest that the user can select, whose molecules can be added to the network for a combined search. In panel 3, users can quickly view the pathway information of the target search molecule in the KEGG database to understand its routes of action more fully.
With the layered search function in panel 5, users can expand the relationship network to the second and third layers, widening the search. Panel 6 provides icons for changing the display mode, downloading the image, and resetting the image; clicking them changes the display mode of the molecular association graph, downloads and saves the current graph, or restores the previous version of the graph.
To retrieve the source literature, users click the connection between two nodes in panel 4; the pop-up window in panel 7 displays the correlation and the corresponding articles, with the relevant sentences highlighted in red.
综上,本发明网站基于开放文献资源,利用自然语言处理(NLP)策略,挖掘文本数据库,构建一个网页交互工具。创建了首个骨相关生物力学文本数据库,创新性地引入可视化模式,将复杂晦涩地文本信息图像化;采用自创权重算法,计算目标检索因子与经典力学敏感通路之间的相关性,建立一种以生物学通路为基础的全新分析策略;同时,引入网页交互工具,将生物学文献探索过程可视化、简易化,可极大促进骨相关力学生物学的分子机制研究,推动更有效的数据处理和知识共享方式。To sum up, the website of the present invention is based on the open document resources, uses the natural language processing (NLP) strategy, mines the text database, and constructs a web page interaction tool. Created the first bone-related biomechanical text database, innovatively introduced visualization mode to visualize complex and obscure text information; used self-created weighting algorithm to calculate the correlation between target retrieval factors and classical mechanics-sensitive pathways, and established a A new analysis strategy based on biological pathways; at the same time, the introduction of web interactive tools to visualize and simplify the process of biological literature exploration can greatly promote the research on the molecular mechanism of bone-related mechanobiology and promote more effective data processing. and knowledge sharing.
Discussion-style biomedical platforms that favor knowledge sharing have given researchers information of unprecedented scope and unmanageable volume. Rather than relying on such unfiltered resources, the present application specifies targets within bone mechanobiology processes and retrieves classified information from a self-organized library. Users can thus explore all mechanics-related articles, or narrow their target by force type, cell line, or species. The present application also adopts a pathway-centered strategy, using weighted scoring and a combination algorithm, which makes it possible to navigate and explore single genes or gene sets in mechanobiological processes.
The specific embodiments above describe the purpose, technical solutions, and beneficial effects of the present application in further detail. It should be understood that they are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within that scope of protection.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210606098.XA CN114927168B (en) | 2022-05-31 | 2022-05-31 | Construction method of biomechanical regulation and control bone reconstruction text mining interaction website |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210606098.XA CN114927168B (en) | 2022-05-31 | 2022-05-31 | Construction method of biomechanical regulation and control bone reconstruction text mining interaction website |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114927168A (en) | 2022-08-19 |
| CN114927168B (en) | 2023-08-29 |
Family ID: 82813152
Family Applications (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN202210606098.XA | Active | CN114927168B (en) | 2022-05-31 | 2022-05-31 | Construction method of biomechanical regulation and control bone reconstruction text mining interaction website |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114927168B (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116130010A (en) * | 2023-02-23 | 2023-05-16 | 武汉大学人民医院(湖北省人民医院) | Gene interaction network construction method, device and equipment based on natural language processing |
| TWI897104B (en) * | 2022-11-22 | 2025-09-11 | 大陸商中國銀聯股份有限公司 | Sensitive data identification method, device, equipment and computer storage medium |
Citations (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7428554B1 (en) * | 2000-05-23 | 2008-09-23 | Ocimum Biosolutions, Inc. | System and method for determining matching patterns within gene expression data |
| CN104978347A (en) * | 2014-04-11 | 2015-10-14 | 中国中医科学院中医临床基础医学研究所 | Data mining method and data mining system for sensitive keywords in Chinese biomedical literature database |
| CN107346372A (en) * | 2017-06-19 | 2017-11-14 | 苏州班凯基因科技有限公司 | A kind of database and its construction method understood applied to gene mutation |
| US20180239863A1 (en) * | 2017-02-17 | 2018-08-23 | The Regents Of The University Of California | Metabolite, annotation, and gene integration system and method |
| CN109545284A (en) * | 2018-10-16 | 2019-03-29 | 中国人民解放军军事科学院军事医学研究院 | Drug integrated information database building method and system based on drug and target information |
| CN110286233A (en) * | 2019-06-27 | 2019-09-27 | 山西大学 | A biomarker metabolic pathway and its analysis method and application |
| CN111797296A (en) * | 2020-07-08 | 2020-10-20 | 中国人民解放军军事科学院军事医学研究院 | Method and system for knowledge mining of poison-target literature based on web crawling |
| CN112029710A (en) * | 2020-08-31 | 2020-12-04 | 上海交通大学医学院附属第九人民医院 | Screening method of direct mechanical response cell subset and application thereof |
| CN112289372A (en) * | 2020-12-15 | 2021-01-29 | 武汉华美生物工程有限公司 | Protein structure design method and device based on deep learning |
| CN114168708A (en) * | 2021-11-15 | 2022-03-11 | 哈尔滨工业大学 | Personalized biological channel retrieval method based on multi-domain characteristics |
2022-05-31: CN application CN202210606098.XA, patent CN114927168B (en), status Active
Non-Patent Citations (4)
| Title |
|---|
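| Leihong Wu et al.: "a knowledgebase providing network-based research platform on coronary heart disease", pages 1 - 7 * |
| Wilco W.M. Fleuren et al.: "Application of text mining in the biomedical domain", pages 97 - 106 * |
| 詹心可: "Research on protein interaction prediction based on deep neural networks and ensemble learning", pages 006 - 76 * |
| 鲍振申: "Research on enrichment analysis methods for disease-related signaling pathways and their application", pages 006 - 136 * |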
Also Published As
| Publication number | Publication date |
|---|---|
| CN114927168B (en) | 2023-08-29 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Fleuren et al. | Application of text mining in the biomedical domain | |
| Rebholz-Schuhmann et al. | Text-mining solutions for biomedical research: enabling integrative biology | |
| US8494987B2 (en) | Semantic relationship extraction, text categorization and hypothesis generation | |
| Ramasamy et al. | Disease prediction in data mining using association rule mining and keyword based clustering algorithms | |
| Kim et al. | PubChem: a large‐scale public chemical database for drug discovery | |
| Felizardo et al. | A systematic mapping on the use of visual data mining to support the conduct of systematic literature reviews | |
| US10198478B2 (en) | Methods and systems for technology analysis and mapping | |
| CN115114445B (en) | Cell knowledge graph construction method, device, computing device and storage medium | |
| CN114927168B (en) | Construction method of biomechanical regulation and control bone reconstruction text mining interaction website | |
| Gürcan | Major research topics in big data: A literature analysis from 2013 to 2017 using probabilistic topic models | |
| WO2025237327A1 (en) | Gut microbe knowledge graph system | |
| CN116775897A (en) | Knowledge graph construction and query method and device, electronic equipment and storage medium | |
| JP2005122231A (en) | Screen display system and screen display method | |
| Roche et al. | A holistic AI-based approach for pharmacovigilance optimization from patients behavior on social media | |
| Marchesin et al. | Building a large gene expression-cancer knowledge base with limited human annotations | |
| Asaad et al. | AsthmaKGxE: An asthma–environment interaction knowledge graph leveraging public databases and scientific literature | |
| CN113946647A (en) | DDIs (distributed denial of service) search engine based on medical entity vector and construction method thereof | |
| Gendrin-Brokmann et al. | Investigating deep-learning NLP for automating the extraction of oncology efficacy endpoints from scientific literature | |
| Park et al. | GPDminer: a tool for extracting named entities and analyzing relations in biological literature | |
| JP2008515029A (en) | Display method of molecular function network | |
| Morine et al. | A comprehensive and holistic health database | |
| Atkinson et al. | Discovering novel causal patterns from biomedical natural-language texts using bayesian nets | |
| Archana et al. | Information Extraction and Knowledge Discovery in Biomedical Engineering and Health Informatics | |
| CN117150029B (en) | Knowledge graph construction, retrieval method and device | |
| Nazaruddin et al. | Identifying Key Mental Health Topic on Youtube Comments using Non-negative Matrix Factorization |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |