CN114927168A

CN114927168A - Method for constructing a text mining interactive website for biomechanical regulation of bone remodeling

Info

Publication number: CN114927168A
Application number: CN202210606098.XA
Authority: CN
Inventors: 经典; 蔡靖仪; 赵志河
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2022-05-31
Filing date: 2022-05-31
Publication date: 2022-08-19
Anticipated expiration: 2042-05-31
Also published as: CN114927168B

Abstract

The invention relates to a method for constructing a text mining interactive website for biomechanical regulation of bone remodeling, comprising the following steps: S1, screening gene information text words in the literature according to related entries, obtaining gene-molecule interaction pairs, and constructing a literature database; S2, calculating, with a weight algorithm, the correlation between a target retrieval factor and the classical mechanosensitive pathways, based on the gene-molecule interaction pairs in the literature database; and S3, visually displaying the correlation between the target retrieval factor and the classical mechanosensitive pathways, displaying the gene molecules in the classical mechanosensitive pathways as interconnected nodes, and linking to the corresponding documents in the literature database when a connecting line between nodes is clicked. The website is built on open literature resources and uses a natural language processing strategy to mine a text database. It creates the first bone-related biomechanics text database, introduces a visualization mode, and establishes a new analysis strategy based on biological pathways.

Description

A method of constructing a text mining interactive website for biomechanical regulation of bone remodeling

Technical Field

The invention relates to the technical field of biomechanics website construction, and in particular to a method for constructing a text mining interactive website for biomechanical regulation of bone remodeling.

Background

Congenital hypoplasia, dysplasia, and defects or absence of bone tissue are relatively common clinical problems that strongly affect patients' facial appearance, mental health, and quality of life. Treatments based on biomechanical principles, such as mechanical stimulation and stress distraction, are currently among the safer, more reliable, more efficient, and more economical responses. Clarifying the biomolecular mechanism of bone remodeling under mechanical stimulation is therefore the primary prerequisite for developing more precise and efficient treatment. The field of biomechanical regulation of bone remodeling already holds a large amount of research data, but the information is scattered and difficult to integrate. A knowledge network platform that retrieves the important information efficiently would therefore be an important means of advancing research in this field.

Elucidating how bone-related cells respond to biomechanical stimuli is a basic premise of research in bone physiology and pathology. Open, shared knowledge platforms have greatly promoted the development of modern science, but the growing number of publications and the sheer volume of information make it increasingly difficult for researchers to sort and mine the literature by hand. In the era of big data, using machine language processing and natural language processing (NLP) tools to integrate and organize the biomedical literature is an efficient and reliable approach with great potential.

At present, computational language tools such as Tagger, iTextMine, and Geneshot can distinguish technical terms and specific expressions in biomedical texts, which makes computational language processing of biomedical text feasible. In recent years, tools such as LION LBD and GLAD4U have used NLP to mine biological text, integrate and organize the data, and provide research-related information.

However, in bone-related biomechanics research these text mining tools are of limited use, mainly for the following reasons:

1. Programming skill requirements: most existing text processing tools, such as Tagger, iTextMine, and Geneshot, are aimed at users with some programming ability and require some knowledge of natural language processing, which makes them difficult for most biomedical researchers to operate.

2. Redundant background databases: biological processes are precise and conditional. Existing NLP tools can extract and store large amounts of data in structured form, but most of them use unfiltered background databases, which admits irrelevant information and produces false positive results. For a specific biological field, and especially for a relatively niche field such as biomechanics, it is difficult to obtain good search results from a general medical research background library. Researchers therefore need a targeted NLP tool better suited to bone-related biomechanics research.

3. Lack of visual display: for a network of complex interactions, plain text is less able than a graphical display to provide a clear and logical framework. A visualization mode is therefore needed to sort out the connections and interactions between molecules, so that researchers can quickly understand pathway information and locate the targets they need.

Summary of the Invention

To solve the above technical problems, the present application provides a method for constructing a text mining interactive website for biomechanical regulation of bone remodeling.

The application is realized through the following technical solution:

A method for constructing a text mining interactive website for biomechanical regulation of bone remodeling, the method comprising:

S1, screening gene information text words in the literature according to related entries, obtaining gene-molecule interaction pairs, and constructing a literature database;

S2, calculating, with a weight algorithm, the correlation between the target retrieval factor and the classical mechanosensitive pathways, based on the gene-molecule interaction pairs in the literature database;

S3, visually displaying the correlation between the target retrieval factor and the classical mechanosensitive pathways, and displaying the gene molecules in the classical mechanosensitive pathways as interconnected nodes, wherein clicking the line connecting two nodes links to the corresponding documents in the literature database.

Further, between step S1 and step S2, the method also includes performing deep neural network training on the PMC database, screening text keywords that carry biological information, and constructing a corpus.

Further, the biological information includes force type, study species, and cell type.

Preferably, the related entries in step S1 include biomechanics-related and bone-related entries.

Further, step S1 includes computational normalization and preprocessing of the gene information text words.

Further, step S1 also includes identifying the gene information text words with PubTator and converting them to official names by calling the API of the NCBI Gene database.

Further, the formula of the weight algorithm in step S2 is:

In the formula, r(g,p) is the correlation coefficient between gene g and classical mechanosensitive pathway p; N_i is the total number of related entities in the literature database for the i-th gene in pathway p; N_p is the total number of related entities in the literature database for all genes in pathway p; and Ω_g and Ω_p denote the sets of gene g and of classical mechanosensitive pathway p, respectively.

Preferably, the classical mechanosensitive pathways include at least one of Hippo, BMP, TGFβ, Wnt, Notch, PI3K/Akt, MAPK, and Ras.

Further, step S3 also includes visually displaying the pathway information of the target retrieval factor in the KEGG database.

Further, step S3 also includes visually displaying the gene-molecule interaction pairs of the target retrieval factor in the STRING database.

Compared with the prior art, the present application has the following beneficial effects:

1. A web tool provides an open search portal, so users can define their own search scope without needing advanced programming skills.

2. Strict inclusion criteria for the literature database keep the content specific to bone-related biomechanics. For the complex bone-related biomechanical regulatory network, this filters out false positive information to a large extent and makes the results more credible and effective.

3. A visual network graph supports interactive operation, puts computational literature mining in the hands of researchers, and promotes the dissemination and understanding of information in a more user-friendly way.

4. Combining the classical mechanosensitive pathways with the text mining results lets users locate target genes or gene sets through biological pathways. The degree of association between the target retrieval factor and each classical mechanosensitive pathway is calculated from the literature database with the weight algorithm, and interactive search of the pathways and genes is provided, which makes gene navigation more convincing and meaningful.

Description of Drawings

The accompanying drawings described here provide a further understanding of the embodiments of the application and form part of the application; they do not limit the embodiments of the invention.

Fig. 1 is a flow chart of the invention;

Fig. 2 is a schematic diagram of the deep neural network training for the corpus;

Fig. 3 shows the search window interface of the invention;

Fig. 4 shows the search results interface of the invention;

Fig. 5 is a schematic diagram of panels 1-3 in Fig. 4;

Fig. 6 is a schematic diagram of panels 4-6 in Fig. 4;

Fig. 7 is a schematic diagram of panel 7 in Fig. 4.

Detailed Description

To make the purpose, technical solutions, and beneficial effects of the application clearer, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the invention. The components of the embodiments generally described and illustrated in the drawings may be arranged and designed in a variety of different configurations.

1. Database Construction and Text Annotation

The website of the invention centers on research related to "biomechanics" and "bone" and uses a keyword set as the inclusion criterion for literature resources. A total of 34,937 articles published between January 1, 2010 and December 31, 2020 were screened and included. After computational normalization and preprocessing of the text words, the gene information in each article was first identified by PubTator and then converted to official names by calling the API of the NCBI Gene database.

NCBI provides the E-utilities API for the NCBI Entrez system, which gives access to all Entrez databases, including PubMed, PMC, Gene, and Protein, and is well suited to batch processing and bulk text retrieval (https://www.ncbi.nlm.nih.gov/home/develop/api/). The text data consist of the title and abstract of each article. They are first tokenized, parsed, and normalized with the text processing library Natural Language Toolkit (NLTK, http://www.nltk.org/), which avoids ambiguous descriptions and ensures that the text can be recognized in subsequent processing. Named entity recognition (NER) is then performed to extract the required details from each paper. On the one hand, PubTator (https://www.ncbi.nlm.nih.gov/research/pubtator/), a mature biomedical term recognition tool that performs well on ambiguous and complex biomedical term names, is used to annotate the genes and proteins that appear in the text database. The gene IDs are then converted to standard names by querying the NCBI Gene database through Biopython (https://biopython.org/).
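The ID-to-name conversion described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it uses the standard library's `urllib` as a stand-in for Biopython, the helper names are hypothetical, and the JSON layout of the ESummary response (a `result` object holding a `uids` list and one record per ID with a `name` field) is an assumption that should be checked against the E-utilities documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# E-utilities ESummary endpoint.
EUTILS_ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def build_esummary_url(gene_ids):
    """Build an ESummary request URL for a batch of NCBI Gene IDs."""
    params = {
        "db": "gene",
        "id": ",".join(str(g) for g in gene_ids),
        "retmode": "json",
    }
    return EUTILS_ESUMMARY + "?" + urlencode(params)


def parse_official_symbols(payload):
    """Map each Gene ID to its symbol.

    Assumed layout: {"result": {"uids": [...], "<uid>": {"name": ...}}}.
    Records without a "name" field are skipped.
    """
    result = payload.get("result", {})
    symbols = {}
    for uid in result.get("uids", []):
        record = result.get(uid, {})
        if "name" in record:
            symbols[uid] = record["name"]
    return symbols


def fetch_official_symbols(gene_ids):
    """Network call: request the summaries and parse them."""
    with urlopen(build_esummary_url(gene_ids), timeout=30) as response:
        return parse_official_symbols(json.load(response))
```

Keeping URL construction and response parsing separate from the network call lets the parsing be checked offline against a saved response.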

On the other hand, for other special terms such as force type, cell type, and species, a self-compiled corpus was built to extract this information; named entities are then identified by comparing the normalized text content with the corpus. Because classification and data retrieval are based on this self-built corpus, users can change the search scope in the web page options to specify a given force condition or cell line, which helps to obtain more specific results.
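Comparing normalized text against a corpus of special terms can be sketched as a simple dictionary lookup. The lexicon entries, category names, and function names below are hypothetical; the patent does not disclose the contents of its corpus, and this sketch matches whole words only, so plural forms are not recognized.

```python
import re

# Hypothetical mini-lexicon; the actual corpus entries are not disclosed.
LEXICON = {
    "force_type": {"compression", "tension", "fluid shear stress"},
    "species": {"human", "mouse", "rat"},
    "cell_type": {"osteoblast", "osteoclast", "osteocyte"},
}


def normalize(text):
    """Lowercase the text and collapse every non-alphanumeric run to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def tag_entities(text, lexicon=LEXICON):
    """Return {category: sorted list of lexicon terms found as whole words}."""
    padded = " " + normalize(text) + " "
    found = {}
    for category, terms in lexicon.items():
        hits = sorted(t for t in terms if " " + normalize(t) + " " in padded)
        if hits:
            found[category] = hits
    return found
```

Tags of this kind are what would let a search be restricted to, for example, one force type or one species.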

The self-built corpus is produced as follows. As shown in Fig. 2, a deep neural network based on the pre-trained language model BERT was designed and its network parameters were tuned. The main parameters are: batch size: 32; epochs: 4; learning rate: 5e-5; hidden_size: 128. Training on 13.5 million words from the English-language corpus PMC and 4.5 million words from the biological literature corpus PubMed yielded a model for extracting text keywords that carry biological information.
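The hyperparameters listed above could be collected in a small configuration object such as the one below. Only the four values come from the text; the class name, field names, and the step-count helper are illustrative assumptions, and no model architecture or training loop is implied.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters stated in the description; names are illustrative."""
    batch_size: int = 32
    epochs: int = 4
    learning_rate: float = 5e-5
    hidden_size: int = 128

    def total_steps(self, num_examples):
        """Number of optimizer steps if every epoch sees all examples once."""
        return math.ceil(num_examples / self.batch_size) * self.epochs
```

For instance, 1,000 training examples would give 32 batches per epoch and 128 steps over the four epochs.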

2. Interactions Between Mechanobiological Pathways

The way cells and tissues sense and transmit mechanical information depends on interactions between genes, and the cascaded networks of these interactions constitute biological signaling pathways. As shown in Fig. 2, in the pathway navigation section, this embodiment first presents the typical pathways that regulate this process and the interactions between them.

As shown in Fig. 3, in the pathway navigation section, the website of this embodiment displays the classical mechanosensitive pathways, such as the Hippo, BMP, TGFβ, Wnt, Notch, PI3K/Akt, MAPK, and Ras signaling pathways, and explores their interactions in mechanotransduction, providing users with background information on the field of mechanobiology.

In this mode, this embodiment sorts out credible pathways and their interactions to give users general background information. Combining credible pathways related to Hippo, BMP, Wnt, GPCR, TGF-β, IGF, integrins, and cell junctions makes the navigation of genes in mechanosensation and mechanotransduction more convincing and meaningful.

Second, understanding of a single molecule is usually partial and limited; linking molecules to pathways is more helpful for researchers who want to understand and further explore a mechanism of action. Therefore, in one possible design, the canonical pathways are combined with the text mining results so that users can locate their target genes or gene sets through biological processes. The submitted genes are matched against the annotated gene set of each pathway, the correlation between the genes and the mechanics-related pathways is scored, and possible connections are suggested on the basis of text mining.

To obtain a reasonable scoring system, this application calculates the correlation between the target retrieval factor and each classical mechanosensitive pathway from the molecular interaction pairs in the literature database, which helps researchers quickly locate the relevant biological signal transduction patterns. The score is calculated as follows:

In the above formula, r(g,p) is the correlation coefficient between gene g and classical mechanosensitive pathway p; N_i is the total number of related entities in the literature database for the i-th gene in pathway p; N_p is the total number of related entities in the literature database for all genes in pathway p; and Ω_g and Ω_p denote the sets of gene g and of classical mechanosensitive pathway p, respectively.

The weight algorithm highlights the importance of a pathway's star molecules, in keeping with the logic of text data mining: when the target retrieval molecule co-occurs with a pathway's star molecules, the target retrieval molecule can be considered more likely to be associated with that pathway.
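The scoring formula itself does not survive in this text, so the sketch below is only one reading that is consistent with the symbol definitions and with the remark about star molecules. It assumes that Ω_g is the set of genes co-occurring with g, that Ω_p is the gene set of pathway p, that N_p is the sum of N_i over Ω_p, and that r(g,p) is the share of N_p contributed by the genes in both sets. The function name and the example counts are hypothetical.

```python
def pathway_correlation(cooccurring_genes, pathway_genes, entity_counts):
    """Assumed form: r(g, p) = sum of N_i over (omega_g & omega_p), divided by N_p.

    cooccurring_genes: genes that co-occur with the query gene g (omega_g).
    pathway_genes: genes annotated to pathway p (omega_p).
    entity_counts: {gene: N_i}, related-entity totals in the literature database.
    """
    omega_g = set(cooccurring_genes)
    omega_p = set(pathway_genes)
    n_p = sum(entity_counts.get(gene, 0) for gene in omega_p)
    if n_p == 0:
        return 0.0  # pathway has no literature support; avoid dividing by zero
    shared = omega_g & omega_p
    return sum(entity_counts.get(gene, 0) for gene in shared) / n_p


# Hypothetical counts for three Hippo pathway genes.
counts = {"YAP1": 60, "TAZ": 30, "LATS1": 10}
hippo = ["YAP1", "TAZ", "LATS1"]
score_star = pathway_correlation({"YAP1", "RUNX2"}, hippo, counts)
score_minor = pathway_correlation({"LATS1"}, hippo, counts)
```

Under this reading, a query gene that co-occurs with a heavily studied pathway member (here YAP1) scores higher than one that co-occurs only with a sparsely studied member, which is the weighting effect the paragraph describes.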

3. Visualization Website Architecture

To support cross-platform visualization, the web architecture of the website is based on the Django framework, the back-end database is implemented with MySQL, and Semantic UI is used for the front end.

As an NLP web tool, the website of the invention combines presentation and prediction strategies, offering an effective and credible way to sort out the connections and crosstalk between molecules involved in mechanosensation and mechanotransduction in bone.

The website of the invention uses a graph network to display the molecules in all mechanical pathways as interconnected nodes; clicking the line connecting two nodes links to the corresponding original literature. This function is implemented through interaction between the web front end and the server database, which is prior art and is not described further here.

The self-built corpus described above allows the entities retrieved from the literature database to be sub-classified, so users can choose to focus on a specific force type or a specific cell line, which supports more precise and targeted literature-based discovery. In addition, the website of the invention adopts a pathway fitting method: based on the weight algorithm, the system displays the correlation score between the target retrieval molecule and each classical mechanical pathway according to the NLP results, linking the user's target molecule to the components of the classical pathways and making the tool better suited to biomedical research.

4. Correlation Identification and Visualization

According to the user-defined scope, the website of the invention automatically retrieves the entities related to the target retrieval molecule, and its correlations with the pathways, and visualizes them. The graph supports interactive operation, so users can arrange the layout as they prefer and view the details of each entity. Clicking an edge between entities opens a pop-up window that shows the supporting information and the source article, with the corresponding sentence highlighted in red. Having the original text at hand lets users judge for themselves the importance and reliability of the connections found by the AI, and whether they are valid and accurate. Layered search extends relation extraction to the second and third layers, enlarging the network and favoring the discovery of new molecules.
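The layered search can be pictured as a breadth-first expansion over the co-occurrence network, stopping at a chosen layer. The sketch below is an illustration under that assumption; the function name and the example edges are hypothetical and do not come from the patent's database.

```python
from collections import deque


def expand_network(edges, seed, max_layer=3):
    """Return {entity: layer} for every entity within max_layer hops of seed.

    edges: iterable of undirected (entity_a, entity_b) pairs.
    Layer 0 is the seed, layer 1 its direct neighbors, and so on.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    layers = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if layers[current] == max_layer:
            continue  # do not expand beyond the requested layer
        for nxt in sorted(neighbors.get(current, ())):
            if nxt not in layers:
                layers[nxt] = layers[current] + 1
                queue.append(nxt)
    return layers


# Hypothetical chain of co-occurring entities.
example_edges = [("PIEZO1", "YAP1"), ("YAP1", "RUNX2"), ("RUNX2", "SP7"), ("SP7", "BGLAP")]
two_layers = expand_network(example_edges, "PIEZO1", max_layer=2)
```

Breadth-first order guarantees that each entity is assigned the smallest layer at which it can be reached, so a molecule shown in layer 2 has no direct co-occurrence with the query.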

The mechanobiology of bone depends to a large extent on the successive reactions and interactions of several pathways, as described above. With this in mind, this embodiment identifies the classical pathways involved in mechanosensation and mechanotransduction that are supported by credible evidence. The pathway overview and its interactions, together with the supporting evidence, are visualized through charts and svg.js, a lightweight library for manipulating and animating SVG files. The elements of each pathway are retrieved from KEGG (Kyoto Encyclopedia of Genes and Genomes) and compared with the data set of this embodiment to produce the list of items contained in each pathway. The correlation coefficient is then used to rank the correlations between the target gene and the pathways, and the ranking is displayed visually.

In addition to scoring, the website also provides an interactive option to add all or selected components of a target pathway to the NLP network, forming a molecule-to-pathway network in which more indirect connections can be found.

The content of the results interface is described in detail below:

With reference to Figs. 4 to 7, the left side of the results interface presents the pathway association search results. As shown in Fig. 5, panel 1 shows the degree of association between the target retrieval molecule and each classical mechanical pathway; in panel 2, users can select a pathway of interest and add its molecules to the network for a combined search; in panel 3, users can quickly view the pathway information of the target retrieval molecule in the KEGG database to understand its routes of action more fully. As shown in Fig. 6, the middle of the results page visualizes the molecular interaction information and provides several "String" button options. STRING is a database of protein interaction information based on research evidence and algorithmic prediction; integrating STRING results with the original NLP network can suggest directions for research on emerging molecules. By clicking the layered search function provided by the website, users can expand the relationship network to the second and third layers, widening the search scope and favoring the discovery of new molecules in a pathway. The corresponding icons on the website change the display mode of the molecular association graph, download and save the current graph, and restore the previous version of the graph. To retrieve the source literature, users click the connection between nodes; as shown in Fig. 7, a pop-up window displays the correlation and the corresponding literature, with the relevant sentence highlighted in red.

This embodiment defines correlation by co-occurrence rather than using machine learning to parse grammatical data for relation extraction. To keep the predictions credible, this embodiment lets the user, not the machine, judge the relationships embedded in the corpus. Relation retrieval and visualization are based on each normalized sentence: entities are associated by co-occurrence, and the corresponding sentences are then tagged. The co-occurrence score records the number of items carrying the corresponding co-occurrence tag. The sentences and their entities are stored in a relational database implemented with SQLite (https://www.sqlite.org/index.html). STRING (https://string-db.org/) is used for a comprehensive search on the target; this embodiment provides second-layer and/or STRING search options, which can yield more results.
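Sentence-level co-occurrence and its storage in SQLite can be sketched as follows. The table layout, column names, and function names are assumptions made for illustration, not the schema used by the website, and the sample rows are invented placeholders rather than real articles.

```python
import sqlite3
from itertools import combinations


def build_cooccurrence_db(sentences):
    """Store sentences and entity pairs in an in-memory SQLite database.

    sentences: iterable of (document_id, sentence_text, [entities]).
    Each unordered pair of distinct entities in a sentence gets one row.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sentence (id INTEGER PRIMARY KEY, doc_id TEXT, text TEXT)")
    conn.execute(
        "CREATE TABLE cooccurrence ("
        "entity_a TEXT, entity_b TEXT, sentence_id INTEGER REFERENCES sentence(id))"
    )
    for doc_id, text, entities in sentences:
        cur = conn.execute("INSERT INTO sentence (doc_id, text) VALUES (?, ?)", (doc_id, text))
        # Sorting makes (a, b) canonical so each pair is stored in one order only.
        for a, b in combinations(sorted(set(entities)), 2):
            conn.execute("INSERT INTO cooccurrence VALUES (?, ?, ?)", (a, b, cur.lastrowid))
    conn.commit()
    return conn


def cooccurrence_score(conn, a, b):
    """Number of stored sentences in which both entities appear."""
    a, b = sorted((a, b))
    row = conn.execute(
        "SELECT COUNT(*) FROM cooccurrence WHERE entity_a = ? AND entity_b = ?", (a, b)
    ).fetchone()
    return row[0]


def supporting_sentences(conn, a, b):
    """(doc_id, text) rows behind an edge, e.g. for a pop-up after a click."""
    a, b = sorted((a, b))
    return conn.execute(
        "SELECT s.doc_id, s.text FROM cooccurrence c "
        "JOIN sentence s ON s.id = c.sentence_id "
        "WHERE c.entity_a = ? AND c.entity_b = ? ORDER BY s.id",
        (a, b),
    ).fetchall()


# Invented placeholder rows.
sample = [
    ("doc-1", "YAP1 regulates RUNX2 under tension.", ["YAP1", "RUNX2"]),
    ("doc-2", "RUNX2 and YAP1 respond to compression with SP7.", ["RUNX2", "YAP1", "SP7"]),
    ("doc-3", "SP7 is expressed in osteoblasts.", ["SP7"]),
]
db = build_cooccurrence_db(sample)
```

Keeping the sentence text alongside each pair is what allows an edge in the graph to be traced back to the exact sentence that supports it.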

The website of the invention is used as follows. The user enters the target retrieval factor in the search window interface. The search window interface displays some of the classical mechanosensitive pathways, which gives users background information on the field of mechanobiology and helps them quickly identify the classical mechanosensitive pathways. Once the input is complete, the interface switches to the search results interface, which is divided into seven panels: panels 1-3 on the left, panels 4-6 in the middle, and panel 7 on the right.

Panel 1 uses a bar chart to show the correlation between the target retrieval molecule and each classical mechanosensitive pathway. Panel 2 lists the pathways of interest that the user can select; the molecules of a selected pathway can be added to the network for a combined search. In panel 3, users can quickly view the pathway information of the target retrieval molecule in the KEGG database to understand its routes of action more fully.

By clicking the layered search function in panel 5, users can expand the relationship network to the second and third layers and widen the search scope. Panel 6 provides icons for changing the display mode, downloading the image, and resetting the image; clicking them changes the display mode of the molecular association graph, downloads and saves the current graph, or restores the previous version of the graph.

To retrieve the source literature, users click the connection between nodes in panel 4; the pop-up window in panel 7 then displays the correlation and the corresponding literature, with the relevant sentence highlighted in red.

In summary, the website of the invention is built on open literature resources and uses a natural language processing (NLP) strategy to mine a text database and construct an interactive web tool. It creates the first bone-related biomechanics text database and introduces a visualization mode that turns complex and obscure text information into graphics. It uses a purpose-built weight algorithm to calculate the correlation between the target retrieval factor and the classical mechanosensitive pathways, establishing a new analysis strategy based on biological pathways. The interactive web tool also makes the exploration of the biological literature visual and simple, which can greatly promote research on the molecular mechanisms of bone-related mechanobiology and encourage more effective data processing and knowledge sharing.

The present application no longer relies on unfiltered resources; instead, it targets bone mechanobiology processes and retrieves classified information from a self-organized library. In this way, users can explore all mechanics-related articles, or narrow their target by force type, cell line, or species. This discussion-oriented biomedical platform, which facilitates knowledge sharing, gives researchers access to information of unprecedented scope and otherwise unmanageable volume. At the same time, the present application adopts a pathway-centric strategy, using weighted scoring and combinatorial algorithms, to enable navigation and exploration of individual genes or gene sets in mechanobiological processes.
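The narrowing of a search by force type, cell line, or species can be pictured as an optional filter over annotated article records. The following is only a sketch under assumptions: the field names (`force_type`, `cell_type`, `species`), the function `filter_articles`, and the sample annotations are hypothetical and do not come from the literature database described here.

```python
# Hypothetical annotated articles; all values are illustrative placeholders.
ARTICLES = [
    {"doc_id": "DOC-1", "force_type": "compression",
     "cell_type": "osteoblast", "species": "mouse"},
    {"doc_id": "DOC-2", "force_type": "tension",
     "cell_type": "osteoblast", "species": "human"},
    {"doc_id": "DOC-3", "force_type": "compression",
     "cell_type": "osteocyte", "species": "mouse"},
]


def filter_articles(articles, force_type=None, cell_type=None, species=None):
    """Keep articles matching every criterion that was actually supplied."""
    wanted = {"force_type": force_type, "cell_type": cell_type, "species": species}
    # Criteria left as None are ignored, so no arguments means "all articles".
    active = {key: value for key, value in wanted.items() if value is not None}
    return [
        article for article in articles
        if all(article.get(key) == value for key, value in active.items())
    ]


if __name__ == "__main__":
    hits = filter_articles(ARTICLES, force_type="compression", species="mouse")
    print([article["doc_id"] for article in hits])
```

Leaving every criterion unset returns the full set of articles, which mirrors the option of exploring all mechanics-related literature before specifying a target.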

The specific embodiments above describe the purpose, technical solutions, and beneficial effects of the present application in further detail. It should be understood that they are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for constructing a text mining interactive website for biomechanical regulation of bone remodeling, characterized by comprising the following steps:

S1, screening gene information text words in the literature according to relevant entries, obtaining gene-molecule interaction relation pairs, and constructing a literature database;

S2, based on the gene-molecule interaction relation pairs in the literature database, calculating the correlation between the target retrieval factor and the classical mechanosensitive pathway using a weighting algorithm;

S3, visually displaying the correlation between the target retrieval factor and the classical mechanosensitive pathway, displaying the gene molecules in the classical mechanosensitive pathway as interconnected nodes, and linking to the corresponding literature in the literature database when a connecting line between nodes is clicked.

2. The method for constructing a text mining interactive website for biomechanical regulation of bone remodeling according to claim 1, wherein: between step S1 and step S2, deep neural network training is carried out on the PMC database, text keywords carrying biological information are screened, and a corpus is constructed.

3. The method for constructing a text mining interactive website for biomechanical regulation of bone remodeling according to claim 2, wherein: the biological information includes mechanical type, research species, and cell type.

4. The method for constructing a text mining interactive website for biomechanical regulation of bone remodeling according to claim 1 or 2, wherein: the relevant entries in step S1 include biomechanics-related and bone-related entries.

5. The method for constructing a text mining interactive website for biomechanical regulation of bone remodeling according to claim 4, wherein: step S1 includes computer-language normalization and preprocessing of the gene information text words.

6. The method for constructing a text mining interactive website for biomechanical regulation of bone remodeling according to claim 5, wherein: step S1 further includes recognizing the gene information text words using PubTator, and converting the gene information text words into official names by calling an API of the NCBI Gene database.

7. The method for constructing a text mining interactive website for biomechanical regulation of bone remodeling according to claim 1 or 2, wherein: the formula of the weighting algorithm in step S2 is:

wherein r(g, p) is the correlation coefficient between gene g and classical mechanosensitive pathway p; N_i represents the total number of related entities in the literature database for the i-th gene in classical mechanosensitive pathway p; N_p is the total number of related entities in the literature database for all genes of classical mechanosensitive pathway p; and Ω_g and Ω_p represent the sets of gene g and of classical mechanosensitive pathway p, respectively.

8. The method for constructing a text mining interactive website for biomechanical regulation of bone remodeling according to claim 7, wherein: the classical mechanosensitive pathway includes at least one of Hippo, BMP, TGF-β, Wnt, Notch, PI3K/Akt, MAPK, and Ras.

9. The method for constructing a text mining interactive website for biomechanical regulation of bone remodeling according to claim 1, wherein: step S3 further includes visually displaying the pathway information of the target retrieval factor in the KEGG database.

10. The method for constructing a text mining interactive website for biomechanical regulation of bone remodeling according to claim 1 or 9, wherein: step S3 further includes visually displaying the gene-molecule interaction relation pairs of the target retrieval factor in the STRING database.