WO2021179688A1 - Medical literature retrieval method and apparatus, electronic device, and storage medium - Google Patents
Medical literature retrieval method and apparatus, electronic device, and storage medium
- Publication number
- WO2021179688A1 (application PCT/CN2020/131810)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- query sentence
- similarity
- medical
- entity
- medical document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- This application relates to the technical field of information recommendation, in particular to a medical document retrieval method, device, electronic equipment and storage medium.
- the PUBMED (public medicine) database contains a large body of medical literature, and this literature often reflects the development trends of a given medical field.
- the inventor realizes that, at present, improving retrieval efficiency for the PUBMED medical database generally requires experienced medical workers to label each medical document in the database with a related topic, yielding its Medical Subject Headings (MeSH), so that in the subsequent retrieval process the query sentence can be matched against the MeSH keywords to retrieve the relevant medical literature.
- the embodiments of the present application provide a medical document retrieval method, device, electronic equipment, and storage medium, so as to improve the retrieval efficiency and retrieval accuracy of medical documents.
- an embodiment of the present application provides a medical document retrieval method, including:
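- obtaining a first query sentence;
- translating the first query sentence to obtain a second query sentence, wherein the language types of the first query sentence and the second query sentence are different;
- determining, according to the first query sentence and the title and abstract of each first medical document, the first target similarity corresponding to the first query sentence and each first medical document, wherein the language type of each first medical document is the same as that of the first query sentence;
- determining, according to the second query sentence and the title and abstract of each second medical document, the second target similarity corresponding to the second query sentence and each second medical document, wherein the language type of each second medical document is the same as that of the second query sentence;
- determining, according to the first target similarity and the second target similarity, the target medical document corresponding to the first query sentence.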
- an embodiment of the present application provides a medical document retrieval device, including:
- a transceiver unit configured to obtain the first query sentence;
- a processing unit configured to translate the first query sentence to obtain a second query sentence, wherein the language types of the first query sentence and the second query sentence are different;
- the processing unit is further configured to determine the first target similarity corresponding to the first query sentence and each first medical document according to the first query sentence and the title and abstract of each first medical document, wherein the language type of each first medical document is the same as that of the first query sentence;
- the processing unit is further configured to determine the second target similarity corresponding to the second query sentence and each second medical document according to the second query sentence and the title and abstract of each second medical document, wherein the language type of each second medical document is the same as that of the second query sentence;
- the processing unit is further configured to determine the target medical document corresponding to the first query sentence according to the first target similarity and the second target similarity.
- an embodiment of the present application provides an electronic device, including a processor, the processor is connected to a memory, the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the electronic device executes the following method:
- obtaining a first query sentence;
- translating the first query sentence to obtain a second query sentence, wherein the language types of the first query sentence and the second query sentence are different;
- determining, according to the first query sentence and the title and abstract of each first medical document, the first target similarity corresponding to the first query sentence and each first medical document, wherein the language type of each first medical document is the same as that of the first query sentence;
- determining, according to the second query sentence and the title and abstract of each second medical document, the second target similarity corresponding to the second query sentence and each second medical document, wherein the language type of each second medical document is the same as that of the second query sentence;
- determining, according to the first target similarity and the second target similarity, the target medical document corresponding to the first query sentence.
- an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program causes a computer to execute the following method:
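- obtaining a first query sentence;
- translating the first query sentence to obtain a second query sentence, wherein the language types of the first query sentence and the second query sentence are different;
- determining, according to the first query sentence and the title and abstract of each first medical document, the first target similarity corresponding to the first query sentence and each first medical document, wherein the language type of each first medical document is the same as that of the first query sentence;
- determining, according to the second query sentence and the title and abstract of each second medical document, the second target similarity corresponding to the second query sentence and each second medical document, wherein the language type of each second medical document is the same as that of the second query sentence;
- determining, according to the first target similarity and the second target similarity, the target medical document corresponding to the first query sentence.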
- embodiments of the present application provide a computer program product; the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method described in the first aspect.
- implementing the embodiments of this application eliminates the need to label the medical documents in the medical database in advance, which reduces labor costs and improves retrieval efficiency; in addition, medical documents in different languages can be retrieved at the same time, which improves the accuracy of medical document retrieval.
- FIG. 1 is a schematic flowchart of a medical document retrieval method provided by an embodiment of the application
- FIG. 2 is a schematic structural diagram of a network model provided by an embodiment of this application.
- FIG. 3 is a block diagram of functional units of a medical document retrieval device provided by an embodiment of the application.
- FIG. 4 is a schematic structural diagram of a medical document retrieval device provided by an embodiment of the application.
- the technical solution of this application can be applied to the fields of artificial intelligence, smart city, digital healthcare, blockchain and/or big data technology to realize document retrieval.
- the data involved in this application such as medical documents and/or vectors, can be stored in a database, or can be stored in a blockchain, which is not limited in this application.
- FIG. 1 is a schematic flowchart of a medical literature retrieval method provided by this application.
- the medical document retrieval method is applied to a medical document retrieval device, and the method includes the following steps:
- the medical document retrieval device acquires the first query sentence.
- the first query sentence may be input by the user in the information input box of the medical document retrieval device, or may be obtained by performing voice recognition on the user's voice, for example, the user inputs the user's voice through a voice assistant.
- This application does not limit the method of obtaining the first query sentence.
- the medical document retrieval device translates the first query sentence to obtain a second query sentence.
- the language types of the first query sentence and the second query sentence are different.
- the language types of the first query sentence and the second query sentence are Chinese or English. Therefore, when the first query sentence is in Chinese, it can be translated into English to obtain the second query sentence; when the first query sentence is in English, it can be translated into Chinese to obtain the second query sentence.
- the first query sentence can also be translated into other types of languages, such as Korean, Japanese, and so on.
- the first query sentence may be translated in combination with the medical knowledge graph.
- the first medical knowledge graph corresponding to the first query sentence can be obtained from the medical knowledge graph database.
- the first medical knowledge graph can be obtained by keyword matching; the first query sentence is then translated in combination with the first medical knowledge graph to obtain the second query sentence corresponding to the first query sentence.
- the first medical knowledge graph can be vectorized to obtain the first feature vector corresponding to the first medical knowledge graph; word embedding is performed on each word in the first query sentence to obtain the word vector corresponding to each word; semantic feature extraction is then performed on the word vector of each word to obtain the second feature vector corresponding to the first query sentence; finally, the second feature vector corresponding to the first query sentence and the first feature vector corresponding to the first medical knowledge graph are spliced to obtain a first target feature vector, and translation is performed according to the first target feature vector to obtain the second query sentence.
- for the translation based on the first target feature vector, an existing decoding network can be used to generate the second query sentence iteratively, which will not be described in detail.
- combining the medical knowledge graph, that is, combining prior knowledge, improves the accuracy of the translation of the first query sentence.
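The splicing step described above can be sketched as a plain concatenation. This is only an illustration under stated assumptions: the function name `splice` and the example values are hypothetical, and the source does not say how the two vectors are produced beyond word embedding and feature extraction.

```python
from typing import List


def splice(query_features: List[float], graph_features: List[float]) -> List[float]:
    """Concatenate the query-sentence vector and the knowledge-graph vector end to end."""
    return list(query_features) + list(graph_features)


# Hypothetical values standing in for the outputs of the feature extraction steps.
second_feature_vector = [0.2, 0.7, 0.1]  # from the first query sentence
first_feature_vector = [0.9, 0.4]        # from the first medical knowledge graph

first_target_feature_vector = splice(second_feature_vector, first_feature_vector)
```

The spliced vector is what a decoder would consume to generate the second query sentence; the decoder itself is outside the scope of this sketch.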
- the medical document retrieval device determines the first target similarity corresponding to the first query sentence and each first medical document according to the first query sentence and the title and abstract of each first medical document.
- the language type of each first medical document is the same as that of the first query sentence.
- the first medical document may be a medical document in a medical database whose language type is the same as that of the first query sentence. For example, if the first query sentence is in English, then the English medical documents in the medical database are regarded as the first medical documents.
- the medical database may be the PUBMED database.
- the first similarity and the second similarity between the first query sentence and the title and abstract of each first medical document can be determined respectively.
- the BM25 algorithm can be used to determine the first similarity and the second similarity between the first query sentence and the title and abstract of each first medical document.
- taking the first similarity as an example: the first correlation between each word in the first query sentence and the title of each first medical document, the second correlation between each word and the first query sentence, and the weight of each word are determined; the first similarity is then determined according to the first correlation, the second correlation, and the weight of each word.
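The three ingredients named above (a word-to-document correlation, a word-to-query correlation, and a per-word weight) match the structure of the standard Okapi BM25 formula, which can be sketched as follows. The parameter values `k1`, `b`, and `k3`, the pre-tokenized input, and the function name are assumptions for illustration; the source does not specify them.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75, k3: float = 8.0) -> List[float]:
    """Score every tokenized document (e.g. a title) against a tokenized query.

    Assumes `docs` is non-empty.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))
    query_tf = Counter(query)
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term, qtf in query_tf.items():
            if term not in tf:
                continue
            # Weight of the word: inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Correlation between the word and the document.
            doc_part = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avg_len))
            # Correlation between the word and the query.
            query_part = qtf * (k3 + 1) / (qtf + k3)
            score += idf * doc_part * query_part
        scores.append(score)
    return scores


# Hypothetical titles; the same call on abstracts would give the second similarity.
titles = [
    ["aspirin", "for", "stroke", "prevention"],
    ["gene", "therapy", "review"],
    ["stroke", "rehabilitation", "outcomes"],
]
title_scores = bm25_scores(["aspirin", "stroke"], titles)
```

In this toy corpus the first title matches both query words and scores highest, and the second title matches none and scores zero.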
- entity recognition is performed on the first query sentence to obtain the entities in the first query sentence, and entity recognition is performed on the title and abstract of each first medical document to obtain the entities in the title and the entities in the abstract of each first medical document; then, the third similarity between the entities in the first query sentence and the entities in the title of each first medical document is determined, and the fourth similarity between the entities in the first query sentence and the entities in the abstract of each first medical document is determined.
- the third similarity and the fourth similarity may be characterized by a Jaccard coefficient. Therefore, the third similarity can be expressed by formula (1):
- $$S_3 = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} \qquad (1)$$
- where $S_3$ is the third similarity, $A$ is the set of entities in the first query sentence, $B$ is the set of entities in the title of each first medical document, $|A \cap B|$ is the number of elements in the intersection of set $A$ and set $B$, $|A \cup B|$ is the number of elements in the union of set $A$ and set $B$, and $|A|$ and $|B|$ are the numbers of elements in set $A$ and set $B$, respectively.
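A Jaccard coefficient over two entity sets can be computed directly with set operations. The sketch below is illustrative; the entity strings are made up, and returning 0.0 when both sets are empty is an assumption, since the source does not address that case.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Return |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty (assumed convention)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical entities recognized in a query and in a document title.
query_entities = {"aspirin", "stroke"}
title_entities = {"aspirin", "stroke", "prevention"}
s3 = jaccard(query_entities, title_entities)  # two shared entities out of three in total
```

The fourth similarity would use the same function with the entities recognized in the abstract in place of those in the title.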
- determining the entities in the first query sentence and the entities in the title and abstract of each first medical document can be achieved with a trained BERT model.
- the entity recognition process is described below by taking the entity in the first query sentence as an example.
- the subsequent entity recognition is similar to this and will not be described again.
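- word embedding is performed on each word in the first query sentence to obtain the word vector of each word; semantic feature extraction is performed on the word vector of each word to obtain the target word vector of each word; according to the target word vector of each word, the type of each word is determined, and the entities in the first query sentence are obtained.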
- the entities in the first query sentence can also be determined in combination with the first medical knowledge graph, which excludes entities in the first query sentence that do not correspond to the first medical knowledge graph, that is, entities outside the medical field, and retains the entities that are more conducive to medical literature retrieval. Specifically, each entity in the first medical knowledge graph is first vectorized to obtain the feature vector corresponding to each entity; then, based on the attention mechanism, the weight coefficient between each word and each entity is determined, that is, the similarity between the word vector of each word and the feature vector of each entity is determined, and the similarities are normalized to obtain the weight coefficient between each word and each entity; the feature vector of each entity is weighted by its corresponding weight coefficient to obtain the target word vector corresponding to each word; finally, according to the target word vector corresponding to each word, the type of each word is determined, and the entities in the first query sentence are obtained.
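The attention step described above can be sketched in a few lines. Two choices here are assumptions not stated in the source: the similarity is taken to be a dot product, and the normalization is taken to be a softmax. The vectors are toy values.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def attention_weights(word_vector: Vector, entity_vectors: List[Vector]) -> List[float]:
    """Similarity of one word to every entity, normalized so the weights sum to 1."""
    sims = [dot(word_vector, e) for e in entity_vectors]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def target_word_vector(word_vector: Vector, entity_vectors: List[Vector]) -> Vector:
    """Weighted sum of the entity feature vectors, using the attention weights."""
    weights = attention_weights(word_vector, entity_vectors)
    dim = len(entity_vectors[0])
    return [sum(w * e[i] for w, e in zip(weights, entity_vectors)) for i in range(dim)]


# Toy example: the word is closer to the first knowledge-graph entity than to the second.
word = [1.0, 0.0]
entities = [[1.0, 0.0], [0.0, 1.0]]
weights = attention_weights(word, entities)
target = target_word_vector(word, entities)
```

The resulting target word vector leans toward the entity the word resembles most, which is how the knowledge graph can steer the later word classification toward medical entities.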
- the BERT model can be used to perform entity recognition on the title and abstract of each first medical document in the medical database, to obtain the entities in the title and abstract of each first medical document.
- the entities in the title and the abstract can be identified in advance, so that during medical literature retrieval there is no need to identify the entities in the title and abstract of each first medical document again, which improves the retrieval efficiency of medical literature.
- the entities involved in the embodiments of the present application may include diseases, drugs, surgeries, genes, laboratory tests, and so on.
- the first target similarity can be expressed by formula (2), as a weighted combination of the four similarities, where:
- $S_{m1}$ is the first target similarity
- $S_1$, $S_2$, $S_3$ and $S_4$ are respectively the first similarity, the second similarity, the third similarity, and the fourth similarity corresponding to the first query sentence.
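A weighted combination of the four similarities can be sketched as below. The weight values are not given in the source; the equal weights used here, and the function name, are placeholders for illustration only.

```python
from typing import Sequence


def target_similarity(similarities: Sequence[float],
                      weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted sum of S1..S4. The default equal weights are an assumption."""
    if len(similarities) != len(weights):
        raise ValueError("similarities and weights must have the same length")
    return sum(w * s for w, s in zip(weights, similarities))


# Hypothetical values for S1 (title text), S2 (abstract text),
# S3 (title entities), and S4 (abstract entities).
s_m1 = target_similarity([1.0, 0.5, 0.5, 0.0])
```

Note that BM25 scores are unbounded while Jaccard coefficients lie in [0, 1], so in practice the text similarities would likely need rescaling before such a sum is meaningful; the source does not say how this is handled.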
- the medical document retrieval device determines the second target similarity corresponding to the second query sentence and each second medical document according to the second query sentence and the title and abstract of each second medical document.
- the language type of each second medical document is the same as that of the second query sentence.
- a medical document in a public medical database whose language type is the same as that of the second query sentence can be used as the second medical document.
- the first similarity and the second similarity between the second query sentence and the title and abstract of the second medical document are respectively determined;
- the third similarity and the fourth similarity between the entities in the second query sentence and the entities in the title and abstract of the second medical document are respectively determined;
- the first similarity, the second similarity, the third similarity, and the fourth similarity corresponding to the second query sentence are weighted to obtain the second target similarity corresponding to the second query sentence.
- the medical document retrieval device determines the target medical document corresponding to the first query sentence according to the first target similarity corresponding to the first query sentence and each first medical document and the second target similarity corresponding to the second query sentence and each second medical document.
- according to the first target similarity corresponding to the first query sentence and each first medical document, the first medical document with the largest first target similarity is taken as one target medical document; according to the second target similarity corresponding to the second query sentence and each second medical document, the second medical document with the largest second target similarity is taken as the other target medical document; then, the two target medical documents are taken as the target medical documents corresponding to the first query sentence.
- the medical document with the greatest similarity among the two target medical documents may also be used as the target medical document corresponding to the first query sentence.
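The two selection strategies described above (return the best document per language, or only the single best of the two) can be sketched as follows. The document identifiers, scores, and the `single` flag are hypothetical.

```python
from typing import Dict, List, Tuple


def best(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the (document id, similarity) pair with the largest similarity."""
    return max(scores.items(), key=lambda item: item[1])


def select_targets(first_scores: Dict[str, float],
                   second_scores: Dict[str, float],
                   single: bool = False) -> List[str]:
    """Pick the top document per language, or only the better of the two if `single`."""
    best_first = best(first_scores)
    best_second = best(second_scores)
    if single:
        return [max(best_first, best_second, key=lambda item: item[1])[0]]
    return [best_first[0], best_second[0]]


# Hypothetical target similarities keyed by document id.
first_target = {"en-1": 0.4, "en-2": 0.9}   # same language as the first query sentence
second_target = {"zh-1": 0.7, "zh-2": 0.2}  # same language as the second query sentence

both = select_targets(first_target, second_target)
one = select_targets(first_target, second_target, single=True)
```

Comparing scores across the two languages in the `single` case assumes the first and second target similarities are on a comparable scale.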
- the medical literature retrieval method of this application can also be applied to smart medical scenarios. For example, a doctor can use the method to quickly retrieve the desired medical literature and quickly find historical cases or documents, which provides reference data for the doctor's diagnosis, improves the doctor's diagnostic efficiency and accuracy, and promotes the development of medical technology.
- the foregoing determination of the first target similarity and the second target similarity may be achieved by a trained network model.
- the following describes the implementation process of the medical document retrieval method of the present application with reference to the structure of the network model, taking the determination of the first target similarity as an example.
- the second target similarity is determined in a similar way to the first target similarity, which will not be described again.
- the network model includes an embedding layer 1, an embedding layer 2, a feature extraction network 1, a feature extraction network 2, a feature extraction network 3, a feature extraction network 4, a decoding network, and a graph conversion network.
- the embedding layer 1, the embedding layer 2, the embedding layer 3, and the graph conversion network can be networks based on the BERT model for word embedding;
- the feature extraction network 1, the feature extraction network 2, the feature extraction network 3, and the feature extraction network 4 can be general feature extraction networks used for semantic feature extraction, for example, a long short-term memory network (LSTM), a recurrent neural network (RNN), etc.
- a decoder network (Decoder) can include multiple stack layers.
- the embedding layer 1 is used to perform word embedding on each word in the first query sentence to obtain the word vector corresponding to each word, and the feature extraction network 1 is used to perform semantic feature extraction on the word vector of each word to obtain the second feature vector corresponding to the first query sentence;
- then, the first medical knowledge graph matching the first query sentence is obtained from the medical knowledge graph database; the embedding layer 2 is used to perform word embedding on the medical knowledge in the first medical knowledge graph (such as disease names, precautions, entities, etc.) to obtain the feature vector corresponding to the medical knowledge;
- the feature extraction network 2 is used to perform feature extraction on the feature vector corresponding to the medical knowledge to obtain the first feature vector corresponding to the first medical knowledge graph;
- the first feature vector and the second feature vector are spliced to obtain the first target feature vector corresponding to the first query sentence;
- the decoding network is used to translate according to the first target feature vector to obtain the second query sentence corresponding to the first query sentence;
- based on the self-attention mechanism, the similarity between the word vector of each word and the feature vector of each entity in the first medical knowledge graph is determined to obtain the weight coefficient between each word and each entity; the feature vector of each entity is weighted by the weight coefficient between each word and that entity to obtain the target word vector corresponding to each word;
- the feature extraction network 3 is used to perform feature extraction on the target word vector of each word to obtain the target feature vector corresponding to the first query sentence, and the words in the first query sentence are classified according to this target feature vector to obtain entity 3 in the first query sentence;
- the first similarity and the second similarity between the first query sentence and the title and abstract of the first medical document are respectively calculated;
- the embedding layer 2 is used to perform word embedding on the title and abstract of the first medical document respectively to obtain the word vectors corresponding to the title and the abstract; then, the feature extraction network 4 is used to perform feature extraction and classification on the word vectors corresponding to the title and the abstract to obtain entity 1 in the title and entity 2 in the abstract; then, the third similarity between entity 1 and entity 3 and the fourth similarity between entity 2 and entity 3 are determined;
- the first similarity, the second similarity, the third similarity, and the fourth similarity are weighted to obtain the first target similarity corresponding to each first medical document.
- FIG. 3 is a block diagram of the functional unit composition of a medical document retrieval device provided by an embodiment of the application.
- the medical document retrieval device 300 includes: a transceiver unit 301 and a processing unit 302, wherein:
- the transceiver unit 301 is configured to obtain the first query sentence;
- the processing unit 302 is configured to translate the first query sentence to obtain a second query sentence, wherein the language types of the first query sentence and the second query sentence are different;
- the processing unit 302 is further configured to determine the first target similarity corresponding to the first query sentence and each first medical document according to the first query sentence and the title and abstract of each first medical document, wherein the language type of each first medical document is the same as that of the first query sentence;
- the processing unit 302 is further configured to determine the second target similarity corresponding to the second query sentence and each second medical document according to the second query sentence and the title and abstract of each second medical document, wherein the language type of each second medical document is the same as that of the second query sentence;
- the processing unit 302 is further configured to determine the target medical document corresponding to the first query sentence according to the first target similarity and the second target similarity.
- the processing unit 302 is specifically configured to:
- the first query sentence is translated to obtain the second query sentence.
- the processing unit 302 is specifically configured to:
- the processing unit 302 is specifically used for:
- the first target similarity corresponding to the first query sentence and each of the first medical documents is determined according to the first similarity, the second similarity, the third similarity, and the fourth similarity.
- the processing unit 302 is specifically configured to:
- the processing unit 302 is specifically configured to:
- the processing unit 302 is specifically configured to:
- the two target medical documents are taken as the target medical documents corresponding to the first query sentence.
- FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the application.
- the electronic device includes a processor, the processor is connected to a memory, the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the electronic device executes the above method.
- the electronic device 400 includes a transceiver 401, a processor 402, and a memory 403, which are connected to one another by a bus 404.
- the memory 403 is used to store computer programs and data, and can transmit the data stored in the memory 403 to the processor 402.
- the processor 402 is configured to read the computer program in the memory 403 and perform the following operations:
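- obtaining a first query sentence;
- translating the first query sentence to obtain a second query sentence, wherein the language types of the first query sentence and the second query sentence are different;
- determining, according to the first query sentence and the title and abstract of each first medical document, the first target similarity corresponding to the first query sentence and each first medical document, wherein the language type of each first medical document is the same as that of the first query sentence;
- determining, according to the second query sentence and the title and abstract of each second medical document, the second target similarity corresponding to the second query sentence and each second medical document, wherein the language type of each second medical document is the same as that of the second query sentence;
- determining, according to the first target similarity and the second target similarity, the target medical document corresponding to the first query sentence.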
- the processor 402 in terms of translating the first query sentence to obtain the second query sentence, is specifically configured to perform the following operations:
- the first query sentence is translated to obtain the second query sentence.
- the processor 402 in terms of translating the first query sentence according to the first medical knowledge graph to obtain the second query sentence, is specifically configured to perform the following operations:
- the processor 402 is specifically configured to perform the following operations:
- the first target similarity corresponding to the first query sentence and each of the first medical documents is determined according to the first similarity, the second similarity, the third similarity, and the fourth similarity.
- the processor 402 is specifically configured to perform the following operations:
- the processor 402 is specifically configured to perform the following operations:
- the processor 402 is specifically configured to perform the following operations:
- the two target medical documents are taken as the target medical documents corresponding to the first query sentence.
- the transceiver 401 may be the transceiver unit 301 of the medical document retrieval device 300 in the embodiment shown in FIG. 3, and the processor 402 may be the processing unit 302 of the medical document retrieval device 300 in the embodiment shown in FIG. 3.
- the medical document retrieval device in this application may include smart phones (such as Android phones, iOS phones, Windows Phone phones, etc.), tablet computers, handheld computers, notebook computers, mobile Internet devices (MID), wearable devices, etc.
- the aforementioned medical document retrieval devices are only examples, not an exhaustive list; the device includes, but is not limited to, those listed above.
- the aforementioned medical document retrieval device may also include: intelligent vehicle-mounted terminals, computer equipment, and so on.
- the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement part or all of the steps of any medical document retrieval method described in the above method embodiments.
- the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.
- the embodiments of the present application also provide a computer program product, the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps of any medical literature retrieval method described in the above method embodiments.
- the disclosed device may be implemented in other ways.
- the device embodiments described above are merely illustrative.
- the division of the units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or in the form of software program modules.
- if the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
- the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product, and the computer software product is stored in a memory.
- a number of instructions are included to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned memory includes: a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media that can store program code.
- the program can be stored in a computer-readable memory, and the memory can include: a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, etc.
Abstract
Description
This application claims priority to Chinese patent application No. 202011152153.X, filed with the China Patent Office on October 23, 2020 and entitled "Medical Literature Retrieval Method and Apparatus, Electronic Device, and Storage Medium", the entire contents of which are incorporated herein by reference.
This application relates to the technical field of information recommendation, and in particular to a medical document retrieval method and apparatus, an electronic device, and a storage medium.
The public medicine (PUBMED) database contains a large body of medical literature, which often reflects how research directions in a given medical field are developing. Reading the literature of a field can improve the efficiency and accuracy of decisions made by researchers in that field and by public health policy makers. However, as the number of publications in medicine grows rapidly, the PUBMED database holds more and more medical documents.
The inventor realized that, at present, to improve the efficiency of retrieval from the PUBMED database, experienced medical workers generally have to label each medical document in the database with a relevant topic, yielding Medical Subject Headings (MeSH). During later retrieval, the query sentence can then be keyword-matched against MeSH to retrieve the relevant medical documents.
However, because the PUBMED database contains so many medical documents, manual labeling incurs a high labor cost, and the retrieval results depend on the manual labels, so retrieval efficiency and retrieval accuracy are low.
Summary of the invention
The embodiments of this application provide a medical document retrieval method and apparatus, an electronic device, and a storage medium that improve the efficiency and accuracy of medical document retrieval.
In a first aspect, an embodiment of this application provides a medical document retrieval method, including:
obtaining a first query sentence;
translating the first query sentence to obtain a second query sentence, where the first query sentence and the second query sentence are in different languages;
determining, according to the first query sentence and the title and abstract of each first medical document, a first target similarity between the first query sentence and each first medical document, where each first medical document is in the same language as the first query sentence;
determining, according to the second query sentence and the title and abstract of each second medical document, a second target similarity between the second query sentence and each second medical document, where each second medical document is in the same language as the second query sentence;
determining, according to the first target similarity and the second target similarity, a target medical document corresponding to the first query sentence.
In a second aspect, an embodiment of this application provides a medical document retrieval device, including:
a transceiver unit, configured to obtain a first query sentence;
a processing unit, configured to translate the first query sentence to obtain a second query sentence, where the first query sentence and the second query sentence are in different languages;
the processing unit being further configured to determine, according to the first query sentence and the title and abstract of each first medical document, a first target similarity between the first query sentence and each first medical document, where each first medical document is in the same language as the first query sentence;
the processing unit being further configured to determine, according to the second query sentence and the title and abstract of each second medical document, a second target similarity between the second query sentence and each second medical document, where each second medical document is in the same language as the second query sentence;
the processing unit being further configured to determine, according to the first target similarity and the second target similarity, a target medical document corresponding to the first query sentence.
In a third aspect, an embodiment of this application provides an electronic device, including a processor connected to a memory, where the memory is configured to store a computer program and the processor is configured to execute the computer program stored in the memory, so that the electronic device performs the following method:
obtaining a first query sentence;
translating the first query sentence to obtain a second query sentence, where the first query sentence and the second query sentence are in different languages;
determining, according to the first query sentence and the title and abstract of each first medical document, a first target similarity between the first query sentence and each first medical document, where each first medical document is in the same language as the first query sentence;
determining, according to the second query sentence and the title and abstract of each second medical document, a second target similarity between the second query sentence and each second medical document, where each second medical document is in the same language as the second query sentence;
determining, according to the first target similarity and the second target similarity, a target medical document corresponding to the first query sentence.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium that stores a computer program, where the computer program causes a computer to perform the following method:
obtaining a first query sentence;
translating the first query sentence to obtain a second query sentence, where the first query sentence and the second query sentence are in different languages;
determining, according to the first query sentence and the title and abstract of each first medical document, a first target similarity between the first query sentence and each first medical document, where each first medical document is in the same language as the first query sentence;
determining, according to the second query sentence and the title and abstract of each second medical document, a second target similarity between the second query sentence and each second medical document, where each second medical document is in the same language as the second query sentence;
determining, according to the first target similarity and the second target similarity, a target medical document corresponding to the first query sentence.
In a fifth aspect, an embodiment of this application provides a computer program product that includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to execute the method according to the first aspect.
Implementing the embodiments of this application removes the need to label the medical documents in the medical database in advance, which reduces labor cost and improves retrieval efficiency. In addition, medical documents in different languages can be retrieved at the same time, and the accuracy of medical document retrieval is improved.
To describe the technical solutions in the embodiments of this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a medical document retrieval method provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a network model provided by an embodiment of this application;
FIG. 3 is a block diagram of the functional units of a medical document retrieval device provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of a medical document retrieval device provided by an embodiment of this application.
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of this application.
The terms "first", "second", "third", and "fourth" in the specification, claims, and drawings of this application distinguish different objects and do not describe a particular order. The terms "including" and "having", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units; it may optionally include steps or units that are not listed, or other steps or units inherent to the process, method, product, or device.
A reference to an "embodiment" herein means that a particular feature, result, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. Occurrences of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. A person skilled in the art understands, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The technical solution of this application can be applied to the fields of artificial intelligence, smart cities, digital healthcare, blockchain, and/or big data to implement document retrieval. Optionally, the data involved in this application, such as medical documents and/or vectors, may be stored in a database or in a blockchain; this application places no limitation on this.
Refer to FIG. 1, which is a schematic flowchart of a medical document retrieval method provided by this application. The method is applied to a medical document retrieval device and includes the following steps:
101: The medical document retrieval device obtains a first query sentence.
The first query sentence may be entered by the user in an information input box of the medical document retrieval device, or obtained by performing speech recognition on the user's speech, for example speech that the user provides through a voice assistant. This application does not limit how the first query sentence is obtained.
102: The medical document retrieval device translates the first query sentence to obtain a second query sentence.
The first query sentence and the second query sentence are in different languages.
For example, the language of each of the first and second query sentences is Chinese or English. If the first query sentence is in Chinese, it can be translated into English to obtain the second query sentence; if it is in English, it can be translated into Chinese to obtain the second query sentence.
It should be understood that this application mainly uses Chinese and English as examples. In practical applications, the first query sentence can also be translated into other languages, such as Korean or Japanese.
For example, the first query sentence may be translated with the help of a medical knowledge graph. Specifically, a first medical knowledge graph corresponding to the first query sentence is first obtained from a medical knowledge graph library, for example by keyword matching, and the first query sentence is then translated in combination with the first medical knowledge graph to obtain the corresponding second query sentence. For example, the first medical knowledge graph is vectorized to obtain a first feature vector; word embedding is performed on each word in the first query sentence to obtain a word vector for each word; semantic features are extracted from the word vectors to obtain a second feature vector corresponding to the first query sentence; the second feature vector and the first feature vector are concatenated to obtain a first target feature vector; and translation is performed according to the first target feature vector to obtain the second query sentence.
The translation based on the first target feature vector can use an existing encoding network that produces the second query sentence by successive iteration; this is not described in detail.
As can be seen, translating the first query sentence in combination with the medical knowledge graph brings in prior knowledge, which improves the accuracy of the translation.
103: The medical document retrieval device determines, according to the first query sentence and the title and abstract of each first medical document, a first target similarity between the first query sentence and each first medical document.
Each first medical document is in the same language as the first query sentence.
The first medical documents may be the medical documents in a medical database that are in the same language as the first query sentence. For example, if the first query sentence is in English, the English medical documents in the medical database are used as the first medical documents. The medical database may be the PUBMED database.
For example, a first similarity between the first query sentence and the title of each first medical document, and a second similarity between the first query sentence and the abstract of each first medical document, can be determined separately, for instance with the BM25 algorithm.
Taking the first similarity as an example: determine a first correlation between each word in the first query sentence and the title of each first medical document, a second correlation between each word and the first query sentence, and a weight for each word; then determine the first similarity from the first correlation, the second correlation, and the weight of each word.
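The three quantities named above map naturally onto the classic BM25 formula. The following is a minimal sketch under stated assumptions, not the patent's implementation: the function name `bm25_score`, the parameter defaults `k1`, `k2`, and `b`, the non-negative IDF variant, and the pre-tokenized inputs are all illustrative choices that the source does not specify.

```python
import math
from collections import Counter

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.5, k2=1.0, b=0.75):
    """Score one tokenized text (e.g. a title) against a tokenized query.

    corpus is the list of all tokenized texts of the same kind; it is used
    for document frequencies and the average length. Parameter values are
    common defaults, assumed here for illustration.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    doc_tf = Counter(doc_tokens)
    query_tf = Counter(query_tokens)
    score = 0.0
    for term, qf in query_tf.items():
        f = doc_tf.get(term, 0)
        if f == 0:
            continue  # a term absent from the text contributes nothing
        df = sum(1 for d in corpus if term in d)
        # Weight of the word: a non-negative IDF variant (assumed).
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Correlation between the word and the text ("first correlation").
        k = k1 * (1 - b + b * len(doc_tokens) / avg_len)
        doc_rel = f * (k1 + 1) / (f + k)
        # Correlation between the word and the query ("second correlation").
        query_rel = qf * (k2 + 1) / (qf + k2)
        score += idf * doc_rel * query_rel
    return score
```

Calling the function once with tokenized titles and once with tokenized abstracts would give the first and second similarities, respectively.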
Further, entity recognition is performed on the first query sentence to obtain the entities in the first query sentence, and on the title and abstract of each first medical document to obtain the entities in the title and the entities in the abstract of that document. Then a third similarity is determined between the entities in the first query sentence and the entities in the title of each first medical document, and a fourth similarity is determined between the entities in the first query sentence and the entities in the abstract of each first medical document.
For example, the third similarity and the fourth similarity may be expressed as Jaccard coefficients. The third similarity can therefore be written as formula (1):
$$S_3 = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} \quad (1)$$
where $S_3$ is the third similarity, $A$ is the set of entities in the first query sentence, $B$ is the set of entities in the title of each first medical document, $|A \cap B|$ is the number of elements in the intersection of $A$ and $B$, $|A \cup B|$ is the number of elements in the union of $A$ and $B$, and $|A|$ and $|B|$ are the numbers of elements in $A$ and $B$.
The fourth similarity is determined in the same way as the third similarity and is not described again.
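The Jaccard coefficient between two entity sets can be sketched as follows. The function name `jaccard` and the convention of returning 0.0 when both sets are empty are assumptions made for illustration; the source does not address the empty case.

```python
def jaccard(query_entities, doc_entities):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| between two entity collections."""
    a, b = set(query_entities), set(doc_entities)
    intersection = len(a & b)
    union = len(a) + len(b) - intersection
    # Assumed convention: two empty sets have similarity 0.0.
    return intersection / union if union else 0.0
```

Applied to the query entities and the title entities, this gives the third similarity; applied to the query entities and the abstract entities, the fourth.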
For example, the entities in the first query sentence and in the title and abstract of each first medical document can all be identified by a trained BERT model.
The entity recognition process is described below using the first query sentence as an example; the entity recognition mentioned later is similar and is not described again.
The trained BERT model segments the first query sentence into words; performs word embedding on each word to obtain a word vector for that word; extracts semantic features from each word vector to obtain a target word vector for each word; and determines the type of each word from its target word vector, which yields the entities in the first query sentence.
In addition, the entities in the first query sentence can be determined in combination with the first medical knowledge graph. This excludes entities in the first query sentence that do not belong to the medical field covered by the first medical knowledge graph and yields entities that are more useful for medical document retrieval. Specifically, each entity in the first medical knowledge graph is first vectorized to obtain a feature vector for that entity. Then, based on an attention mechanism, a weight coefficient between each word and each entity is determined: the similarity between the word vector of the word and the feature vector of each entity is computed, and the similarities are normalized to obtain the weight coefficients between the word and the entities. The feature vectors of the entities are weighted by their weight coefficients to obtain the target word vector of the word. Finally, the type of each word is determined from its target word vector, which yields the entities in the first query sentence.
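The attention step above can be sketched for a single word as follows. This is a sketch under assumptions: the source says only "similarity" and "normalized", so the dot product and the softmax used here are illustrative choices, as are the function name `attend` and the requirement that word and entity vectors share one dimension.

```python
import math

def attend(word_vec, entity_vecs):
    """Return (weights, target_word_vec) for one word over the graph entities."""
    # Similarity between the word vector and each entity vector (dot product, assumed).
    sims = [sum(w * e for w, e in zip(word_vec, ev)) for ev in entity_vecs]
    # Normalize the similarities into weight coefficients (softmax, assumed).
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Weight the entity vectors to form the target word vector.
    dim = len(word_vec)
    target = [sum(wt * ev[i] for wt, ev in zip(weights, entity_vecs))
              for i in range(dim)]
    return weights, target
```

The resulting target word vector would then be passed to whatever classifier assigns a type to each word.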
As can be seen, combining the medical knowledge graph makes it possible to exclude entities in the first query sentence that do not belong to the medical field covered by the first medical knowledge graph, such as person-name entities and object-name entities. Because these entities, which contribute little to medical document retrieval, are excluded, the third and fourth similarities become more accurate, which in turn improves the efficiency and accuracy of medical document retrieval.
In some possible implementations, before the entities in the first query sentence are determined, the BERT model can be used to perform recognition on the title and abstract of each first medical document in the medical database to obtain the entities in the title and abstract of each first medical document. In other words, the entities in the titles and abstracts can be identified in advance, so that they do not have to be recognized again during retrieval, which improves retrieval efficiency.
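One way to picture this precomputation is an index built once, before any query arrives. The sketch below is hypothetical: `build_entity_index`, the document layout with `"title"` and `"abstract"` keys, and the `extract_entities` callable (standing in for the trained BERT recognizer) are illustrative names that do not come from the source.

```python
def build_entity_index(documents, extract_entities):
    """Map each document id to the entity sets of its title and abstract.

    documents: dict of doc_id -> {"title": str, "abstract": str} (assumed layout).
    extract_entities: callable str -> iterable of entity strings (a stand-in
    for the entity recognizer).
    """
    return {
        doc_id: {
            "title": frozenset(extract_entities(doc["title"])),
            "abstract": frozenset(extract_entities(doc["abstract"])),
        }
        for doc_id, doc in documents.items()
    }
```

At query time only the query sentence would need entity recognition; the document side becomes a dictionary lookup.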
It should be understood that the entities involved in the embodiments of this application (for example, the entities in the abstracts and titles of medical documents, or the entities in the first query sentence) may include diseases, drugs, surgical procedures, genes, laboratory tests and examinations, and so on.
Finally, the first, second, third, and fourth similarities are weighted to obtain the first target similarity. For example, the first target similarity can be expressed by formula (2):
$$S_{m1} = \alpha_1 S_1 + \alpha_2 S_2 + \alpha_3 S_3 + \alpha_4 S_4 \quad (2)$$
where $S_{m1}$ is the first target similarity; $S_1$, $S_2$, $S_3$, and $S_4$ are the first, second, third, and fourth similarities corresponding to the first query sentence; and $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ are preset weight coefficients with $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 1$.
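The weighted combination in formula (2) can be sketched as follows. The function name `target_similarity` and the default weights are assumptions; the source says only that the weights are preset and sum to 1.

```python
def target_similarity(s1, s2, s3, s4, weights=(0.3, 0.3, 0.2, 0.2)):
    """Weighted sum of the four similarities, as in formula (2).

    The default weights are illustrative placeholders, not values from the source.
    """
    if len(weights) != 4 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("expected four weights that sum to 1")
    return sum(w * s for w, s in zip(weights, (s1, s2, s3, s4)))
```

Note that BM25 scores are not confined to the interval [0, 1] while Jaccard coefficients are, so an implementation might rescale the first two terms before combining them; the source does not say whether it does.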
104: The medical document retrieval device determines, according to the second query sentence and the title and abstract of each second medical document, a second target similarity between the second query sentence and each second medical document.
Each second medical document is in the same language as the second query sentence.
Likewise, the medical documents in the public medicine database that are in the same language as the second query sentence can be used as the second medical documents.
For example, in the same way as the first target similarity is determined for the first query sentence, a first similarity and a second similarity are determined between the second query sentence and the title and the abstract of each second medical document; a third similarity and a fourth similarity are then determined between the entities in the second query sentence and the entities in the title and in the abstract of each second medical document; finally, the first, second, third, and fourth similarities corresponding to the second query sentence are weighted to obtain the second target similarity corresponding to the second query sentence.
105: The medical document retrieval device determines the target medical document corresponding to the first query sentence according to the first target similarity between the first query sentence and each first medical document and the second target similarity between the second query sentence and each second medical document.
Exemplarily, according to the first target similarity between the first query sentence and each first medical document, the first medical document with the largest first target similarity is taken as one target medical document; according to the second target similarity between the second query sentence and each second medical document, the second medical document with the largest second target similarity is taken as another target medical document; these two target medical documents are then taken as the target medical documents corresponding to the first query sentence.
Exemplarily, of these two target medical documents, the one with the greater similarity may alternatively be taken as the target medical document corresponding to the first query sentence.
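The two selection strategies described above can be sketched as follows. This is a minimal illustration that assumes the target similarities are already available as mappings from document identifier to score; the function name, the `best_only` flag and the identifiers are hypothetical.

```python
def select_target_documents(first_scores, second_scores, best_only=False):
    """Pick target documents from two per-language score tables.

    first_scores / second_scores: dict mapping a document ID to its
    first / second target similarity (assumed precomputed).
    best_only=False returns the top document of each language;
    best_only=True returns only the higher-scoring of those two.
    """
    best_first = max(first_scores.items(), key=lambda kv: kv[1])
    best_second = max(second_scores.items(), key=lambda kv: kv[1])
    if best_only:
        return [max(best_first, best_second, key=lambda kv: kv[1])[0]]
    return [best_first[0], best_second[0]]


# Hypothetical scores for documents in two languages.
first = {"zh-1": 0.42, "zh-2": 0.71}
second = {"en-1": 0.65, "en-2": 0.30}
both = select_target_documents(first, second)
single = select_target_documents(first, second, best_only=True)
```

Comparing the two maxima directly, as `best_only=True` does, is only meaningful if the first and second target similarities are computed on the same scale, which the identical weighting procedure suggests but the application does not state explicitly.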
It can be seen that, in the embodiments of this application, medical documents of different language types can be retrieved at the same time. Moreover, the medical documents in the medical database do not need to be annotated in advance, which reduces labor cost. In addition, entity matching is used to determine the similarity between the entities in the query sentence and the entities in the abstract and title of each medical document, which further improves the precision of medical document retrieval.
In one embodiment of this application, the medical document retrieval method may also be applied to smart healthcare scenarios. For example, a doctor can use the method to quickly retrieve the medical documents he or she is looking for, and thus quickly find historical cases or literature; this provides data references for the doctor's diagnosis, improves the efficiency and accuracy of diagnosis, and promotes the development of medical technology.
In some possible implementations, the first target similarity and the second target similarity may be determined by a trained network model. The implementation of the medical document retrieval method of this application is described below with reference to the structure of the network model, taking the determination of the first target similarity as an example. The second target similarity is determined in a similar manner, which is not described again.
Referring to FIG. 2, the network model includes embedding layer 1, embedding layer 2, feature extraction network 1, feature extraction network 2, feature extraction network 3, feature extraction network 4, a decoding network, and a graph conversion network. Embedding layer 1, embedding layer 2, embedding layer 3 and the graph conversion network may be BERT-based networks used for word embedding; feature extraction networks 1 to 4 may be general-purpose feature extraction networks used for semantic feature extraction, for example a long short-term memory (LSTM) network or a recurrent neural network (RNN); the decoding network (decoder) may include multiple stacked layers.
Exemplarily, embedding layer 1 performs word embedding on each word in the first query sentence to obtain a word vector for each word, and feature extraction network 1 performs semantic feature extraction on the word vectors to obtain the second feature vector corresponding to the first query sentence. Then, the first medical knowledge graph matching the first query sentence is obtained from the medical knowledge graph library. Embedding layer 2 performs word embedding on each item of medical knowledge in the first medical knowledge graph (for example, disease names, precautions, entities, and so on) to obtain the feature vectors corresponding to the medical knowledge, and feature extraction network 2 performs feature extraction on these feature vectors to obtain the first feature vector corresponding to the first medical knowledge graph.
Then, the first feature vector and the second feature vector are concatenated to obtain the first target feature vector corresponding to the first query sentence, and the decoding network translates according to the first target feature vector to obtain the second query sentence corresponding to the first query sentence.
Then, according to a self-attention mechanism, the similarity between the word vector of each word and the feature vector of each entity in the first medical knowledge graph is determined, giving a weight coefficient between each word and each entity; the feature vectors of the entities are weighted by these coefficients to obtain the target word vector corresponding to each word.
Further, feature extraction network 3 performs feature extraction on the target word vector of each word to obtain the target feature vector corresponding to the first query sentence, and the words in the first query sentence are classified according to this target feature vector to obtain entity 3 in the first query sentence.
Exemplarily, based on the BM25 algorithm, the first similarity between the first query sentence and the title of the first medical document and the second similarity between the first query sentence and the abstract of the first medical document are calculated.
Exemplarily, embedding layer 2 performs word embedding on the title and the abstract of the first medical document respectively to obtain the word vectors corresponding to the title and the abstract; feature extraction network 4 then performs feature extraction and classification on these word vectors to obtain entity 1 in the title and entity 2 in the abstract; then, the third similarity between entity 1 and entity 3 and the fourth similarity between entity 2 and entity 3 are determined.
Finally, the first similarity, the second similarity, the third similarity and the fourth similarity are weighted to obtain the first target similarity corresponding to each first medical document.
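The BM25 step mentioned above can be illustrated with the following sketch. It assumes pre-tokenized text and uses one common Okapi BM25 variant with parameters `k1` and `b`; the application does not state which variant, parameter values or tokenizer are used, so all of these are assumptions, as are the function name and the sample data.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document (e.g. a title or an abstract)
    against a tokenized query with a common Okapi BM25 variant."""
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for doc in docs_tokens:
        doc_freq.update(set(doc))
    scores = []
    for doc in docs_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            length_norm = 1.0 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical tokenized titles; the same function could score abstracts.
titles = [["lung", "cancer", "screening"], ["diabetes", "insulin", "therapy"]]
title_scores = bm25_scores(["lung", "cancer"], titles)
```

Running the function once over the titles and once over the abstracts would yield candidates for the first similarity and the second similarity respectively. Raw BM25 scores are unbounded, so a practical system would probably normalize them before weighting them with the Jaccard-based similarities; the application does not describe such a step.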
Referring to FIG. 3, FIG. 3 is a block diagram of the functional units of a medical document retrieval device provided by an embodiment of this application. The medical document retrieval device 300 includes a transceiver unit 301 and a processing unit 302, wherein:
the transceiver unit 301 is configured to obtain a first query sentence;
the processing unit 302 is configured to translate the first query sentence to obtain a second query sentence, wherein the first query sentence and the second query sentence are of different language types;
the processing unit 302 is further configured to determine, according to the first query sentence and the title and abstract of each first medical document, a first target similarity between the first query sentence and each first medical document, wherein the language type of each first medical document is the same as that of the first query sentence;
the processing unit 302 is further configured to determine, according to the second query sentence and the title and abstract of each second medical document, a second target similarity between the second query sentence and each second medical document, wherein the language type of each second medical document is the same as that of the second query sentence;
the processing unit 302 is further configured to determine the target medical document corresponding to the first query sentence according to the first target similarity and the second target similarity.
In some possible implementations, in terms of translating the first query sentence to obtain the second query sentence, the processing unit 302 is specifically configured to:
obtain, from the medical knowledge graph library, the first medical knowledge graph corresponding to the first query sentence;
translate the first query sentence according to the first medical knowledge graph to obtain the second query sentence.
In some possible implementations, in terms of translating the first query sentence according to the first medical knowledge graph to obtain the second query sentence, the processing unit 302 is specifically configured to:
vectorize the first medical knowledge graph to obtain the first feature vector corresponding to the first medical knowledge graph;
perform word embedding on each word in the first query sentence to obtain the word vector corresponding to each word;
perform semantic feature extraction on the word vector corresponding to each word to obtain the second feature vector corresponding to the first query sentence;
concatenate the second feature vector corresponding to the first query sentence with the first feature vector to obtain the first target feature vector;
translate according to the first target feature vector to obtain the second query sentence.
In some possible implementations, in terms of determining, according to the first query sentence and the title and abstract of each first medical document, the first target similarity between the first query sentence and each first medical document, the processing unit 302 is specifically configured to:
determine the first similarity between the first query sentence and the title of each first medical document and the second similarity between the first query sentence and the abstract of each first medical document;
perform entity recognition on the first query sentence to obtain the entities in the first query sentence;
perform entity recognition on the title and the abstract of each first medical document respectively to obtain the entities in the title and the entities in the abstract of each first medical document;
determine the third similarity between the entities in the first query sentence and the entities in the title of each first medical document, and the fourth similarity between the entities in the first query sentence and the entities in the abstract of each first medical document;
determine, according to the first similarity, the second similarity, the third similarity and the fourth similarity, the first target similarity between the first query sentence and each first medical document.
In some possible implementations, in terms of determining the third similarity between the entities in the first query sentence and the entities in the title of each first medical document, and the fourth similarity between the entities in the first query sentence and the entities in the abstract of each first medical document, the processing unit 302 is specifically configured to:
determine a first Jaccard coefficient between the entities in the first query sentence and the entities in the title of each first medical document, and use the first Jaccard coefficient as the third similarity;
determine a second Jaccard coefficient between the entities in the first query sentence and the entities in the abstract of each first medical document, and use the second Jaccard coefficient as the fourth similarity.
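The Jaccard coefficient used above is the size of the intersection of two sets divided by the size of their union. The following sketch applies it to entity sets; the entity strings are invented for illustration, and returning 0.0 when both sets are empty is a convention assumed here, not something the application specifies.

```python
def jaccard(entities_a, entities_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two entity collections."""
    a, b = set(entities_a), set(entities_b)
    if not a and not b:
        return 0.0  # assumed convention for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical entities recognized in a query, a title and an abstract.
query_entities = {"hypertension", "aspirin"}
title_entities = {"aspirin", "stroke", "hypertension"}
abstract_entities = {"aspirin", "stroke", "bleeding", "dosage"}

third_similarity = jaccard(query_entities, title_entities)      # 2 shared of 3 total
fourth_similarity = jaccard(query_entities, abstract_entities)  # 1 shared of 5 total
```

Because the coefficient compares sets of exact strings, entity mentions would need to be normalized to a common surface form beforehand; how the application handles synonyms is not described.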
In some possible implementations, in terms of performing entity recognition on the first query sentence to obtain the entities in the first query sentence, the processing unit 302 is specifically configured to:
perform word embedding on each word in the first query sentence to obtain the word vector corresponding to each word;
vectorize each entity in the first medical knowledge graph to obtain the feature vector of each entity in the first medical knowledge graph;
determine the weight coefficients between the word vector corresponding to each word and the feature vectors of the entities in the first medical knowledge graph;
weight the feature vectors of the entities in the first medical knowledge graph according to the weight coefficients to obtain the target word vector corresponding to each word;
classify each word according to the target word vector corresponding to each word to obtain the entities in the first query sentence.
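The weighting steps above can be sketched as follows. The application only states that the weight coefficients come from the similarity between a word vector and the entity feature vectors; using dot-product similarity followed by a softmax is an assumption made for this illustration, and the function names and toy vectors are hypothetical. The final classification step is omitted.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word_vec, entity_vecs):
    """Return (weights, target_word_vector) for one word.

    Assumption: similarity is the dot product and the weight
    coefficients are its softmax; the target word vector is the
    weighted sum of the entity feature vectors.
    """
    scores = [sum(w * e for w, e in zip(word_vec, vec)) for vec in entity_vecs]
    weights = softmax(scores)
    dim = len(word_vec)
    target = [
        sum(weight * vec[i] for weight, vec in zip(weights, entity_vecs))
        for i in range(dim)
    ]
    return weights, target


# Toy 2-dimensional vectors: the word is closest to the first entity.
weights, target_vec = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy case the first entity receives the larger weight, so the target word vector leans toward that entity's feature vector, which is the effect the knowledge graph is meant to have on the subsequent word classification.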
In some possible implementations, in terms of determining the target medical document corresponding to the first query sentence according to the first target similarity and the second target similarity, the processing unit 302 is specifically configured to:
take the first medical document with the largest first target similarity as one target medical document;
take the second medical document with the largest second target similarity as another target medical document;
take these two target medical documents as the target medical documents corresponding to the first query sentence.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of this application. The electronic device includes a processor connected to a memory; the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory so that the electronic device performs the above method. For example, as shown in FIG. 4, the electronic device 400 includes a transceiver 401, a processor 402 and a memory 403, which are connected through a bus 404. The memory 403 is configured to store computer programs and data, and can transmit the data it stores to the processor 402.
The processor 402 is configured to read the computer program in the memory 403 to perform the following operations:
obtaining a first query sentence;
translating the first query sentence to obtain a second query sentence, wherein the first query sentence and the second query sentence are of different language types;
determining, according to the first query sentence and the title and abstract of each first medical document, a first target similarity between the first query sentence and each first medical document, wherein the language type of each first medical document is the same as that of the first query sentence;
determining, according to the second query sentence and the title and abstract of each second medical document, a second target similarity between the second query sentence and each second medical document, wherein the language type of each second medical document is the same as that of the second query sentence;
determining the target medical document corresponding to the first query sentence according to the first target similarity and the second target similarity.
In some possible implementations, in terms of translating the first query sentence to obtain the second query sentence, the processor 402 is specifically configured to perform the following operations:
obtaining, from the medical knowledge graph library, the first medical knowledge graph corresponding to the first query sentence;
translating the first query sentence according to the first medical knowledge graph to obtain the second query sentence.
In some possible implementations, in terms of translating the first query sentence according to the first medical knowledge graph to obtain the second query sentence, the processor 402 is specifically configured to perform the following operations:
vectorizing the first medical knowledge graph to obtain the first feature vector corresponding to the first medical knowledge graph;
performing word embedding on each word in the first query sentence to obtain the word vector corresponding to each word;
performing semantic feature extraction on the word vector corresponding to each word to obtain the second feature vector corresponding to the first query sentence;
concatenating the second feature vector corresponding to the first query sentence with the first feature vector to obtain the first target feature vector;
translating according to the first target feature vector to obtain the second query sentence.
In some possible implementations, in terms of determining, according to the first query sentence and the title and abstract of each first medical document, the first target similarity between the first query sentence and each first medical document, the processor 402 is specifically configured to perform the following operations:
determining the first similarity between the first query sentence and the title of each first medical document and the second similarity between the first query sentence and the abstract of each first medical document;
performing entity recognition on the first query sentence to obtain the entities in the first query sentence;
performing entity recognition on the title and the abstract of each first medical document respectively to obtain the entities in the title and the entities in the abstract of each first medical document;
determining the third similarity between the entities in the first query sentence and the entities in the title of each first medical document, and the fourth similarity between the entities in the first query sentence and the entities in the abstract of each first medical document;
determining, according to the first similarity, the second similarity, the third similarity and the fourth similarity, the first target similarity between the first query sentence and each first medical document.
In some possible implementations, in terms of determining the third similarity between the entities in the first query sentence and the entities in the title of each first medical document, and the fourth similarity between the entities in the first query sentence and the entities in the abstract of each first medical document, the processor 402 is specifically configured to perform the following operations:
determining a first Jaccard coefficient between the entities in the first query sentence and the entities in the title of each first medical document, and using the first Jaccard coefficient as the third similarity;
determining a second Jaccard coefficient between the entities in the first query sentence and the entities in the abstract of each first medical document, and using the second Jaccard coefficient as the fourth similarity.
In some possible implementations, in terms of performing entity recognition on the first query sentence to obtain the entities in the first query sentence, the processor 402 is specifically configured to perform the following operations:
performing word embedding on each word in the first query sentence to obtain the word vector corresponding to each word;
vectorizing each entity in the first medical knowledge graph to obtain the feature vector of each entity in the first medical knowledge graph;
determining the weight coefficients between the word vector corresponding to each word and the feature vectors of the entities in the first medical knowledge graph;
weighting the feature vectors of the entities in the first medical knowledge graph according to the weight coefficients to obtain the target word vector corresponding to each word;
classifying each word according to the target word vector corresponding to each word to obtain the entities in the first query sentence.
In some possible implementations, in terms of determining the target medical document corresponding to the first query sentence according to the first target similarity and the second target similarity, the processor 402 is specifically configured to perform the following operations:
taking the first medical document with the largest first target similarity as one target medical document;
taking the second medical document with the largest second target similarity as another target medical document;
taking these two target medical documents as the target medical documents corresponding to the first query sentence.
Specifically, the transceiver 401 may be the transceiver unit 301 of the medical document retrieval device 300 of the embodiment shown in FIG. 3, and the processor 402 may be the processing unit 302 of the medical document retrieval device 300 of the embodiment shown in FIG. 3.
It should be understood that the medical document retrieval device in this application may include a smartphone (such as an Android phone, an iOS phone or a Windows Phone phone), a tablet computer, a handheld computer, a notebook computer, a mobile Internet device (MID), a wearable device, and the like. These devices are listed only as examples and the list is not exhaustive. In practical applications, the medical document retrieval device may also include an intelligent vehicle-mounted terminal, computer equipment, and so on.
An embodiment of this application further provides a computer-readable storage medium storing a computer program, the computer program being executed by a processor to implement some or all of the steps of any medical document retrieval method described in the above method embodiments.
Optionally, the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.
An embodiment of this application further provides a computer program product including a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform some or all of the steps of any medical document retrieval method described in the above method embodiments.
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于可选实施例,所涉及的动作和模块并不一定是本申请所必须的。It should be noted that for the foregoing method embodiments, for the sake of simple description, they are all expressed as a series of action combinations, but those skilled in the art should know that this application is not limited by the described sequence of actions. Because according to this application, some steps can be performed in other order or at the same time. Secondly, those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the involved actions and modules are not necessarily required by this application.
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or may be Integrate into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件程序模块的形式实现。In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or in the form of software program modules.
所述集成的单元如果以软件程序模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储器中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储器中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储器包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer readable memory. Based on this understanding, the technical solution of the present application essentially or the part that contributes to the existing technology or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a memory. A number of instructions are included to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned memory includes: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk or optical disk and other media that can store program codes.
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于一计算机可读存储器中,存储器可以包括:闪存盘、只读存储器(英文:Read-Only Memory,简称:ROM)、随机存取器(英文:Random Access Memory,简称:RAM)、磁盘或光盘等。Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above-mentioned embodiments can be completed by a program instructing relevant hardware. The program can be stored in a computer-readable memory, and the memory can include: a flash disk , Read-only memory (English: Read-Only Memory, abbreviation: ROM), random access device (English: Random Access Memory, abbreviation: RAM), magnetic disk or optical disc, etc.
The embodiments of the present application have been described in detail above, and specific examples have been used herein to explain its principles and implementations. The description of the above embodiments is intended only to aid understanding of the method of the present application and its core idea. At the same time, those of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.
Claims (20)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011152153.XA CN112287217B (en) | 2020-10-23 | 2020-10-23 | Medical document retrieval method, medical document retrieval device, electronic equipment and storage medium |
| CN202011152153.X | 2020-10-23 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021179688A1 (en) | 2021-09-16 |
Family
ID=74424965
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2020/131810 Ceased WO2021179688A1 (en) | 2020-10-23 | 2020-11-26 | Medical literature retrieval method and apparatus, electronic device, and storage medium |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN112287217B (en) |
| WO (1) | WO2021179688A1 (en) |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114446431A (en) * | 2022-01-30 | 2022-05-06 | 中国医学科学院医学信息研究所 | Method and device for selecting annotating personnel of professional data and electronic equipment |
| CN114637855A (en) * | 2022-03-09 | 2022-06-17 | 腾讯科技(深圳)有限公司 | Knowledge graph-based searching method and device, computer equipment and storage medium |
| CN115659047A (en) * | 2022-11-11 | 2023-01-31 | 南京汇宁桀信息科技有限公司 | Medical literature retrieval method based on hybrid algorithm |
| CN116775897A (en) * | 2023-05-19 | 2023-09-19 | 魔方医药科技(苏州)有限公司 | Knowledge graph construction and query method and device, electronic equipment and storage medium |
| CN116881436A (en) * | 2023-08-09 | 2023-10-13 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Knowledge graph-based document retrieval method, system, terminal and storage medium |
| CN117271799A (en) * | 2023-09-25 | 2023-12-22 | 中国电子科技集团公司第十研究所 | Knowledge graph-based multi-round question answering method and system |
| CN119577113A (en) * | 2024-11-19 | 2025-03-07 | 哈尔滨工业大学 | An explicit attribution system of medical knowledge based on medical literature information |
| CN120873154A (en) * | 2025-07-25 | 2025-10-31 | 神州医疗科技股份有限公司 | Auxiliary method and auxiliary device for scientific research in medical science special field |
| CN121053665A (en) * | 2025-10-29 | 2025-12-02 | 同方赛威讯信息技术有限公司 | A feature extraction method, apparatus, device, and medium based on a recognition model |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113326706B (en) * | 2021-06-29 | 2025-02-11 | 北京搜狗科技发展有限公司 | Cross-language search method, device and electronic device |
| CN120030241A (en) * | 2025-04-21 | 2025-05-23 | 上海临床创新转化研究院有限公司 | Literature recommendation method, system and computer program product in the field of clinical research |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20040059730A1 (en) * | 2002-09-19 | 2004-03-25 | Ming Zhou | Method and system for detecting user intentions in retrieval of hint sentences |
| CN104850610A (en) * | 2015-05-11 | 2015-08-19 | 均康(上海)信息科技有限公司 | Network search engine system |
| CN106294639A (en) * | 2016-08-01 | 2017-01-04 | 金陵科技学院 | Method is analyzed across the newly property the created anticipation of language patent based on semantic |
| CN108345694A (en) * | 2018-03-19 | 2018-07-31 | 华北电力大学(保定) | A kind of document retrieval method and system based on subject data base |
| CN109255121A (en) * | 2018-07-27 | 2019-01-22 | 中山大学 | A kind of across language biomedicine class academic paper information recommendation method based on theme class |
| CN109740168A (en) * | 2019-01-09 | 2019-05-10 | 北京邮电大学 | A translation method of traditional Chinese medicine classics based on TCM knowledge graph and attention mechanism |
| CN110795541A (en) * | 2019-08-23 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Text query method and device, electronic equipment and computer readable storage medium |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA2691326A1 (en) * | 2010-01-28 | 2011-07-28 | Ibm Canada Limited - Ibm Canada Limitee | Integrated automatic user support and assistance |
| US8914395B2 (en) * | 2013-01-03 | 2014-12-16 | Uptodate, Inc. | Database query translation system |
| CN108304412B (en) * | 2017-01-13 | 2022-09-30 | 北京搜狗科技发展有限公司 | Cross-language search method and device for cross-language search |
| CN107992630A (en) * | 2017-12-26 | 2018-05-04 | 医渡云(北京)技术有限公司 | Medical data retrieval method and device, storage medium, electronic equipment |
| CN110489751B (en) * | 2019-08-13 | 2024-12-31 | 腾讯科技(深圳)有限公司 | Text similarity calculation method and device, storage medium, and electronic device |
2020
- 2020-10-23: CN application CN202011152153.XA, patent CN112287217B, status Active
- 2020-11-26: WO application PCT/CN2020/131810, patent WO2021179688A1, status Ceased
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114446431A (en) * | 2022-01-30 | 2022-05-06 | 中国医学科学院医学信息研究所 | Method and device for selecting annotating personnel of professional data and electronic equipment |
| CN114637855A (en) * | 2022-03-09 | 2022-06-17 | 腾讯科技(深圳)有限公司 | Knowledge graph-based searching method and device, computer equipment and storage medium |
| CN114637855B (en) * | 2022-03-09 | 2025-11-25 | 腾讯科技(深圳)有限公司 | Knowledge graph-based search methods, devices, computer equipment, and storage media |
| CN115659047A (en) * | 2022-11-11 | 2023-01-31 | 南京汇宁桀信息科技有限公司 | Medical literature retrieval method based on hybrid algorithm |
| CN115659047B (en) * | 2022-11-11 | 2023-07-28 | 南京汇宁桀信息科技有限公司 | Medical document retrieval method based on hybrid algorithm |
| CN116775897A (en) * | 2023-05-19 | 2023-09-19 | 魔方医药科技(苏州)有限公司 | Knowledge graph construction and query method and device, electronic equipment and storage medium |
| CN116881436A (en) * | 2023-08-09 | 2023-10-13 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Knowledge graph-based document retrieval method, system, terminal and storage medium |
| CN117271799A (en) * | 2023-09-25 | 2023-12-22 | 中国电子科技集团公司第十研究所 | Knowledge graph-based multi-round question answering method and system |
| CN119577113A (en) * | 2024-11-19 | 2025-03-07 | 哈尔滨工业大学 | An explicit attribution system of medical knowledge based on medical literature information |
| CN120873154A (en) * | 2025-07-25 | 2025-10-31 | 神州医疗科技股份有限公司 | Auxiliary method and auxiliary device for scientific research in medical science special field |
| CN121053665A (en) * | 2025-10-29 | 2025-12-02 | 同方赛威讯信息技术有限公司 | A feature extraction method, apparatus, device, and medium based on a recognition model |
Also Published As
| Publication number | Publication date |
|---|---|
| CN112287217A (en) | 2021-01-29 |
| CN112287217B (en) | 2023-08-04 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| WO2021179688A1 (en) | Medical literature retrieval method and apparatus, electronic device, and storage medium | |
| CN111090987B (en) | Method and apparatus for outputting information | |
| US20220121824A1 (en) | Method for determining text similarity, method for obtaining semantic answer text, and question answering method | |
| WO2020207431A1 (en) | Document classification method, apparatus and device, and storage medium | |
| CN112749547A (en) | Generation of text classifier training data | |
| CN110516260A (en) | Entity recommended method, device, storage medium and equipment | |
| CN103026356A (en) | Semantic content searching | |
| WO2021179693A1 (en) | Medical text translation method and device, and storage medium | |
| CN109710952B (en) | Translation history retrieval method, device, equipment and medium based on artificial intelligence | |
| CN107992477A (en) | Text subject determines method, apparatus and electronic equipment | |
| WO2021189920A1 (en) | Medical text cluster subject matter determination method and apparatus, electronic device, and storage medium | |
| WO2021190662A1 (en) | Medical text sorting method and apparatus, electronic device, and storage medium | |
| CN112347758A (en) | Text abstract generation method and device, terminal equipment and storage medium | |
| WO2021159812A1 (en) | Cancer staging information processing method and apparatus, and storage medium | |
| CN119621954B (en) | Semantic search method and storage medium for identifying user intention based on AI | |
| CN112487827A (en) | Question answering method, electronic equipment and storage device | |
| CN111126084B (en) | Data processing method, device, electronic equipment and storage medium | |
| CN113297852A (en) | Medical entity word recognition method and device | |
| CN116402045A (en) | Entity recognition model training method, entity recognition method and related equipment | |
| CN114912452B (en) | Entity identification and information extraction method and device | |
| CN114707497A (en) | Cross Transformer Chinese medical named entity recognition method based on multi-source dictionary | |
| CN119938946A (en) | A document knowledge element extraction method, device and medium based on large model | |
| CN112597299A (en) | Text entity classification method and device, terminal equipment and storage medium | |
| CN112287134A (en) | Retrieval model training and recognition method, electronic device and storage medium | |
| CN114239578B (en) | Named entity recognition method, device, equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20924048; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 20924048; Country of ref document: EP; Kind code of ref document: A1 |