CN117609635A - Collaborative filtering-based data pushing method and device

Info

Publication number: CN117609635A
Application number: CN202311668201.4A
Authority: CN
Inventors: 杜梦豪; 胡耀国
Original assignee: Heyu Health Technology Co ltd
Current assignee: Heyu Health Technology Co ltd
Priority date: 2023-12-07
Filing date: 2023-12-07
Publication date: 2024-02-27

Abstract

The invention discloses a collaborative filtering-based data pushing method and device. Medical detection data input by a user is preprocessed by a text analysis model to obtain first medical data. Text classification and clinical-diagnosis relation determination are then performed in sequence on the first medical data, based on a text classification algorithm and a relation extraction algorithm, to output a similarity coefficient between the clinical data and the diagnosis data. The first medical data is screened according to the similarity coefficient to output a diagnosis candidate set. Recommendation portraits are built for the first medical data based on a medical knowledge base, the diagnosis candidate set is screened and sorted with these portraits according to a collaborative filtering algorithm, and a recommendation result is output. Because the data preprocessing acts as a first round of screening, erroneous data in the input is eliminated; because the similarity coefficient between the diagnosis data and the clinical data is determined through text classification and relation extraction, the degree of personalization of the recommendation result and its fit to the case are improved.

Description

Collaborative filtering-based data pushing method and device

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a collaborative filtering-based data pushing method and device.

Background

With the development of artificial intelligence technology, enterprises and universities at home and abroad continue to propose and develop new algorithm models in fields such as deep learning, knowledge graphs, representation learning and graph computing, providing technical support for intelligent applications in medical scenarios. In the medical field, hospitals have also begun rapid informatization. Hospitals are applying advanced technologies such as medical big data and artificial intelligence (AI) to clinical practice and scientific research, promoting new AI-based treatment technologies and methods, building fast and accurate intelligent medical systems, and exploring the construction of smart hospitals.

At present, clinical decision support systems mostly rely on an existing knowledge base and rule base, with a rule inference engine making the disease decision. Such decisions depend heavily on the initial expert database and require the data to be normalized; otherwise the system may produce no output or deviating results. Moreover, data for these systems must be imported manually and depends on the data in the hospital information system (HIS), so real-world medical record information cannot be shared. These systems also cannot take the doctor's experience into account, so the pushed content cannot be personalized.

Disclosure of Invention

The invention provides a collaborative filtering-based data pushing method and device, which achieve the technical effect of personalizing the schemes pushed by the system.

In order to solve the technical problems, the invention provides a collaborative filtering-based data pushing method, which comprises the following steps:

acquiring medical detection data input by a user, and calling a preset text analysis model to perform data preprocessing on the medical detection data to acquire first medical data;

sequentially carrying out text classification and clinical diagnosis relation determination on the first medical data based on a preset text classification algorithm and a relation extraction algorithm to obtain a similarity coefficient between clinical data and diagnosis data in the first medical data, and screening the first medical data according to the similarity coefficient to obtain a diagnosis candidate set;

performing recommendation profiling on the first medical data based on a preset medical knowledge base to obtain corresponding recommendation portraits, and screening and sorting the diagnosis candidate set based on a collaborative filtering algorithm according to the recommendation portraits, to obtain and push the corresponding recommendation result.

After obtaining the medical detection data input by the user, namely the chief complaint recorded by the doctor, the patient's past history, history of present illness, examination and monitoring data and the like, the system first preprocesses the data through a preset text analysis model, which makes it easier to perform text classification and relation extraction on the resulting first medical data. Because preprocessing takes place before formal processing, it also serves as a first round of screening that determines whether the data contains erroneous entries. This improves the accuracy of the pushed data and reduces the probability of faults in subsequent processing caused by excessive erroneous data.

After the first medical data is obtained, the system performs text classification and clinical-diagnosis relation determination on it, based on the text classification algorithm and the relation extraction algorithm, to obtain the similarity coefficient between the clinical data and the diagnosis data, and then screens the first medical data according to that coefficient to obtain the diagnosis candidate set. The text classification algorithm first divides the texts in the first medical data into data sets, including a clinical data set and a diagnosis data set, according to a given classification scheme; the relation extraction algorithm then determines the associations between the items in the two data sets. Screening the first medical data by the resulting similarity coefficient further improves the accuracy and degree of personalization of the recommendation result, so that it matches the case entered by the user.

After the diagnosis candidate set is obtained, the system also builds recommendation portraits for the content of each case in the first medical data, based on the historical medical data in the medical knowledge base, and then uses these portraits to further screen and sort the corresponding diagnosis candidate sets. This improves the authority, practicality and degree of personalization of the recommendation result, so that it better matches the actual condition and needs of the case entered by the user.

As a preferred example, calling the preset text analysis model to perform data preprocessing on the medical detection data to obtain the first medical data specifically includes:

based on the medical knowledge base, sequentially performing word segmentation processing and keyword recognition processing on the medical detection data through a word segmentation model and a named entity recognition model to obtain corresponding medical detection knowledge information;

and carrying out relation recognition and determination on the medical detection knowledge information based on a medical rule base to obtain the association relation between each entity in the medical detection knowledge information, and integrating the medical detection knowledge information according to the determined association relation to obtain the first medical data.

To further improve the accuracy of the pushed recommendation result, the method preprocesses the medical data before formal processing. The preprocessing includes word segmentation and keyword recognition, which condense the large volume of complicated text in the medical data and extract its important information, namely medical knowledge information such as symptoms and signs (the medical detection knowledge information).

The system also identifies and determines the relations among the entities extracted into the medical detection knowledge information, based on the medical diagnosis rules preset in the medical rule base, and integrates the identified relations and entities to generate the first medical data for subsequent processing.
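The preprocessing flow described above (entity recognition followed by rule-based relation determination) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the lexicon stands in for the medical knowledge base, the `RULES` table stands in for the medical rule base, and a greedy dictionary lookup replaces the word segmentation and named entity recognition models, which the source does not specify.

```python
from dataclasses import dataclass

# Hypothetical lexicon standing in for the medical knowledge base.
LEXICON = {
    "fever": "symptom",
    "dry cough": "symptom",
    "chest x-ray": "examination",
    "patchy shadow": "finding",
}

# Hypothetical rule base: (head entity type, tail entity type) -> relation.
RULES = {
    ("examination", "finding"): "shows",
    ("symptom", "symptom"): "co_occurs_with",
}


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str
    start: int


def recognize_entities(text: str) -> list[Entity]:
    """Greedy longest-match lookup; a stand-in for segmentation plus NER."""
    lowered = text.lower()
    terms = sorted(LEXICON, key=len, reverse=True)
    entities, i = [], 0
    while i < len(lowered):
        for term in terms:
            if lowered.startswith(term, i):
                entities.append(Entity(term, LEXICON[term], i))
                i += len(term)
                break
        else:
            i += 1
    return entities


def link_entities(entities: list[Entity]) -> list[tuple[str, str, str]]:
    """Apply the rule table to every ordered pair of recognized entities."""
    relations = []
    for index, head in enumerate(entities):
        for tail in entities[index + 1:]:
            label = RULES.get((head.kind, tail.kind))
            if label:
                relations.append((head.text, label, tail.text))
    return relations


def preprocess(text: str) -> dict:
    """Integrate entities and relations into one 'first medical data' record."""
    entities = recognize_entities(text)
    return {"entities": entities, "relations": link_entities(entities)}


record = preprocess("Fever and dry cough for 3 days; chest X-ray: patchy shadow.")
```

A real system would replace the dictionary lookup with trained models; the sketch only shows how recognized entities and rule-derived relations might be merged into one record.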

As a preferred example, performing text classification and clinical-diagnosis relation determination on the first medical data based on the preset text classification algorithm and relation extraction algorithm, to obtain the similarity coefficient between the clinical data and the diagnosis data in the first medical data, specifically includes:

calling the text classification algorithm to perform knowledge feature normalization and multi-label type recognition on the first medical data according to the medical knowledge base, and determining the disease category of each case in the first medical data;

and identifying and determining, through the relation extraction algorithm, the relations between the diagnosis data and the clinical data of the cases in the first medical data, performing frequency statistics on the identified relations to obtain relation frequency data, and obtaining the similarity coefficient according to a preset diagnosis recommendation correlation matrix.

After the first medical data is output, the system processes it formally. It first performs knowledge feature normalization and multi-label type recognition through the text classification algorithm, that is, it determines the disease category of each case in the medical data, such as internal medicine, surgery or pediatrics.

The system then identifies the relations between the diagnosis data and the clinical data through the relation extraction algorithm and counts their frequencies. The frequency statistics establish the correspondence between clinical manifestations and diagnosis results, which improves the accuracy of the subsequent similarity coefficient.
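The frequency statistics and the lookup in the correlation matrix can be sketched in Python as follows. The source does not state how the relation frequency and the preset matrix are combined, so the formula used here (relative frequency multiplied by the matrix weight) is purely an assumption, as are the sample relations and weights.

```python
from collections import Counter

# Hypothetical (clinical manifestation, diagnosis) pairs produced by
# relation extraction over existing cases.
relations = [
    ("fever", "pneumonia"),
    ("dry cough", "pneumonia"),
    ("fever", "influenza"),
    ("fever", "pneumonia"),
]

# Hypothetical preset diagnosis recommendation correlation matrix,
# stored sparsely as {(manifestation, diagnosis): weight}.
CORRELATION = {
    ("fever", "pneumonia"): 0.6,
    ("dry cough", "pneumonia"): 0.8,
    ("fever", "influenza"): 0.9,
}


def relation_frequency(pairs):
    """Count how often each manifestation-diagnosis relation occurs."""
    return Counter(pairs)


def similarity_coefficients(freq, correlation):
    """Assumed combination: relative frequency times the preset weight."""
    total = sum(freq.values())
    if total == 0:
        return {}
    return {
        pair: (count / total) * correlation.get(pair, 0.0)
        for pair, count in freq.items()
    }


freq = relation_frequency(relations)
sims = similarity_coefficients(freq, CORRELATION)
```

Pairs absent from the matrix receive a coefficient of zero here, which is one possible policy; a deployed system might instead fall back to the raw frequency.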

As a preferred example, performing recommendation profiling on the first medical data based on the preset medical knowledge base to obtain the corresponding recommendation portraits specifically includes:

retrieving historical case data from the medical knowledge base, and profiling the user case data in the first medical data according to the historical case data to obtain a case recommendation portrait;

retrieving historical test data from the medical knowledge base, and profiling the test data in the first medical data according to the historical test data to obtain a test recommendation portrait;

and retrieving historical prescription data from the medical knowledge base, and profiling the prescription drug data and auxiliary treatment data in the first medical data according to the historical prescription data to obtain a prescription recommendation portrait and an auxiliary treatment recommendation portrait.

To further improve the consistency and accuracy of the pushed recommendations, after the similarity coefficient has been determined the method builds recommendation portraits for the data in the first medical data, based on the medical knowledge base preset in the system, providing reference data for screening and outputting the recommendation result.

The system divides the data in the first medical data into user case data, examination and test data, prescription medication data and auxiliary treatment data, and retrieves the corresponding historical medical data from the medical knowledge base for each class to build the recommendation portraits. The expert diagnosis cases in the medical knowledge base lend the portraits authority and expert diagnostic experience. The medical knowledge base also contains probability statistics for diseases prevalent in the current season, seasonal features and current societal morbidity features; profiling the first medical data with these features brings the recommendation portraits, and hence the recommendation result, closer to the actual condition of the case entered by the user.

As a preferred example, before obtaining the medical detection data input by the user, the method further includes:

collecting medical knowledge to be processed, performing text preprocessing on the medical knowledge to be processed through a word segmentation algorithm, a part-of-speech tagging algorithm and a dependency syntax analysis algorithm to obtain a first keyword, and simultaneously calling a preset deep learning model to perform named entity recognition on the medical knowledge to be processed to obtain a first knowledge point;

calling a word vector similarity algorithm to normalize and standardize the first keyword and the first knowledge point, outputting the corresponding standard knowledge entities, and using them as feature entities to extract subject words from the medical knowledge to be processed, obtaining the corresponding first subject words;

and labeling the relations between the entities in the first knowledge point to obtain a first entity relation, and constructing the medical knowledge base based on a pre-training model according to the first subject words, the first entity relation and the standard knowledge entities.

To further improve the accuracy and practicality of the recommendation result, the method also provides a way to construct the medical knowledge base. Medical knowledge to be processed is acquired and preprocessed through a word segmentation algorithm, a part-of-speech tagging algorithm and a dependency syntax analysis algorithm to obtain the first keyword, which covers disease names, examination names, treatment schemes and clinical manifestations such as symptoms and signs; this improves the accuracy and specificity of the first keyword. In parallel, a deep learning model performs named entity recognition on the medical knowledge to be processed and outputs the first knowledge point. The strong learning capability of the deep learning model improves the recognition of medical knowledge and the match between the recognized knowledge points and the medical knowledge data.

The system then calls a word vector similarity algorithm to normalize and standardize the first keyword and the first knowledge point, which reduces the chance that synonymous terms appear under different expressions among the standard knowledge entities and simplifies subsequent recognition and relation labeling. The system also extracts subject words from the medical knowledge to be processed according to the standard knowledge entities, screening out its key terms. This prevents the constructed medical knowledge base from containing wrong knowledge because key terms were lost, and reduces the possibility of excessive variability in the recommendation results pushed on the basis of the medical knowledge base.

After the first subject words are extracted, the system calls the pre-training model to train on the first entity relation obtained by relation labeling, the first subject words and the standard knowledge entities. Using the strong language representation learning and feature extraction capabilities of the model's multi-layer attention mechanism, the scattered knowledge is integrated, extracted and parameterized, so that the knowledge becomes domain-specific. A medical knowledge base constructed with the pre-training model also supports knowledge transfer and automatic updating.
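As an illustration of normalization by word vector similarity, the Python sketch below maps a variant term to the closest standard knowledge entity using cosine similarity. The three-dimensional vectors, the term names and the 0.8 threshold are all hypothetical; the source names neither the embedding model nor the similarity measure.

```python
import math

# Hypothetical toy embeddings for standard knowledge entities.
STANDARD = {
    "hypertension": [0.9, 0.1, 0.0],
    "diabetes mellitus": [0.0, 0.2, 0.9],
}

# Hypothetical embeddings for variant expressions found in raw text.
VARIANTS = {
    "high blood pressure": [0.8, 0.2, 0.1],
    "sugar diabetes": [0.1, 0.1, 0.8],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def normalize_term(vector, standard, threshold=0.8):
    """Return the most similar standard entity, or None below the threshold."""
    best, score = max(
        ((name, cosine(vector, vec)) for name, vec in standard.items()),
        key=lambda item: item[1],
    )
    return best if score >= threshold else None


normalized = {term: normalize_term(vec, STANDARD) for term, vec in VARIANTS.items()}
```

Returning `None` for low-similarity terms keeps unknown expressions out of the standard entity set instead of forcing a poor match.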

acquiring medical rules to be processed, carrying out structural analysis on the medical rules to be processed through regular expressions, and constructing and outputting a corresponding medical rule base;

meanwhile, acquiring a to-be-processed diagnosis case and a to-be-processed inspection report, performing matrix correlation analysis according to the to-be-processed diagnosis case and the to-be-processed inspection report, and outputting a corresponding diagnosis recommendation correlation matrix.

To further improve the practicality and accuracy of the recommendation result, the invention also provides a method for constructing the medical rule base and a method for constructing the correlation matrix. The constructed medical rule base and diagnosis recommendation correlation matrix improve the accuracy and efficiency with which the system processes medical data, and thereby the efficiency with which it outputs the recommendation result.
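The two construction steps can be sketched in Python under stated assumptions. The `IF ... AND ... THEN ...` rule syntax is hypothetical (the source only says that regular expressions are used for structured parsing), and the correlation matrix is estimated here as the share of cases with a given finding that received a given diagnosis, which is one plausible reading of "matrix correlation analysis" rather than the method of the invention.

```python
import re
from collections import defaultdict

# Hypothetical textual rule format: "IF <cond> AND <cond> THEN <conclusion>".
RULE_PATTERN = re.compile(
    r"^IF\s+(?P<conditions>.+?)\s+THEN\s+(?P<conclusion>.+)$", re.IGNORECASE
)


def parse_rule(line):
    """Turn one textual rule into a structured entry, or None if malformed."""
    match = RULE_PATTERN.match(line.strip())
    if match is None:
        return None
    conditions = re.split(r"\s+AND\s+", match["conditions"], flags=re.IGNORECASE)
    return {
        "conditions": [c.strip().lower() for c in conditions],
        "conclusion": match["conclusion"].strip().lower(),
    }


def build_correlation_matrix(cases):
    """cases: iterable of (diagnosis, findings). Estimates P(diagnosis | finding)."""
    pair_counts = defaultdict(int)
    finding_counts = defaultdict(int)
    for diagnosis, findings in cases:
        for finding in set(findings):
            pair_counts[(finding, diagnosis)] += 1
            finding_counts[finding] += 1
    return {
        pair: count / finding_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


rule = parse_rule("IF fever AND dry cough THEN pneumonia")
matrix = build_correlation_matrix([
    ("pneumonia", ["fever", "patchy shadow"]),
    ("influenza", ["fever"]),
    ("pneumonia", ["patchy shadow"]),
])
```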

Correspondingly, the invention also provides a data pushing device based on collaborative filtering, which comprises a data preprocessing module, a data screening module and a collaborative filtering module;

the data preprocessing module is used for obtaining medical detection data input by a user, and calling a preset text analysis model to perform data preprocessing on the medical detection data to obtain first medical data;

the data screening module is used for sequentially carrying out text classification and clinical diagnosis relation determination on the first medical data based on a preset text classification algorithm and a relation extraction algorithm to obtain a similarity coefficient between clinical data and diagnosis data in the first medical data, and screening the first medical data according to the similarity coefficient to obtain a diagnosis candidate set;

the collaborative filtering module is used for performing recommendation profiling on the first medical data based on a preset medical knowledge base to obtain corresponding recommendation portraits, and screening and sorting the diagnosis candidate set based on a collaborative filtering algorithm according to the recommendation portraits to obtain and push the corresponding recommendation result.

As a preferred example, the data pushing device further includes a knowledge base construction module;

the knowledge base construction module is used for collecting medical knowledge to be processed, performing text preprocessing on the medical knowledge to be processed through a word segmentation algorithm, a part-of-speech tagging algorithm and a dependency syntax analysis algorithm to obtain a first keyword, and simultaneously calling a preset deep learning model to perform named entity recognition on the medical knowledge to be processed to obtain a first knowledge point;

As a preferred example, the data pushing device further includes a rule base construction module and a matrix construction module;

the rule base construction module is used for acquiring medical rules to be processed, carrying out structural analysis on the medical rules to be processed through regular expressions, and constructing and outputting a corresponding medical rule base;

the matrix construction module is used for acquiring a diagnosis case to be processed and a test report to be processed, performing matrix correlation analysis according to the diagnosis case to be processed and the test report to be processed, and outputting a corresponding diagnosis recommendation correlation matrix.

As a preferred example, the data preprocessing module invokes a preset text analysis model to perform data preprocessing on the medical detection data, so as to obtain first medical data, which specifically is:

Drawings

Fig. 1: a flow diagram of one embodiment of the collaborative filtering-based data pushing method provided by the invention;

Fig. 2: a structural schematic diagram of one embodiment of the collaborative filtering-based data pushing device provided by the invention;

Fig. 3: a schematic diagram of one embodiment of the input representation of the BERT model provided by the invention;

Fig. 4: a training flow diagram of one embodiment of the knowledge-experience model provided by the invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.

Example 1

Referring to Fig. 1, which is a flow diagram of an embodiment of the collaborative filtering-based data pushing method provided by the present invention, the method includes steps 101 to 103, as follows:

Step 101: Medical detection data input by a user is obtained, and a preset text analysis model is called to preprocess the medical detection data, obtaining first medical data.

After obtaining the medical detection data input by the user, namely the chief complaint recorded by the doctor, the patient's past history, history of present illness, examination and monitoring data and the like, the data pushing method of this embodiment first preprocesses the data through a text analysis model preset in the system, which makes it easier to perform text classification and relation extraction on the resulting first medical data. Because preprocessing takes place before formal processing, it also serves as a first round of screening that determines whether the data contains erroneous entries. This improves the accuracy of the pushed data and reduces the probability of faults in subsequent processing caused by excessive erroneous data.

In this embodiment, the system's processing of the medical detection data comprises two parts: the first is suspected diagnosis based on deep learning and the medical knowledge base, and the second is recommendation based on collaborative filtering.

In the first part, suspected diagnosis based on deep learning and the medical knowledge base, the system first performs text analysis, that is, data preprocessing, on the input medical detection data, including the chief complaint recorded by the doctor, the patient's past history, history of present illness, examination and monitoring data and the like, to obtain the corresponding first medical data.

Specifically, in this embodiment, calling the preset text analysis model to perform data preprocessing on the medical detection data to obtain the first medical data specifically includes:

To further improve the accuracy of the pushed recommendation result, the data pushing method of this embodiment preprocesses the medical data before formal processing. The preprocessing includes word segmentation and keyword recognition, which condense the large volume of complicated text in the medical data and extract its important information, namely medical knowledge information such as symptoms and signs (the medical detection knowledge information).

In this embodiment, the system uses the word segmentation model and the named entity recognition model, drawing on the medical knowledge in the medical knowledge base, to obtain knowledge information such as the symptoms and signs of a case, and uses the medical rule base to identify and confirm the relations between examination results and entities and between entities in the medical detection data. The system then integrates these results to generate the first medical data, from which the recommendation result is subsequently generated.

Step 102: Text classification and clinical-diagnosis relation determination are performed in sequence on the first medical data based on a preset text classification algorithm and a relation extraction algorithm to obtain a similarity coefficient between the clinical data and the diagnosis data in the first medical data, and the first medical data is screened according to the similarity coefficient to obtain a diagnosis candidate set.

By way of example, in this embodiment, performing text classification and clinical-diagnosis relation determination on the first medical data based on the preset text classification algorithm and relation extraction algorithm, to obtain the similarity coefficient between the clinical data and the diagnosis data in the first medical data, specifically includes:

In this embodiment, after obtaining the first medical data output in step 101, the system normalizes its knowledge features based on the medical knowledge base and classifies it with the text classification algorithm, performing multi-label type recognition to determine the disease category of each case in the first medical data, for example internal medicine, surgery or pediatrics.

After determining the disease category of the case, the system first learns the relation categories between diagnosis data and clinical manifestations in existing cases by a deep learning-based relation extraction method and counts the frequency F_zl with which each relation occurs. It then determines the similarity coefficient S_zl between diagnosis and clinical manifestation based on the diagnosis recommendation correlation matrix preset in the system.

After determining the diagnosis-clinical frequency F_zl, the similarity coefficient S_zl and the disease category of each case, the system performs a first funnel screening of the first medical data based on these three quantities to obtain the corresponding diagnosis candidate set S₁.
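The funnel screening on disease category, frequency and similarity coefficient can be sketched as three successive filters. The thresholds, the field names and the sample candidates below are hypothetical, since the source does not give the screening criteria.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    diagnosis: str
    category: str      # disease category, e.g. a department
    frequency: int     # relation frequency (F_zl in the text)
    similarity: float  # similarity coefficient (S_zl in the text)


def funnel_screen(candidates, case_category, min_frequency=2, min_similarity=0.3):
    """Narrow the candidates stage by stage, then rank by similarity."""
    stage = [c for c in candidates if c.category == case_category]
    stage = [c for c in stage if c.frequency >= min_frequency]
    stage = [c for c in stage if c.similarity >= min_similarity]
    return sorted(stage, key=lambda c: c.similarity, reverse=True)


candidates = [
    Candidate("pneumonia", "internal medicine", 5, 0.62),
    Candidate("influenza", "internal medicine", 3, 0.41),
    Candidate("asthma", "internal medicine", 1, 0.55),
    Candidate("fracture", "surgery", 4, 0.90),
    Candidate("bronchitis", "internal medicine", 2, 0.12),
]

# The surviving, ranked list plays the role of the diagnosis candidate set.
s1 = funnel_screen(candidates, "internal medicine")
```

Applying the cheapest filter (category) first keeps later stages small, which is the usual motivation for a funnel layout.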

Step 103: Recommendation profiling is performed on the first medical data based on a preset medical knowledge base to obtain corresponding recommendation portraits, and the diagnosis candidate set is screened and sorted based on a collaborative filtering algorithm according to the recommendation portraits, to obtain and push the corresponding recommendation result.

By way of example, in this embodiment, performing recommendation profiling on the first medical data based on the preset medical knowledge base to obtain the corresponding recommendation portraits specifically includes:

In this embodiment, once the diagnosis candidate set S₁ is obtained, the system uses the medical knowledge in the medical knowledge base to build a case user portrait UP_disease from the cases, an examination recommendation portrait UP_inspect from the test data, and a medication recommendation portrait UP_drug and an auxiliary treatment portrait UP_therapy from the prescription data. Once the portraits are obtained, filtering and screening can be performed according to the data corresponding to each portrait.

First, based on the collaborative filtering algorithm, the system uses the case user portrait UP_disease to re-sort and filter the suspected diagnosis candidate set S₁. Because the portrait data in UP_disease consists of expert diagnosis cases, screening S₁ with it gives the screened results greater authority and imbues them with expert experience. In addition, the system retrieves from the medical knowledge base the probability statistics for diseases prevalent in the current season, seasonal features and current societal morbidity features, and uses them as parameters, together with UP_disease, to sort and screen S₁. This improves the practicality of the screening result, that is, the diagnosis result, and brings it closer to the real current situation of the case.

Then, according to the screened result, that is, the diagnosis result, the system extracts an examination candidate set S_inspect, a drug candidate set S_drug and an auxiliary treatment candidate set S_therapy based on the medical knowledge base and the medical rule base. Once the three portraits and the corresponding candidate sets are determined, the system performs a final round of filtering and sorting on each candidate set according to its portrait before output. Specifically: the examination candidate set S_inspect is screened and sorted with the examination recommendation portrait UP_inspect based on the collaborative filtering algorithm to obtain the final recommendation result R_inspect; the drug candidate set S_drug is screened and sorted with the medication recommendation portrait UP_drug based on the collaborative filtering algorithm to obtain the final recommendation result R_drug; and the auxiliary treatment candidate set S_therapy is screened and sorted with the auxiliary treatment portrait UP_therapy based on the collaborative filtering algorithm to obtain the final recommendation result R_therapy. Through collaborative filtering, personalized recommendation for different populations can be realized, including personalized recommendation of drugs, of examinations and tests, and of surgical treatment.
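The final filter-and-sort pass over a candidate set can be illustrated as below. This is a deliberately simplified stand-in, not the collaborative filtering algorithm of the invention: the portrait is assumed to be a table of usage weights derived from historical prescriptions, and the candidate set is filtered and ranked by those weights. Function names, the threshold and the sample data are hypothetical.

```python
from collections import Counter


def build_portrait(history):
    """history: list of item lists from past records -> share of records per item."""
    total = len(history)
    if total == 0:
        return {}
    counts = Counter(item for record in history for item in set(record))
    return {item: n / total for item, n in counts.items()}


def filter_and_rank(candidates, portrait, min_weight=0.3):
    """Drop candidates the portrait barely supports, then sort by weight."""
    kept = [c for c in candidates if portrait.get(c, 0.0) >= min_weight]
    return sorted(kept, key=lambda c: (-portrait.get(c, 0.0), c))


# Hypothetical historical prescriptions for similar cases.
history = [
    ["amoxicillin", "ibuprofen"],
    ["amoxicillin"],
    ["azithromycin", "ibuprofen"],
    ["amoxicillin", "codeine"],
]

up_drug = build_portrait(history)                              # portrait
s_drug = ["codeine", "ibuprofen", "amoxicillin", "warfarin"]   # candidate set
r_drug = filter_and_rank(s_drug, up_drug)                      # recommendation
```

The same two functions could be reused for the examination and auxiliary treatment candidate sets by swapping in the corresponding history.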

In addition, different types of personalized recommendation can be realized with different collaborative filtering algorithms. With the user-based collaborative filtering algorithm, diagnosis and treatment plans can be recommended in a personalized way, and dedicated auxiliary recommendation schemes can be produced for doctor accounts at different hospitals. With the user-based and item-based collaborative filtering algorithms combined, suspected diagnoses can be collaboratively filtered and recommended, transferring the diagnostic experience of different experts so that every doctor can draw on it. With the user-based collaborative filtering algorithm, examinations and tests can also be collaboratively filtered and recommended: correlations between different examinations are mined and recommended jointly, which prevents related examinations from being omitted.

The collaborative filtering algorithm is mainly implemented based on a user similarity matrix and an item similarity matrix. This embodiment also provides a diagnosis filtering scheme based on user-based and item-based collaborative filtering, and an auxiliary treatment recommendation scheme based on user-based collaborative filtering.

The diagnosis filtering scheme based on user-based and item-based collaborative filtering is implemented as follows:

First, diagnosis cases from different departments and different doctors in a hospital are acquired and analyzed. Symptoms, signs, department categories, and the relationships between entities in the cases are extracted through text analysis, named entity recognition, and relation extraction. Multi-class classification is then performed on the symptoms, signs, and entity relationships to determine the broad disease type and the department of each case.

Second, a feature similarity matrix (where the features are diagnosis features) is constructed from the extracted symptoms, signs, and entity relationships, and a department-diagnosis relation matrix is constructed from the doctors' departments and diagnosis results.

Third, the diagnosis cases are filtered and recommended a first time according to the feature similarity matrix constructed in the second step. The first recommendation result is then fed into the department-diagnosis relation matrix for a second round of filtering and recommendation, which outputs the second recommendation result.
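The two rounds of filtering in the steps above can be sketched as follows. The Jaccard similarity, the threshold, the function names, and the sample matrices are all assumptions made for illustration; the embodiment does not specify how the matrices are scored.

```python
def first_filter(case_features, diagnosis_features, threshold=0.5):
    """Stage one: keep diagnoses whose features overlap the case (Jaccard)."""
    kept = {}
    for diagnosis, features in diagnosis_features.items():
        union = case_features | features
        sim = len(case_features & features) / len(union) if union else 0.0
        if sim >= threshold:
            kept[diagnosis] = sim
    return kept

def second_filter(first_result, department, dept_diagnosis):
    """Stage two: re-weight stage-one scores by the department-diagnosis matrix."""
    weights = dept_diagnosis.get(department, {})
    scored = {d: s * weights.get(d, 0.0) for d, s in first_result.items()}
    return sorted((d for d in scored if scored[d] > 0),
                  key=lambda d: (-scored[d], d))

# Hypothetical diagnosis features and department-diagnosis relation matrix.
diagnosis_features = {
    "bronchitis": {"cough", "sputum", "fever"},
    "pneumonia": {"cough", "fever", "chest pain"},
    "eczema": {"rash", "itching"},
}
dept_diagnosis = {
    "respiratory": {"bronchitis": 0.6, "pneumonia": 0.3},
    "dermatology": {"eczema": 0.9},
}

stage_one = first_filter({"cough", "fever"}, diagnosis_features)
stage_two = second_filter(stage_one, "respiratory", dept_diagnosis)
print(stage_two)
```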

The auxiliary treatment recommendation scheme based on user-based collaborative filtering is implemented as follows:

First, the diagnosis result information of all doctors (including drug recommendations, examination recommendations, and so on) and doctor information such as department, specialty, and hospital are acquired.

Second, a doctor similarity matrix is constructed from the doctor information.

Third, for each disease type, drugs, examinations and tests, adjuvant therapy, and so on are recommended according to authoritative data, and the corresponding initial plans are output.

Fourth, the initial plans output in the third step are collaboratively filtered based on the doctor similarity matrix, and new treatment plans from doctors are continuously collected, so that experience is shared.

Fifth, the relationships among different examinations and tests, between diseases and examinations and tests, among drugs, and among diseases are mined, which extends the treatment plan and compensates for omissions that may occur when a prescription is drawn up.
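The second and fourth steps above can be sketched as follows. The attribute-matching similarity, the `min_support` threshold, and the sample doctors are hypothetical; the embodiment does not state how doctor similarity is computed.

```python
def doctor_similarity(doctors):
    """Doctor similarity matrix: fraction of profile attributes that match."""
    names = sorted(doctors)
    matrix = {a: {} for a in names}
    for a in names:
        for b in names:
            shared = sum(doctors[a][k] == doctors[b][k] for k in doctors[a])
            matrix[a][b] = shared / len(doctors[a])
    return matrix

def filter_plan(initial_plan, target, similarity, history, min_support=0.5):
    """Keep plan items backed by doctors similar to the target doctor."""
    peers = [d for d in history if d != target]
    total = sum(similarity[target][d] for d in peers)
    kept = []
    for item in initial_plan:
        support = sum(similarity[target][d] for d in peers if item in history[d])
        if total and support / total >= min_support:
            kept.append(item)
    return kept

# Hypothetical doctor profiles and historical treatment items.
doctors = {
    "dr_a": {"department": "respiratory", "specialty": "asthma", "hospital": "h1"},
    "dr_b": {"department": "respiratory", "specialty": "asthma", "hospital": "h2"},
    "dr_c": {"department": "cardiology", "specialty": "arrhythmia", "hospital": "h1"},
}
history = {"dr_b": {"drug_x", "chest_xray"}, "dr_c": {"drug_y"}}

sim = doctor_similarity(doctors)
plan = filter_plan(["drug_x", "drug_y", "chest_xray"], "dr_a", sim, history)
print(plan)
```

Newly collected treatment plans would simply be added to `history`, so later calls reflect the shared experience.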

In addition, the embodiment also provides a method for constructing a medical knowledge base, which comprises the following specific implementation steps:

To further improve the accuracy and practicality of the recommendation results output by the system, the data pushing method of this embodiment also provides a method for constructing a medical knowledge base. The medical knowledge to be processed is acquired, and text preprocessing is performed on it through a word segmentation algorithm, a part-of-speech tagging algorithm, and a dependency parsing algorithm to obtain first keywords, which include disease names, examination names, treatment plans, and clinical manifestations such as symptoms and signs; this improves the accuracy and specificity of the first keywords. At the same time, named entity recognition is performed on the medical knowledge to be processed through a deep learning model to output first knowledge points. The strong learning capability of the deep learning model improves recognition of medical knowledge and improves the match between the recognized knowledge points and the medical knowledge data.

In this embodiment, before the system formally builds the medical knowledge base, a large amount of medical knowledge must first be acquired. This includes crawling general medical knowledge from the web, recognizing the text of medical literature and books with OCR to obtain professional basic knowledge, and obtaining doctors' case diagnoses, examination reports, and other medical knowledge through the HIS system. After this knowledge is obtained, the system applies word segmentation, part-of-speech tagging, dependency parsing, and similar processing with the jieba and LTP tools to obtain the corresponding first keywords. At the same time, it performs named entity recognition on the text with a BiLSTM-CRF deep learning model to identify specific knowledge points, including disease names, examination names, treatment plans, and clinical manifestations such as symptoms and signs, thereby obtaining the first knowledge points.

After the medical knowledge is obtained, the system uses a fastText word-vector similarity method to normalize and standardize the keywords and knowledge points into standard knowledge entities, for example mapping a variant name of a disease to the standard name "chronic obstructive pulmonary disease". This standardizes entity naming and improves efficiency when the system processes the data further. After standardization, the system takes the output standard knowledge entities as feature entities and, based on statistical learning methods such as LDA, TF-IDF, and TextRank, extracts the keywords and subject words of the text (that is, of the medical knowledge to be processed) as the main description of each text paragraph, which gives the first subject words.
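Of the statistical methods named above, TF-IDF is the simplest to illustrate. The sketch below is a plain standard-library version applied to already-tokenized, non-empty paragraphs; the function name and the sample tokens are assumptions, and a production system would more likely use an established library.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest TF-IDF terms for each tokenized document.

    Each document is assumed to be a non-empty list of tokens.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result.append(ranked[:top_k])
    return result

# Hypothetical tokenized paragraphs after entity standardization.
paragraphs = [
    ["copd", "copd", "cough", "dyspnea"],
    ["asthma", "asthma", "cough", "wheeze"],
]
keywords = tfidf_keywords(paragraphs)
print(keywords)
```

Terms that occur in every paragraph, such as "cough" here, receive a score of zero and are ranked last, which is why they do not appear as subject words.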

Further, based on the first knowledge points (that is, the entities identified from the medical knowledge cases), the system annotates the relationships between entities in the cases to obtain relation annotation data, learns the relationships between entities through a text classification model, and stores the learned relationships in the system's pre-training model for later use.

To integrate the knowledge entities, the system merges and disambiguates the knowledge above with the web knowledge acquired by the crawler. The disambiguated knowledge is stored as a graph in a knowledge graph, and the subject words, keywords, and paragraphs are stored as documents in an Elasticsearch database for later use.

In addition, in order to enrich training data of the pre-training model, the embodiment also provides a method for constructing a medical rule base and a diagnosis recommendation correlation matrix.

The construction method of the medical rule base specifically comprises the following steps: acquiring medical rules to be processed, carrying out structural analysis on the medical rules to be processed through regular expressions, and constructing and outputting a corresponding medical rule base.
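Structured parsing of rules with regular expressions might look like the sketch below. The `IF ... THEN ...` rule syntax, the pattern names, and the sample rules are purely hypothetical; the embodiment does not disclose the format in which the rules are written.

```python
import re

# Hypothetical rule syntax: IF key=value AND key=value THEN action=value
RULE_PATTERN = re.compile(
    r"^IF\s+(?P<conditions>.+?)\s+THEN\s+(?P<action>\w+)=(?P<value>\w+)$"
)
CONDITION_PATTERN = re.compile(r"(\w+)=(\w+)")

def parse_rule(text):
    """Parse one rule line into a structured dict, or None if malformed."""
    match = RULE_PATTERN.match(text.strip())
    if match is None:
        return None
    return {
        "conditions": CONDITION_PATTERN.findall(match.group("conditions")),
        "action": (match.group("action"), match.group("value")),
    }

def build_rule_base(lines):
    """Group parsed rules by action type (for example check or drug)."""
    rule_base = {}
    for line in lines:
        rule = parse_rule(line)
        if rule is not None:
            rule_base.setdefault(rule["action"][0], []).append(rule)
    return rule_base

rules = [
    "IF symptom=cough AND symptom=fever THEN check=chest_xray",
    "IF diagnosis=pneumonia THEN drug=antibiotic_a",
    "this line is not a rule",
]
rule_base = build_rule_base(rules)
print(rule_base["check"][0])
```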

The method for constructing the diagnosis recommendation correlation matrix specifically comprises the following steps: acquiring a to-be-processed diagnosis case and a to-be-processed inspection report, performing matrix correlation analysis according to the to-be-processed diagnosis case and the to-be-processed inspection report, and outputting a corresponding diagnosis recommendation correlation matrix.

To further improve the practicality and accuracy of the recommendation results output by the system, this embodiment thus also provides a medical rule base construction method and a correlation matrix construction method. The constructed medical rule base and diagnosis recommendation correlation matrix improve the accuracy and efficiency with which the system processes medical data, and therefore the efficiency with which it outputs recommendation results.

In this embodiment, the system takes the medical diagnosis rules and medical recommendation rules written by medical researchers and parses them into structured form through regular expressions, thereby constructing medical rule bases for different fields that are used for suspected diagnosis of diseases, examination and test recommendation, and rational medication. To construct the diagnosis recommendation correlation matrix, the system acquires data such as diagnosis cases and examination reports from the HIS medical system, performs matrix correlation analysis on association rules such as user-diagnosis and diagnosis-recommendation, labels the cases and examination reports by type, classifies the cases and examinations using logistic regression, and outputs the corresponding diagnosis-recommendation correlation matrix. The output correlation matrix, case categories, and examination categories are used for funnel screening and candidate set sorting during recommendation.
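As a rough illustration of a diagnosis-recommendation correlation matrix, the sketch below estimates, for each diagnosis, how often each item was recommended alongside it. This conditional co-occurrence frequency is a simplification chosen for illustration; it stands in for, and is not, the matrix correlation analysis and logistic regression classification described above. The record format is also an assumption.

```python
from collections import Counter, defaultdict

def correlation_matrix(records):
    """Estimate P(item | diagnosis) from (diagnosis, items) records."""
    diag_counts = Counter()
    pair_counts = defaultdict(Counter)
    for diagnosis, items in records:
        diag_counts[diagnosis] += 1
        for item in set(items):  # count each item once per record
            pair_counts[diagnosis][item] += 1
    return {
        diagnosis: {
            item: count / diag_counts[diagnosis]
            for item, count in pair_counts[diagnosis].items()
        }
        for diagnosis in diag_counts
    }

# Hypothetical records of the kind that might be drawn from an HIS system.
records = [
    ("pneumonia", ["chest_xray", "blood_test"]),
    ("pneumonia", ["chest_xray"]),
    ("gastritis", ["gastroscopy"]),
]
matrix = correlation_matrix(records)
print(matrix["pneumonia"])
```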

After obtaining the above knowledge, the system uses the BERT pre-training model to learn the knowledge in the basic knowledge base (that is, the Elasticsearch database and the knowledge graph) and establishes a basic knowledge model. It then fine-tunes the basic knowledge model with data and rules from different fields and adds transfer learning, so that model knowledge is updated and transferred automatically, and outputs the corresponding medical knowledge base.

During training, the BERT pre-training model of this embodiment can learn long texts such as medical books and literature well, thanks to the strong language representation learning and feature extraction capabilities of its multi-layer attention mechanism. The MASK masking mechanism in BERT randomly masks part of the input and predicts it from context, which increases the amount of information captured within the context window; the masking mechanism can also mask domain-specific medical features so that knowledge of a specific field is learned. Large-model training learns rich features and semantic information, which can be applied to downstream tasks through transfer learning, and an adapter can be introduced during transfer learning. When fine-tuning with adapters, the system can add a layer-by-layer unfreezing mechanism to learn from small samples and to learn the parameters better. This method of constructing the medical knowledge base learns a wide range of basic information, shares most knowledge across fields, and adapts the knowledge to specific fields through model transfer learning.
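The combination of random masking and domain-feature masking can be sketched as follows. This is a simplified illustration: the function name, the 15% default, and the sample tokens are assumptions, and the sketch always substitutes the mask token rather than reproducing the full replacement strategy of the original BERT training procedure.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, domain_terms, mask_prob=0.15, seed=0):
    """Mask every domain term plus a random fraction of the other tokens.

    Returns (masked_tokens, labels). labels holds the original token at each
    masked position and None elsewhere, so a model can be trained to predict
    the masked tokens from their context.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if token in domain_terms or rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels

# Hypothetical sentence and medical domain vocabulary.
tokens = ["patient", "reports", "cough", "and", "fever"]
masked, labels = mask_tokens(tokens, {"cough", "fever"})
print(masked)
```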

Specifically, basic data training based on the BERT model in this embodiment proceeds as follows:

First, medical data, a literature data set, an expert diagnosis case data set, and an expert examination and test data set are acquired.

Second, the data are preprocessed as follows: the medical data and literature data sets are processed into topic-paragraph form, such as disease-examination, disease-diagnosis, disease-clinical manifestation, and disease-treatment plan; for the case data set, forms such as type-clinical manifestation and type-case are added alongside the forms above; for data sets of cases, diagnoses, and the like, medical labels are added according to medical expertise.

Third, operations such as numerical mapping and padding are performed on the preliminarily processed data sets.
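Numerical mapping and padding can be illustrated with the short sketch below. The special tokens, function names, and sample sequences are assumptions; in practice the tokenizer shipped with the chosen BERT implementation would normally handle this step.

```python
PAD, UNK = "[PAD]", "[UNK]"

def build_vocab(sequences):
    """Assign an integer id to every token, reserving 0 and 1 for PAD and UNK."""
    vocab = {PAD: 0, UNK: 1}
    for seq in sequences:
        for token in seq:
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(sequences, vocab, max_len):
    """Map tokens to ids, truncate to max_len, and right-pad with the PAD id."""
    encoded = []
    for seq in sequences:
        ids = [vocab.get(token, vocab[UNK]) for token in seq[:max_len]]
        encoded.append(ids + [vocab[PAD]] * (max_len - len(ids)))
    return encoded

sequences = [["cough", "fever"], ["cough", "chest", "pain"]]
vocab = build_vocab(sequences)
batch = encode(sequences, vocab, max_len=4)
print(batch)
```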

Fourth, as shown in FIG. 3, which is a schematic diagram of an embodiment of the BERT model input representation method provided by the present invention, the whole text uses words as input units; in the example, "cough" is the paragraph and "symptoms" is the topic. The position vector of the whole text is calculated as follows:

The BERT model parameters, that is, the parameters in each Transformer layer, are initialized from a uniform distribution. The data are batched through an iterator to save memory and loaded into the model. The cross-entropy loss function of BERT is optimized to obtain the average loss and gradient of each batch, the gradients are used in the back-propagation algorithm to update the parameters of all layers of the model, and this process is iterated over all the data to obtain the model parameters. Training on the two basic data sets (corresponding to medical literature data and medical cases respectively) yields two models: the basic knowledge model (Base Knowledge Model, BKM) and the experience model (Empirical Model, EM).

Accordingly, referring to FIG. 4, FIG. 4 is a training flow diagram of one embodiment of the knowledge-experience model provided by the present invention. As shown in FIG. 4, this embodiment realizes transfer learning of knowledge in the knowledge-experience model through adapter layers and fine-tuning layers: text classification, named entity recognition, and relation extraction in different fields are achieved by adding different adapter layers and fine-tuning layers.

The multi-domain text classification task based on domain phrases and a linear layer is implemented as follows:

First, new data sets from different fields are acquired in text form, and either the basic knowledge model BKM or the experience model EM is selected as the preloaded model according to the data type.

Second, the data set of each field is counted and analyzed, phrases for the different fields are constructed automatically, and the phrases are labeled with category labels, for example whether a phrase occurs in the respiratory system or the pediatric system.

Third, the domain phrases and labels are input as the adapter layer to strengthen the learning of domain knowledge, and a linear layer and a softmax layer serve as the replacement fine-tuning layers that learn the category information of the text and thereby steer model training.

Fourth, during model training, if the amount of training data is small, the parameters of the Transformer layers are fixed and only the parameters of the linear and softmax layers are trained by gradient descent; if the amount of training data is sufficient, the fine-tuning layer parameters are learned first and the Transformer layer parameters are then adjusted.

Fifth, training generates the new models BKM_class and EM_class. Applied to the field corresponding to the training data, these models classify the field's data accurately.
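The training policy of the fourth step can be expressed as a small scheduling function. The sample-count threshold, the number of head-only epochs, and the group names "head" and "transformer" are hypothetical values chosen for illustration; the embodiment does not quantify what counts as a small amount of data.

```python
def trainable_groups(num_samples, epoch, small_data_threshold=1000, head_epochs=2):
    """Decide which parameter groups are updated in a given epoch.

    "head" stands for the linear and softmax layers; "transformer" stands for
    the pretrained Transformer layers. All thresholds are illustrative.
    """
    if num_samples < small_data_threshold:
        # Small data set: the Transformer layers stay frozen throughout.
        return {"head"}
    if epoch < head_epochs:
        # Enough data: learn the fine-tuning layers first.
        return {"head"}
    # Afterwards, also adjust the Transformer layers.
    return {"head", "transformer"}

small_schedule = [sorted(trainable_groups(200, epoch)) for epoch in range(4)]
large_schedule = [sorted(trainable_groups(50_000, epoch)) for epoch in range(4)]
print(small_schedule)
print(large_schedule)
```

In a deep learning framework, the returned set would determine which parameters have gradient updates enabled for that epoch.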

In addition, the multi-domain named entity recognition task based on domain phrases and a CRF layer is implemented as follows:

First, new data sets from different fields are acquired, structured as named entity recognition data annotated in the BEMS scheme, and either the basic knowledge model BKM or the experience model EM is selected as the preloaded model according to the data type.

Second, the data set of each field is counted and analyzed, phrases for the different fields are constructed automatically, and the phrases are labeled with phrase entity labels, that is, entity types such as symptoms and signs.

Third, the domain phrases and labels are input as the adapter layer to strengthen the learning of domain knowledge, and a CRF layer serves as the replacement fine-tuning layer that learns the category information of the text.

Fourth, during model training, if the amount of training data is small, the parameters of the Transformer layers are fixed and only the parameters of the CRF layer are trained by gradient descent; if the amount of training data is sufficient, the fine-tuning layer parameters are learned first and the Transformer layer parameters are then adjusted.

Fifth, training generates the new models BKM_ner and EM_ner. Applied to the field corresponding to the training data, these models recognize the entities in the field's data accurately.
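The annotation format used in the first step can be illustrated by converting entity spans into per-token tags. The sketch assumes that the BEMS letters stand for begin, end, middle, and single, as in the widely used BMES tagging convention, and that tokens outside any entity are tagged "O"; the span format and entity type names are also assumptions.

```python
def bems_tags(tokens, entities):
    """Convert entity spans into per-token tags.

    entities: list of (start, end_exclusive, entity_type) spans over tokens.
    """
    tags = ["O"] * len(tokens)
    for start, end, entity_type in entities:
        if end - start == 1:
            tags[start] = f"S-{entity_type}"  # single-token entity
        else:
            tags[start] = f"B-{entity_type}"  # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"M-{entity_type}"  # middle tokens
            tags[end - 1] = f"E-{entity_type}"  # last token
    return tags

tokens = ["patient", "has", "dry", "persistent", "cough", "and", "fever"]
entities = [(2, 5, "SYMPTOM"), (6, 7, "SYMPTOM")]
tags = bems_tags(tokens, entities)
print(list(zip(tokens, tags)))
```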

Further, the multi-domain relation extraction task based on domain phrase pairs and a linear layer is implemented as follows:

First, new data sets from different fields are acquired, annotated with the relationships between entities, and either the basic knowledge model BKM or the experience model EM is selected as the preloaded model according to the data type.

Second, the data set of each field is counted and analyzed, phrase pairs for the different fields are constructed automatically, and the phrase pairs are labeled with relation type labels, for example that the relation between two entities is a disease-symptom relation.

Third, the domain phrase pairs and labels are input as the adapter layer to strengthen the learning of domain knowledge, and a linear layer and a softmax layer serve as the replacement fine-tuning layers that learn the category information of the text.

Fifth, training generates the new models BKM_relation and EM_relation. Applied to the field corresponding to the training data, these models extract the relations in the field's data accurately.

In order to better illustrate the working principle and the step flow of the collaborative filtering-based data pushing method and device of the present invention, reference may be made to the above related description, but not limited thereto.

Accordingly, referring to fig. 2, fig. 2 is a schematic structural diagram of an embodiment of a collaborative filtering-based data pushing device provided by the present invention. As shown in fig. 2, the data pushing device in this embodiment includes a data preprocessing module 201, a data screening module 202, a collaborative filtering module 203, a knowledge base construction module 204, a rule base construction module 205, and a matrix construction module 206.

The data preprocessing module 201 is configured to obtain medical detection data input by a user, and call a preset text analysis model to perform data preprocessing on the medical detection data, so as to obtain first medical data.

Further, the data preprocessing module 201 invokes a preset text analysis model to perform data preprocessing on the medical detection data, so as to obtain first medical data, which specifically is:

based on the medical knowledge base, sequentially performing word segmentation processing and keyword recognition processing on the medical detection data through a word segmentation model and a named entity recognition model to obtain corresponding medical detection knowledge information; and carrying out relation recognition and determination on the medical detection knowledge information based on a medical rule base to obtain the association relation between each entity in the medical detection knowledge information, and integrating the medical detection knowledge information according to the determined association relation to obtain the first medical data.

The data screening module 202 is configured to sequentially perform text classification and clinical diagnosis relation determination on the first medical data based on a preset text classification algorithm and a relation extraction algorithm, obtain a similarity coefficient between clinical data and diagnostic data in the first medical data, and screen the first medical data according to the similarity coefficient to obtain a diagnosis candidate set.

Further, the data screening module 202 sequentially performs text classification and clinical diagnosis relation determination on the first medical data based on a preset text classification algorithm and a relation extraction algorithm, so as to obtain a similarity coefficient between clinical data and diagnostic data in the first medical data, which specifically includes:

Invoking the text classification algorithm to perform knowledge feature normalization and multi-label type recognition on the first medical data according to the medical knowledge base, and determining the disease category of each case in the first medical data; and meanwhile, identifying and determining the relationship between the diagnosis data and the clinical data of the case in the first medical data through the relationship extraction algorithm, and carrying out frequency statistics on the identified and determined relationship to obtain relationship frequency data, and further obtaining the similarity coefficient according to the preset diagnosis recommendation correlation matrix.
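One way the relation frequency data and the correlation matrix might be combined into similarity coefficients is sketched below. The combination rule (relative frequency multiplied by the matrix weight and summed per diagnosis), the screening threshold, and the sample data are all assumptions; the embodiment does not give the formula.

```python
from collections import Counter

def similarity_coefficients(relations, correlation):
    """Combine relation frequencies with correlation weights per diagnosis.

    relations: list of (clinical_finding, diagnosis) pairs extracted from cases.
    correlation: dict mapping (clinical_finding, diagnosis) to a weight in [0, 1].
    """
    freq = Counter(relations)
    total = sum(freq.values())
    coeffs = Counter()
    for (finding, diagnosis), count in freq.items():
        weight = correlation.get((finding, diagnosis), 0.0)
        coeffs[diagnosis] += (count / total) * weight
    return dict(coeffs)

# Hypothetical extracted relations and correlation weights.
relations = [
    ("cough", "bronchitis"),
    ("cough", "bronchitis"),
    ("fever", "bronchitis"),
    ("rash", "eczema"),
]
correlation = {
    ("cough", "bronchitis"): 0.8,
    ("fever", "bronchitis"): 0.4,
    ("rash", "eczema"): 1.0,
}

coeffs = similarity_coefficients(relations, correlation)
candidate_set = sorted(d for d, c in coeffs.items() if c >= 0.3)
print(coeffs, candidate_set)
```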

The collaborative filtering module 203 is configured to construct a recommendation portrait for the first medical data based on a preset medical knowledge base, obtain the corresponding recommendation portrait, and screen and sort the diagnosis candidate set according to the recommendation portrait based on a collaborative filtering algorithm, so as to obtain and push the corresponding recommendation result.

Further, the collaborative filtering module 203 constructs a recommendation portrait for the first medical data based on a preset medical knowledge base to obtain the corresponding recommendation portrait, which specifically includes:

Historical case data are retrieved from the medical knowledge base, and a case portrait is constructed for the user case data in the first medical data according to the historical case data, giving a case recommendation portrait; historical test data are retrieved from the medical knowledge base, and a test portrait is constructed for the test data in the first medical data according to the historical test data, giving a test recommendation portrait; and historical prescription data are retrieved from the medical knowledge base, and recommendation portraits are constructed for the prescription drug data and auxiliary treatment data in the first medical data according to the historical prescription data, giving prescription recommendation data and auxiliary recommendation data.

The knowledge base construction module 204 is configured to collect and acquire medical knowledge to be processed, perform text preprocessing on the medical knowledge to be processed through a word segmentation algorithm, a part-of-speech labeling algorithm and a dependency syntax analysis algorithm to obtain a first keyword, and simultaneously call a preset deep learning model to perform named entity recognition on the medical knowledge to be processed to obtain a first knowledge point;

the word vector similarity algorithm is called to perform normalization and standardization processing on the first keywords and the first knowledge points, and corresponding standard knowledge entities are output and used as characteristic entities to perform subject word extraction in the medical knowledge to be processed to obtain corresponding first subject words; and carrying out relationship identification labeling on the entities in the first knowledge points to obtain a first entity relationship, and further constructing the medical knowledge base based on the pre-training model according to the first subject term, the first entity relationship and the standard knowledge entity.

The rule base construction module 205 is configured to acquire medical rules to be processed, perform structural analysis on the medical rules to be processed through regular expressions, and construct and output a corresponding medical rule base.

The matrix construction module 206 is configured to acquire a diagnosis case to be processed and a test report to be processed, perform matrix correlation analysis according to the diagnosis case to be processed and the test report to be processed, and output a corresponding diagnosis recommendation correlation matrix.

In summary, the embodiment of the invention provides a collaborative filtering-based data pushing method and device. Medical detection data input by a user are preprocessed through a text analysis model to obtain first medical data. Text classification and clinical diagnosis relation determination are performed in turn on the first medical data based on a text classification algorithm and a relation extraction algorithm, and similarity coefficients between the clinical data and the diagnosis data are output. The first medical data are screened according to the similarity coefficients to output a diagnosis candidate set. A recommendation portrait is constructed for the first medical data based on a medical knowledge base, the diagnosis candidate set is screened and sorted with the recommendation portrait according to a collaborative filtering algorithm, and the recommendation result is output and pushed. In the invention, data preprocessing applies a first round of screening to the input data and eliminates erroneous data, and text classification and relation extraction determine the similarity coefficient between the diagnosis data and the clinical data, which improves the degree of personalization of the recommendation result and how well it fits the case.

The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention, and are not to be construed as limiting the scope of the invention. It should be noted that any modifications, equivalent substitutions, improvements, etc. made by those skilled in the art without departing from the spirit and principles of the present invention are intended to be included in the scope of the present invention.

Claims

1. The collaborative filtering-based data pushing method is characterized by comprising the following steps of:

2. The collaborative filtering-based data pushing method according to claim 1, wherein the retrieving a preset text analysis model performs data preprocessing on the medical detection data to obtain first medical data, specifically:

3. The collaborative filtering-based data pushing method according to claim 1, wherein the text classification and clinical diagnosis relation determination is performed on the first medical data based on a preset text classification algorithm and a relation extraction algorithm, so as to obtain a similarity coefficient between clinical data and diagnostic data in the first medical data, and the method specifically comprises:

4. The collaborative filtering-based data pushing method according to claim 1, wherein the recommending image of the first medical data based on the preset medical knowledge base is obtained, and the method specifically comprises:

5. The collaborative filtering-based data pushing method of claim 1, further comprising, prior to said obtaining the user-entered medical test data:

6. The collaborative filtering-based data pushing method of claim 1, further comprising, prior to said obtaining the user-entered medical test data:

7. The data pushing device based on collaborative filtering is characterized by comprising a data preprocessing module, a data screening module and a collaborative filtering module;

8. The collaborative filtering-based data pushing device of claim 7, wherein the data pushing device further comprises a knowledge base construction module;

9. The collaborative filtering-based data pushing device of claim 7, further comprising a rule base construction module and a matrix construction module;

10. The collaborative filtering-based data pushing device according to claim 7, wherein the data preprocessing module invokes a preset text analysis model to perform data preprocessing on the medical detection data to obtain first medical data, and the data preprocessing module specifically includes: