CN117831790B

CN117831790B - Auxiliary coding method, system, terminal and medium for medical diagnosis

Info

Publication number: CN117831790B
Application number: CN202410251993.3A
Authority: CN
Inventors: 李卓群; 蒋江涛; 马杰; 金剑; 邓小宁
Original assignee: North Health Medical Big Data Technology Co ltd
Current assignee: North Health Medical Big Data Technology Co ltd
Priority date: 2024-03-06
Filing date: 2024-03-06
Publication date: 2024-07-05
Anticipated expiration: 2044-03-06
Also published as: CN117831790A

Abstract

The invention relates to the field of medical diagnosis coding and discloses an auxiliary coding method, system, terminal and medium for medical diagnosis. The method works as follows:

- It receives unstructured patient visit data entered by a user.
- It parses the unstructured visit data with a large language model to determine a coding target.
- According to the coding target, it extracts the necessary diagnosis description texts required for diagnosis coding from the unstructured visit data.
- It converts these texts into a mathematical model, recorded as the target mathematical model.
- It compares the target mathematical model with the pre-stored mathematical models of all diagnostic codes to find the pre-stored model closest to it.
- It takes the diagnostic code corresponding to that closest pre-stored model as the diagnostic code of the current patient.

By using a large language model to understand the doctor's description of the patient's condition and to abstract the necessary conditions for diagnosis coding, the invention greatly improves the accuracy and efficiency of diagnosis coding.

Description

Auxiliary coding method, system, terminal and medium for medical diagnosis

Technical Field

The invention relates to the field of medical diagnosis coding, and in particular to an auxiliary coding method, system, terminal and medium for medical diagnosis.

Background

With the development of medical science and technology, medical data has been accumulating at an unprecedented rate. These data include patient histories, physical examination reports, laboratory test results, imaging data, and the like. This information is an important basis for diagnosis and treatment. How to manage and use it accurately and efficiently has therefore become an important problem for the medical industry.

Medical diagnosis coding is an important means of solving this problem. Converting patient descriptions and signs into standardized diagnostic codes makes data management and statistical analysis straightforward. The most commonly used diagnostic coding system at present is the tenth revision of the International Classification of Diseases (ICD-10). However, assigning these codes depends mainly on the experience and expertise of doctors. It also takes considerable time and effort to select and enter the codes manually, so the process is inefficient and error-prone.

To address this, automatic coding systems such as CACS (Computer Assisted Coding System) have been applied in recent years. These systems are based mainly on rules or on traditional machine learning methods. They generate diagnostic codes automatically by analyzing patient descriptions and signs. Their performance, however, is limited by the underlying techniques:

- Rule-based systems require a large number of manually defined rules and cannot handle complex or ambiguous medical data. For example, patient descriptions may contain large amounts of unstructured information, such as natural language descriptions and medical images. Rules or traditional machine learning methods cannot process such information effectively.
- Traditional machine learning methods require large amounts of labeled data, so their ability to code rare diseases is limited.

Disclosure of Invention

To solve these problems, the invention provides an auxiliary coding method, system, terminal and medium for medical diagnosis that improve the accuracy and efficiency of medical diagnosis coding.

In a first aspect, the invention provides an auxiliary coding method for medical diagnosis, comprising the following steps:

receiving unstructured visit data of a patient entered by a user;

parsing the unstructured visit data with a large language model to determine a coding target;

extracting, according to the coding target, the necessary diagnosis description texts required for diagnosis coding from the unstructured visit data;

converting the necessary diagnosis description texts into a mathematical model, recorded as the target mathematical model;

comparing the target mathematical model with the pre-stored mathematical models of all diagnostic codes to find the pre-stored mathematical model closest to the target mathematical model, and determining that the diagnostic code corresponding to this closest pre-stored mathematical model is the diagnostic code of the current patient.
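The following Python sketch roughly illustrates how the five steps chain together, using injected components. Everything in it is hypothetical:

- `assist_coding`, the keyword-based `toy_embed`, `toy_parse_target` and `toy_extract` merely stand in for the large language model.
- The equal-weight mean and the Euclidean distance are assumed choices that the patent does not specify.
- The two-entry `CODE_LIBRARY` is a toy.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def assist_coding(
    visit_text: str,
    parse_target: Callable[[str], str],
    extract_texts: Callable[[str, str], List[str]],
    embed: Callable[[str], Vector],
    code_library: Dict[str, Vector],
) -> str:
    """Chain the claimed steps using injected, hypothetical components."""
    target = parse_target(visit_text)              # parse the coding target
    texts = extract_texts(visit_text, target)      # necessary description texts
    vectors = [embed(t) for t in texts]            # one text vector per text
    if not vectors:
        raise ValueError("no necessary diagnosis description text was extracted")
    # Equal-weight linear combination (mean) -> target mathematical model.
    combined = [sum(col) / len(vectors) for col in zip(*vectors)]
    # Nearest pre-stored code vector by Euclidean distance.
    return min(code_library, key=lambda code: math.dist(combined, code_library[code]))


# --- Toy, hypothetical components for illustration only ---
VOCAB = ["cough", "fever", "lung", "fracture", "bone"]


def toy_embed(text: str) -> Vector:
    """Bag-of-keywords vector standing in for a language-model embedding."""
    lowered = text.lower()
    return [1.0 if word in lowered else 0.0 for word in VOCAB]


def toy_parse_target(text: str) -> str:
    return "pneumonia" if "lung" in text.lower() else "unknown"


def toy_extract(text: str, target: str) -> List[str]:
    # Splits on commas and ignores the target; a real system would extract
    # the conditions that the target requires.
    return [part.strip() for part in text.split(",") if part.strip()]


CODE_LIBRARY = {
    "J18.9": toy_embed("cough fever lung"),
    "S72.0": toy_embed("fracture bone"),
}

visit = "persistent cough, fever 38.5, X-ray shows right lung shadow"
print(assist_coding(visit, toy_parse_target, toy_extract, toy_embed, CODE_LIBRARY))  # J18.9
```

Passing the components in as callables keeps the orchestration separate from the models, so a real embedding model or extractor could replace the toy functions without changing `assist_coding`.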

In an alternative embodiment, after the necessary diagnosis description texts are extracted from the unstructured visit data according to the coding target, the method further comprises:

detecting whether any other necessary diagnosis description text required for diagnosis coding is missing;

if not, converting all extracted necessary diagnosis description texts into a mathematical model, recorded as the target mathematical model;

if so, feeding back information about the missing necessary diagnosis data to the user front end, or predicting the missing data with a BERT language model, so as to supplement the missing necessary diagnosis description text;

and converting the supplemented necessary diagnosis description texts into a mathematical model, recorded as the target mathematical model.

In an alternative embodiment, predicting the missing data with the BERT language model specifically comprises:

extracting all diagnosis description texts from the unstructured visit data;

searching all the diagnosis description texts for at least one text related to the missing necessary diagnosis description text;

and inputting features of the related diagnosis description texts into the BERT language model for prediction, to obtain the necessary diagnosis description text to be supplemented.

In an alternative embodiment, converting the necessary diagnosis description texts into a mathematical model recorded as the target mathematical model specifically comprises:

converting each necessary diagnosis description text into a text vector using the large language model;

and linearly combining the text vectors of all necessary diagnosis description texts to generate a diagnosis data combination vector, which is the target mathematical model.

In an alternative embodiment, comparing the target mathematical model with the pre-stored mathematical models of the diagnostic codes and finding the closest pre-stored mathematical model specifically comprises:

calculating the distance between the diagnosis data combination vector and each pre-stored diagnostic code vector;

and selecting the diagnostic code vector closest to the diagnosis data combination vector; the diagnostic code corresponding to this closest vector is the diagnostic code of the current patient.

In an alternative embodiment, after the diagnostic code corresponding to the closest pre-stored mathematical model is determined as the diagnostic code of the current patient, the method further comprises:

feeding back the diagnostic code of the current patient to the user front end.

In a second aspect, the invention provides an auxiliary coding system for medical diagnosis, comprising:

a visit data receiving module, configured to receive unstructured visit data of a patient entered by a user;

a coding target parsing module, configured to parse the unstructured visit data with a large language model to determine a coding target;

a necessary text extraction module, configured to extract, according to the coding target, the necessary diagnosis description texts required for diagnosis coding from the unstructured visit data;

a mathematical model conversion module, configured to convert the necessary diagnosis description texts into a mathematical model, recorded as the target mathematical model;

and a diagnostic code determination module, configured to compare the target mathematical model with the pre-stored mathematical models of all diagnostic codes, find the pre-stored mathematical model closest to the target mathematical model, and determine that the diagnostic code corresponding to this closest model is the diagnostic code of the current patient.

In an alternative embodiment, the necessary text extraction module is further configured to:

- detect whether any other necessary diagnosis description text required for diagnosis coding is missing;
- if not, trigger the mathematical model conversion module to convert all extracted necessary diagnosis description texts into a mathematical model, recorded as the target mathematical model;
- if so, either feed back information about the missing necessary diagnosis data to the user front end, or predict the missing data with the BERT language model to supplement the missing necessary diagnosis description text;
- then trigger the mathematical model conversion module to convert the supplemented necessary diagnosis description texts into a mathematical model, recorded as the target mathematical model.

In a third aspect, the invention provides a terminal, comprising:

a memory, configured to store an auxiliary coding program for medical diagnosis;

and a processor, configured to implement the steps of the auxiliary coding method for medical diagnosis according to any of the above when executing the auxiliary coding program for medical diagnosis.

In a fourth aspect, the invention provides a computer-readable storage medium storing an auxiliary coding program for medical diagnosis. When executed by a processor, the program implements the steps of the auxiliary coding method for medical diagnosis according to any of the above.

Compared with the prior art, the auxiliary coding method, system, terminal and storage medium for medical diagnosis have the following beneficial effects:

- A large language model understands the doctor's description of the patient's condition and abstracts the necessary conditions for diagnosis coding. This greatly improves the accuracy and efficiency of diagnosis coding.
- Because the large language model is based on deep learning, it can learn useful knowledge from large amounts of medical data. It can therefore code rare diseases for which sufficient labeled data is unavailable.
- Combining this with vector-based retrieval over mathematical models improves the effectiveness of diagnostic code matching.

As a result, the invention can reduce doctors' workload and improve the efficiency and quality of medical services.

Drawings

To describe the embodiments of the invention or the prior art more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.

Fig. 1 is a schematic diagram of the principle architecture of an auxiliary coding method for medical diagnosis according to an embodiment of the invention.

Fig. 2 is a schematic flow chart of an auxiliary coding method for medical diagnosis according to an embodiment of the present invention.

Fig. 3 is a schematic block diagram of an auxiliary coding system for medical diagnosis according to an embodiment of the present invention.

Fig. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.

Detailed Description

To help those skilled in the art better understand the solution of the invention, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

Current automatic coding systems have limited ability to process complex and ambiguous medical data. Patient descriptions may contain large amounts of unstructured information, such as natural language descriptions and medical images, which rules or traditional machine learning methods cannot process effectively. Moreover, these systems code new or rare diseases poorly. Their performance depends largely on existing labeled data, so their coding ability drops sharply for diseases without sufficient labeled data.

To address these problems, the invention provides an auxiliary coding method for medical diagnosis. FIG. 1 is a schematic diagram of the principle architecture of the method. The method is implemented by an agent built on a large language model. The agent can understand and process complex medical data, including the patient's medical history, physical examination reports, laboratory test results and imaging data. By analyzing these data, the agent abstracts the necessary conditions for diagnosis coding. It then searches the coding library according to those conditions to find the best-matching diagnostic code. Retrieval is vector-based: each diagnostic code corresponds to a vector, and the best-matching code is found by calculating distances between vectors.

Fig. 2 is a schematic flow chart of an auxiliary coding method for medical diagnosis according to an embodiment of the invention. The execution subject of Fig. 2 may be an auxiliary coding system for medical diagnosis. The method provided by this embodiment is executed by a computer device, and accordingly the auxiliary coding system runs on that computer device. Depending on requirements, the order of the steps in the flow chart may be changed and some steps may be omitted.
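The vector-based retrieval described above can be written compactly as follows. This formalization is an assumption layered on the text, with the following notation:

- t_1, ..., t_n are the necessary diagnosis description texts.
- f is the large language model's text-embedding function.
- e_c is the pre-stored vector of code c in the coding library C.

The weights w_i and the distance d are not specified in the patent. Euclidean and cosine distance are shown only as common choices.

```latex
\mathbf{v} = \sum_{i=1}^{n} w_i \, f(t_i), \qquad
c^{*} = \operatorname*{arg\,min}_{c \in \mathcal{C}} \; d\!\left(\mathbf{v}, \mathbf{e}_c\right),

d_{\mathrm{euc}}(\mathbf{x}, \mathbf{y}) = \lVert \mathbf{x} - \mathbf{y} \rVert_2, \qquad
d_{\cos}(\mathbf{x}, \mathbf{y}) = 1 - \frac{\mathbf{x}^{\top} \mathbf{y}}{\lVert \mathbf{x} \rVert_2 \, \lVert \mathbf{y} \rVert_2}.
```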

As shown in fig. 2, the method includes the following steps.

S1, receive the unstructured visit data entered by the user.

First, the doctor inputs the patient's visit data to the agent. These data are unstructured and contain many natural language descriptions and medical images. For example:

- the medical history may include the patient's own description of the condition, its progression and past treatment;
- the physical examination report may include physiological indicators such as body temperature, blood pressure and heart rate;
- laboratory test results may include blood tests, urine tests, biochemical tests and the like;
- imaging data may include X-ray, CT and MRI images.

These data are an important basis for the doctor's diagnosis and an important input for the agent's diagnosis coding.

S2, parse the unstructured visit data with a large language model to determine the coding target.

The agent analyzes the visit data entered by the doctor. A large language model performs this step. It is a deep-learning-based language model that can understand and generate natural language, so the agent can understand the doctor's description of the condition whether it is given as text, speech or even images. For example, suppose the doctor describes the condition as "the patient has a persistent cough, a body temperature of 38.5 degrees, and the X-ray film shows a shadow in the right lung". The agent can recognize this as a possible case of pneumonia.

Pneumonia, as understood by the large language model, is the parsed coding target: the current patient's diagnostic code should fall within the range of pneumonia codes.

In a specific embodiment, the unstructured visit data is input into a pre-trained GPT model, which generates an inferred target from the input. The process is as follows:

- Each of a plurality of encoders in the pre-trained GPT model encodes the unstructured visit data, yielding a plurality of encoded feature vectors.
- The encoded feature vectors are combined into a combined feature vector.
- A decoder in the pre-trained GPT model decodes the combined feature vector to obtain a decoding result.
- The diagnosis result for the unstructured visit data is determined from the decoding result.
- The coding target is determined from the diagnosis result.
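The encode, combine and decode flow described above can be sketched with toy components. This is not a GPT model. The following are hypothetical stand-ins, chosen only to show how several encoder outputs feed one decoding result from which the coding target is taken:

- the keyword-counting encoders;
- concatenation as the combination step;
- the prototype dot-product "decoder" with its `PROTOTYPES` table.

```python
from typing import Callable, Dict, List

Vector = List[float]


def keyword_encoder(keywords: List[str]) -> Callable[[str], Vector]:
    """Build a toy encoder that counts occurrences of its own keywords."""
    def encode(text: str) -> Vector:
        lowered = text.lower()
        return [float(lowered.count(k)) for k in keywords]
    return encode


# Hypothetical stand-ins for the plurality of encoders.
ENCODERS = [
    keyword_encoder(["cough", "sputum"]),       # symptom features
    keyword_encoder(["fever", "38.", "39."]),   # temperature features
    keyword_encoder(["lung", "shadow"]),        # imaging features
]

# Hypothetical decoder weights: one prototype per diagnosis, laid out over
# the concatenated feature space (2 + 3 + 2 = 7 dimensions).
PROTOTYPES: Dict[str, Vector] = {
    "pneumonia": [1, 1, 1, 1, 1, 1, 1],
    "bronchitis": [1, 1, 0, 0, 0, 0, 0],
}


def combine(features: List[Vector]) -> Vector:
    """Combine encoder outputs by concatenation (one possible choice)."""
    return [x for vec in features for x in vec]


def decode(combined: Vector) -> str:
    """Pick the diagnosis whose prototype has the largest dot product."""
    def score(diagnosis: str) -> float:
        return sum(p * x for p, x in zip(PROTOTYPES[diagnosis], combined))
    return max(PROTOTYPES, key=score)


def coding_target(visit_text: str) -> str:
    features = [encode(visit_text) for encode in ENCODERS]  # encode
    diagnosis = decode(combine(features))                   # combine, then decode
    return diagnosis  # in this sketch the coding target is the diagnosis category


example = ("The patient has a persistent cough, a body temperature of 38.5 degrees, "
           "and the X-ray film shows a shadow in the right lung")
print(coding_target(example))  # pneumonia
```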

S3, extract the necessary diagnosis description texts required for diagnosis coding from the unstructured visit data according to the coding target.

The purpose of this step is to abstract, from the parsing result, the necessary conditions for diagnosis coding. These conditions may be the patient's medical history, signs, laboratory test results and so on. They are key to the agent's diagnosis coding and form the bridge between the doctor's description of the condition and the diagnostic code. For the pneumonia case above, for example, the agent might abstract the conditions "persistent cough", "body temperature 38.5 degrees" and "shadow in the right lung".
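A minimal sketch of condition extraction follows, using regular expressions in place of the large language model. The `EXTRACTION_RULES` table and its patterns are hypothetical and cover only the pneumonia example.

```python
import re
from typing import Dict

# Hypothetical condition patterns per coding target. The patent performs this
# step with a large language model; regular expressions are only a stand-in.
EXTRACTION_RULES: Dict[str, Dict[str, str]] = {
    "pneumonia": {
        "symptom": r"(?:persistent|productive|dry)?\s*cough",
        "temperature": r"body temperature (?:of )?\d+(?:\.\d+)? degrees",
        "imaging": r"x-ray[^,;.]*",
    },
}


def extract_necessary_texts(visit_text: str, target: str) -> Dict[str, str]:
    """Return {condition: matched text} for every rule of the target that matches."""
    found: Dict[str, str] = {}
    for condition, pattern in EXTRACTION_RULES.get(target, {}).items():
        match = re.search(pattern, visit_text, flags=re.IGNORECASE)
        if match:
            found[condition] = match.group(0).strip()
    return found


text = ("The patient has a persistent cough, a body temperature of 38.5 degrees, "
        "and the X-ray film shows a shadow in the right lung.")
print(extract_necessary_texts(text, "pneumonia"))
```

Conditions that do not match are simply absent from the result. The missing-text check described next can then detect the gap.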

In some cases, however, the unstructured visit data entered by the user may be incomplete, which affects the accuracy of diagnosis coding. The missing information can then be fed back to the user or filled in automatically with a language model.

Specifically, after the necessary diagnosis description texts are extracted from the unstructured visit data, the agent detects whether any other necessary diagnosis description text required for diagnosis coding is missing:

- If not, the next step is executed directly.
- If so, the agent either feeds back information about the missing necessary diagnosis data to the user front end, or predicts the missing data with a BERT language model to supplement the missing necessary diagnosis description text.

The supplemented necessary diagnosis description texts are then converted into a mathematical model, recorded as the target mathematical model.

After receiving the information about the missing necessary diagnosis data, the user re-enters the missing visit data, and the agent extracts further diagnosis description texts from it.
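The missing-text branch can be sketched as follows. The sketch assumes a hypothetical `REQUIRED_CONDITIONS` table that lists which conditions each coding target needs. Trying the predictor first and falling back to asking the user is also an assumed policy; the patent presents feedback and prediction as alternatives.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical table of conditions required before a target can be coded.
REQUIRED_CONDITIONS: Dict[str, List[str]] = {
    "pneumonia": ["symptom", "temperature", "imaging"],
}

Predictor = Callable[[str, Dict[str, str]], Optional[str]]


def find_missing(target: str, extracted: Dict[str, str]) -> List[str]:
    """List the required conditions that have no extracted text yet."""
    return [c for c in REQUIRED_CONDITIONS.get(target, []) if c not in extracted]


def complete_texts(target: str, extracted: Dict[str, str],
                   predict: Optional[Predictor] = None) -> Dict[str, object]:
    """Return completed texts, or a feedback message for the user front end."""
    completed = dict(extracted)
    if predict is not None:
        for condition in find_missing(target, completed):
            guess = predict(condition, extracted)   # e.g. a BERT-based predictor
            if guess:
                completed[condition] = guess
    still_missing = find_missing(target, completed)
    if still_missing:
        # Nothing left to infer: ask the user to re-enter the missing data.
        return {"status": "ask_user", "missing": still_missing}
    return {"status": "complete", "texts": completed}
```

For example, `complete_texts("pneumonia", {"symptom": "persistent cough"})` returns an `ask_user` message listing `temperature` and `imaging`. The same call with a predictor that can supply both conditions returns the completed texts instead.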

In an alternative embodiment, predicting the missing data with the BERT language model specifically comprises:

- extracting all diagnosis description texts from the unstructured visit data;
- searching all the diagnosis description texts for at least one text related to the missing necessary diagnosis description text;
- inputting features of the related diagnosis description texts into the BERT language model for prediction, to obtain the necessary diagnosis description text to be supplemented.

It should be noted that the BERT language model is pre-trained, and for each diagnosis description text a set of related diagnosis description texts is pre-stored. If a particular diagnosis description text is missing, the BERT language model infers it from the related diagnosis description texts. The inferred text is then added to the necessary diagnosis description texts.

S4, convert the necessary diagnosis description texts into a mathematical model, recorded as the target mathematical model.

The purpose of this step is to model the necessary diagnosis description texts mathematically so that they can be matched against the diagnostic codes. In an alternative embodiment, this step specifically comprises:

- converting each necessary diagnosis description text into a text vector using the large language model;
- linearly combining the text vectors of all necessary diagnosis description texts to generate a diagnosis data combination vector, which is the target mathematical model.
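The related-text search that feeds the BERT model might look like the sketch below. It makes several assumptions:

- `RELATED_KEYWORDS` is a hypothetical stand-in for the pre-stored associations.
- Keyword counting is an assumed relatedness measure.
- `model` is any callable that maps a masked prompt to text. In practice it could wrap a masked language model, for example a Hugging Face `transformers` `fill-mask` pipeline, but that wiring is not shown.

```python
from typing import Callable, Dict, List

# Hypothetical pre-stored associations: which words mark a description text
# as informative about a given missing condition.
RELATED_KEYWORDS: Dict[str, List[str]] = {
    "temperature": ["fever", "chills", "hot"],
    "imaging": ["lung", "chest", "breath"],
}


def related_texts(missing: str, all_texts: List[str], top_k: int = 2) -> List[str]:
    """Rank description texts by how many related keywords they contain."""
    keywords = RELATED_KEYWORDS.get(missing, [])
    scored = [(sum(k in t.lower() for k in keywords), t) for t in all_texts]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [t for _, t in scored[:top_k]]


def build_prompt(missing: str, context: List[str]) -> str:
    """Compose a masked-LM style input; [MASK] marks the text to predict."""
    return " ".join(context) + f" {missing}: [MASK]."


def predict_missing(missing: str, all_texts: List[str],
                    model: Callable[[str], str]) -> str:
    """Select related texts, then let the (hypothetical) model fill the mask."""
    return model(build_prompt(missing, related_texts(missing, all_texts)))
```

A standard BERT masked language model fills one token per `[MASK]`. Predicting a multi-word description would therefore need several masks or a fine-tuned head, and the patent does not say which.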
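The linear combination in S4 is a weighted sum of the text vectors. The sketch below uses equal weights (the mean) when no weights are given. This default is an assumption, since the patent does not state how the weights are chosen.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def combine_vectors(vectors: Sequence[Vector],
                    weights: Optional[Sequence[float]] = None) -> Vector:
    """Return sum_i w_i * v_i for equal-length text vectors v_i."""
    if not vectors:
        raise ValueError("need at least one text vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all text vectors must have the same dimension")
    if weights is None:
        weights = [1.0 / len(vectors)] * len(vectors)  # assumed default: the mean
    if len(weights) != len(vectors):
        raise ValueError("exactly one weight per vector is required")
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]


print(combine_vectors([[1.0, 0.0], [0.0, 1.0]]))                 # [0.5, 0.5]
print(combine_vectors([[1.0, 2.0], [3.0, 4.0]], [0.25, 0.75]))   # [2.5, 3.5]
```

Non-uniform weights would let, for example, imaging findings count more than general symptoms. Any such weighting would be a design choice layered on the method, not something the text specifies.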

S5, compare the target mathematical model with the pre-stored mathematical models of all diagnostic codes and find the pre-stored mathematical model closest to the target mathematical model; the diagnostic code corresponding to this closest model is the diagnostic code of the current patient.

It should be noted that each diagnostic code in the coding library corresponds to one mathematical model, and this embodiment matches the diagnostic code by matching mathematical models.

In a specific embodiment, diagnostic code matching is vectorized: each diagnostic code corresponds to a vector, and the best-matching code is found from the distances between vectors. Specifically:

- The distance between the diagnosis data combination vector and each pre-stored diagnostic code vector is calculated.
- The diagnostic code vector closest to the diagnosis data combination vector is selected.
- The corresponding diagnostic code is the diagnostic code of the current patient.

For the pneumonia case above, for example, the agent may find the diagnostic code "J18.9", which in ICD-10 denotes pneumonia, unspecified.
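A minimal sketch of the nearest-code search follows. It assumes cosine distance and a brute-force scan over a small in-memory coding library. The patent only says "distance", so Euclidean distance would fit the text equally well. The vectors below are toy values, not real embeddings.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def nearest_code(query: Vector, code_vectors: Dict[str, Vector]) -> Tuple[str, float]:
    """Linear scan over the coding library; returns (code, distance)."""
    best = min(code_vectors, key=lambda code: cosine_distance(query, code_vectors[code]))
    return best, cosine_distance(query, code_vectors[best])


library = {"J18.9": [1.0, 1.0, 0.0], "S72.0": [0.0, 0.0, 1.0]}  # toy vectors
code, dist = nearest_code([0.9, 0.8, 0.1], library)
print(code, round(dist, 4))
```

For a library covering all of ICD-10, an approximate nearest-neighbour index (for example FAISS) would usually replace the linear scan.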

Once the pre-stored mathematical model closest to the target mathematical model is found, the current patient's diagnostic code is determined and fed back to the user front end. The doctor can then carry out subsequent diagnosis and treatment on that basis. For the pneumonia case above, for example, the doctor can determine from the diagnostic code "J18.9" that the patient has pneumonia and then administer the corresponding treatment.

The embodiment of the auxiliary coding method for medical diagnosis has been described in detail above. Based on that method, an embodiment of the invention further provides a corresponding auxiliary coding system for medical diagnosis.

Fig. 3 is a schematic block diagram of an auxiliary coding system for medical diagnosis according to an embodiment of the invention. As shown in Fig. 3, the auxiliary coding system 300 may be divided into several functional modules according to the functions it performs:

- a visit data receiving module 310;
- a coding target parsing module 320;
- a necessary text extraction module 330;
- a mathematical model conversion module 340;
- a diagnostic code determination module 350.

A module in the invention refers to a series of computer program segments, stored in a memory, that can be executed by at least one processor and perform a fixed function.

Visit data receiving module 310: receives the unstructured visit data entered by the user.

Coding target parsing module 320: parses the unstructured visit data with the large language model to determine the coding target.

Necessary text extraction module 330: extracts the necessary diagnosis description texts required for diagnosis coding from the unstructured visit data according to the coding target.

Mathematical model conversion module 340: converts the necessary diagnosis description texts into a mathematical model, recorded as the target mathematical model.

Diagnostic code determination module 350: compares the target mathematical model with the pre-stored mathematical models of all diagnostic codes, finds the pre-stored mathematical model closest to the target mathematical model, and determines that the corresponding diagnostic code is the diagnostic code of the current patient.

In an alternative embodiment, the necessary text extraction module 330 is further configured to:

- detect whether any other necessary diagnosis description text required for diagnosis coding is missing;
- if not, trigger the mathematical model conversion module 340 to convert all extracted necessary diagnosis description texts into a mathematical model, recorded as the target mathematical model;
- if so, either feed back information about the missing necessary diagnosis data to the user front end, or predict the missing data with the BERT language model to supplement the missing necessary diagnosis description text;
- then trigger the mathematical model conversion module 340 to convert the supplemented necessary diagnosis description texts into a mathematical model, recorded as the target mathematical model.

In an alternative embodiment, converting the necessary diagnosis description texts into a mathematical model recorded as the target mathematical model specifically comprises: converting each necessary diagnosis description text into a text vector using the large language model; and linearly combining the text vectors of all necessary diagnosis description texts to generate a diagnosis data combination vector, which is the target mathematical model.

In an alternative embodiment, comparing the target mathematical model with the pre-stored mathematical models of the diagnostic codes and finding the closest pre-stored mathematical model specifically comprises: calculating the distance between the diagnosis data combination vector and each pre-stored diagnostic code vector; and selecting the diagnostic code vector closest to the diagnosis data combination vector. The diagnostic code corresponding to this closest vector is the diagnostic code of the current patient.

In an alternative embodiment, the diagnostic code determination module 350 is further configured to feed back the diagnostic code of the current patient to the user front end. It does so after determining that the diagnostic code corresponding to the closest pre-stored mathematical model is the diagnostic code of the current patient.

The auxiliary coding system for medical diagnosis of this embodiment implements the auxiliary coding method described above, so its functions correspond to those of the method. For its specific implementation, refer to the description of the corresponding method embodiments, which is not repeated here.

Fig. 4 is a schematic structural diagram of a terminal 400 according to an embodiment of the invention. The terminal comprises a processor 410, a memory 420 and a communication unit 430. When executing the auxiliary coding program for medical diagnosis stored in the memory 420, the processor 410 implements the following steps:

receiving unstructured visit data of a patient entered by a user;

The terminal 400 comprises a processor 410, a memory 420 and a communication unit 430. These components may communicate via one or more buses. Those skilled in the art will appreciate that the server structure shown in the drawing does not limit the invention. It may be a bus structure or a star structure, and it may include more or fewer components than shown, combine certain components, or arrange components differently.

The memory 420 may be used to store instructions executed by the processor 410. It may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as:

- static random access memory (SRAM);
- electrically erasable programmable read-only memory (EEPROM);
- erasable programmable read-only memory (EPROM);
- programmable read-only memory (PROM);
- read-only memory (ROM);
- magnetic memory, flash memory, a magnetic disk or an optical disk.

When the processor 410 executes the instructions in the memory 420, the terminal 400 can perform some or all of the steps in the method embodiments described above.

The processor 410 is the control center of the terminal. It connects the parts of the entire electronic terminal through various interfaces and lines. It performs the terminal's functions and/or processes data by running or executing software programs and/or modules stored in the memory 420 and invoking data stored in the memory. The processor may consist of integrated circuits (ICs), for example a single packaged IC, or several packaged ICs with the same or different functions connected together. For example, the processor 410 may include only a central processing unit (CPU). In the embodiments of the invention, the CPU may have a single core or multiple cores.

The communication unit 430 establishes a communication channel so that the terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to them.

The invention also provides a computer storage medium, which can be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (random access memory, RAM) and the like.

The computer storage medium stores a medical diagnosis-oriented auxiliary encoding program which when executed by the processor performs the steps of:

Receiving unstructured patient care data of a patient input by a user;

It will be apparent to those skilled in the art that the techniques of embodiments of the present invention may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solution in the embodiments of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium such as a U-disc, a mobile hard disc, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk or an optical disk, etc. various media capable of storing program codes, including several instructions for causing a computer terminal (which may be a personal computer, a server, or a second terminal, a network terminal, etc.) to execute all or part of the steps of the method described in the embodiments of the present invention.

In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions may be used in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

The foregoing discloses merely preferred embodiments of the invention, and the invention is not limited thereto; modifications and variations may be made by those skilled in the art without departing from the principles of the invention.

Claims

1. An auxiliary coding method for medical diagnosis, characterized by comprising the following steps:

Receiving unstructured visit data of a patient input by a user;

Comparing the target mathematical model with the mathematical models of all pre-stored diagnostic codes to find the pre-stored mathematical model closest to the target mathematical model; and determining that the diagnosis code corresponding to the closest pre-stored mathematical model is the diagnosis code of the current patient;

Wherein, after the necessary diagnosis description text for diagnosis coding is extracted from the unstructured visit data according to the coding target, the method further comprises the following steps:

converting the supplemented necessary diagnosis description text into a mathematical model, and recording the mathematical model as the target mathematical model;

The missing data prediction method specifically comprises the following steps:

extracting all diagnosis description texts from the unstructured visit data;

inputting the features of the related diagnosis description texts into a BERT language model for prediction, to obtain the necessary diagnosis description text to be supplemented;

wherein comparing the target mathematical model with the pre-stored mathematical models of the diagnostic codes to find the pre-stored mathematical model closest to the target mathematical model specifically comprises the following steps:

2. The auxiliary coding method for medical diagnosis according to claim 1, wherein converting the necessary diagnosis description text into a mathematical model and recording the mathematical model as the target mathematical model specifically comprises:

3. The auxiliary coding method for medical diagnosis according to claim 2, further comprising the following step after determining that the diagnosis code corresponding to the closest pre-stored mathematical model is the diagnosis code of the current patient:

feeding back the diagnosis code of the current patient to the user front end.
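The comparison step of claim 1 can be illustrated with a minimal sketch. It assumes that each "mathematical model" is a numeric feature vector and that "closest" means highest cosine similarity; the claims as reproduced here do not specify either the representation or the distance measure. The codes `CODE_A`, `CODE_B`, `CODE_C` and their vectors are hypothetical placeholders, not real diagnosis codes.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_closest_code(
    target: List[float], prestored: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Return (code, score) of the pre-stored vector most similar to the target.

    `prestored` maps a diagnosis code to its pre-stored vector (assumed layout).
    """
    if not prestored:
        raise ValueError("no pre-stored models")
    best_code, best_score = "", -math.inf
    for code, vector in prestored.items():
        score = cosine_similarity(target, vector)
        if score > best_score:
            best_code, best_score = code, score
    return best_code, best_score


if __name__ == "__main__":
    # Hypothetical pre-stored vectors; a real system would derive these from code descriptions.
    prestored = {
        "CODE_A": [1.0, 0.0, 0.0],
        "CODE_B": [0.0, 1.0, 0.0],
        "CODE_C": [0.7, 0.7, 0.0],
    }
    code, score = find_closest_code([0.9, 0.1, 0.0], prestored)
    # Per claim 3, the selected code would then be fed back to the user front end.
    print(code)  # CODE_A
```

A linear scan is shown for clarity; with many pre-stored codes, an approximate nearest-neighbour index would be a natural substitute, though the source does not say which approach is used.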

4. An auxiliary coding system for medical diagnosis, characterized by comprising:

a diagnosis code determination module, configured to compare the target mathematical model with the mathematical models of all pre-stored diagnostic codes to find the pre-stored mathematical model closest to the target mathematical model, and to determine that the diagnosis code corresponding to the closest pre-stored mathematical model is the diagnosis code of the current patient;

wherein the necessary text extraction module is further configured to:

detect whether any other necessary diagnosis description text required for diagnosis coding is missing; if not, trigger the mathematical model conversion module to convert all of the extracted necessary diagnosis description text into a mathematical model and record the mathematical model as the target mathematical model; if so, either feed back information on the missing necessary diagnosis data to the user front end, or predict the missing data with the BERT language model and supplement the missing necessary diagnosis description text, and then trigger the mathematical model conversion module to convert the supplemented necessary diagnosis description text into a mathematical model and record the mathematical model as the target mathematical model;

extracting all diagnosis description texts from the unstructured visit data;
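The branching performed by the necessary text extraction module in claim 4 can be sketched as follows. This is a hypothetical illustration only: the required fields (`site`, `pathology`, `laterality`) are invented examples of "necessary diagnosis description text" for one coding target, and `predictor` is a plain callable standing in for the BERT language model, whose inputs and outputs the claims do not detail.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical necessary description fields for a single coding target.
REQUIRED_FIELDS: List[str] = ["site", "pathology", "laterality"]


def find_missing(extracted: Dict[str, str], required: List[str]) -> List[str]:
    """Return the required fields for which no non-blank text was extracted."""
    return [f for f in required if not extracted.get(f, "").strip()]


def prepare_target_text(
    extracted: Dict[str, str],
    all_texts: List[str],
    required: List[str],
    predictor: Optional[Callable[[str, List[str]], str]] = None,
) -> Dict[str, object]:
    """Decide whether to convert directly, report missing data, or supplement it."""
    missing = find_missing(extracted, required)
    if not missing:
        # Nothing missing: hand all extracted text to model conversion.
        return {"status": "complete", "text": dict(extracted)}
    if predictor is None:
        # No predictor available: report the missing items to the user front end.
        return {"status": "missing", "missing": missing}
    supplemented = dict(extracted)
    for field in missing:
        # `predictor` stands in for the BERT model and sees all extracted description texts.
        supplemented[field] = predictor(field, all_texts)
    return {"status": "supplemented", "text": supplemented}


if __name__ == "__main__":
    extracted = {"site": "lung", "pathology": "pneumonia"}
    print(prepare_target_text(extracted, [], REQUIRED_FIELDS)["status"])  # missing
    result = prepare_target_text(
        extracted, ["..."], REQUIRED_FIELDS, predictor=lambda field, texts: "left"
    )
    print(result["status"])  # supplemented
```

Passing the predictor as an argument keeps the two claimed alternatives (feedback to the user or BERT-based supplementation) selectable by the caller; either result would then go to the mathematical model conversion module.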

5. A terminal, comprising:

a memory, configured to store an auxiliary coding program for medical diagnosis;

a processor, configured to implement the steps of the auxiliary coding method for medical diagnosis according to any one of claims 1 to 3 when executing the auxiliary coding program for medical diagnosis.

6. A computer-readable storage medium, wherein an auxiliary coding program for medical diagnosis is stored on the readable storage medium, and the program, when executed by a processor, implements the steps of the auxiliary coding method for medical diagnosis according to any one of claims 1 to 3.