Disclosure of Invention
The embodiment of the invention aims to provide an imaging structured report label extraction method, system, terminal and storage medium, so as to solve the problem that existing label extraction from imaging structured reports has low accuracy.
The embodiment of the invention is realized in such a way that an imaging structured report label extraction method comprises the following steps:
acquiring an examination site in an imaging structured report, and determining pathological features according to the examination site;
acquiring free text in the imaging structured report, and determining a structured corpus according to the free text and the pathological features, wherein the free text is a doctor's supplementary description of the imaging findings in the imaging structured report;
querying a corpus according to the examination site to obtain a natural language corpus;
and performing corpus analysis on the diagnosis conclusion text in the imaging structured report against the structured corpus and the natural language corpus in sequence, and extracting a report label of the imaging structured report according to the corpus analysis result.
Still further, the determining a structured corpus from the free text and the pathological features includes:
Carrying out structural unit query according to the pathological characteristics to obtain a first structural unit, and querying the structural unit corresponding to the imaging structural report to obtain a second structural unit;
respectively obtaining corpus corresponding to the first structuring unit and the second structuring unit to obtain a first corpus and a second corpus;
acquiring a text position of the free text in the imaging structured report, and inquiring a corpus according to the text position to obtain a third corpus;
and generating the structured corpus according to the first corpus, the second corpus and the third corpus.
Further, the performing corpus querying according to the text position to obtain a third corpus includes:
Obtaining a structural identifier of the second structural unit, and obtaining a paragraph tag and a title tag in the text position;
matching the structured identification, the paragraph label and the title label with a prestored corpus lookup table to obtain a first sub-corpus;
acquiring an association structuring unit of the first structuring unit, and matching the association structuring unit with the corpus lookup table to obtain a second sub-corpus;
and generating the third corpus according to the first sub-corpus and the second sub-corpus.
Further, after the third corpus is generated according to the first sub-corpus and the second sub-corpus, the method further includes:
Obtaining an imaging description type of the diagnosis conclusion text, and matching the imaging description type with the corpus lookup table to obtain a third sub-corpus;
The third sub-corpus is added to the third corpus.
Still further, the determining a pathological feature from the examination site includes:
obtaining the site code of the examination site, and matching the site code with a prestored coding relation tree to obtain the pathological features, wherein the coding relation tree stores the correspondence between different site codes and their pathological features.
Further, after the text of the diagnosis conclusion in the imaging structured report is sequentially analyzed with the structured corpus and the natural language corpus, the method further includes:
acquiring a locally pre-stored general corpus, and performing corpus analysis on the diagnosis conclusion text against the general corpus.
It is another object of an embodiment of the present invention to provide an imaging structured report label extraction system, the system comprising:
a feature determining module, configured to acquire an examination site in an imaging structured report and determine pathological features according to the examination site;
a corpus determining module, configured to acquire free text in the imaging structured report, determine a structured corpus according to the free text and the pathological features, and query a corpus according to the examination site to obtain a natural language corpus, wherein the free text is a doctor's supplementary description of the imaging findings in the imaging structured report;
and a label extraction module, configured to perform corpus analysis on the diagnosis conclusion text in the imaging structured report against the structured corpus and the natural language corpus in sequence, and extract a report label of the imaging structured report according to the corpus analysis result.
Still further, the corpus determining module is further configured to:
Carrying out structural unit query according to the pathological characteristics to obtain a first structural unit, and querying the structural unit corresponding to the imaging structural report to obtain a second structural unit;
respectively obtaining corpus corresponding to the first structuring unit and the second structuring unit to obtain a first corpus and a second corpus;
acquiring a text position of the free text in the imaging structured report, and inquiring a corpus according to the text position to obtain a third corpus;
and generating the structured corpus according to the first corpus, the second corpus and the third corpus.
It is a further object of an embodiment of the present invention to provide a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, which processor implements the steps of the method as described above when executing the computer program.
It is a further object of embodiments of the present invention to provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
According to the embodiment of the invention, the pathological features are determined from the examination site, so the structured corpus can be determined automatically from the free text and the pathological features, and the corpus query by examination site effectively determines the natural language corpus corresponding to the imaging structured report. The diagnosis conclusion text is analyzed against the structured corpus and then the natural language corpus, and the report labels in the diagnosis conclusion text are extracted from the analysis results. Because the free text content is combined with the imaging findings description of the report to obtain the structured corpus, and label extraction on the diagnosis conclusion text then also draws on the natural language corpus, label extraction accuracy is improved.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
Example 1
Referring to fig. 1, a flowchart of an imaging structured report label extraction method according to a first embodiment of the present invention is shown. The method may be applied to any terminal device or system, and includes the steps of:
Step S10, acquiring an examination site in an imaging structured report, and determining pathological features according to the examination site;
The underlying structure of the imaging findings section of an imaging structured report can be divided into an examination site (tissue/organ) layer and, beneath it, a layer describing specific physiological/pathological features. For example, the liver is an organ-level description; beneath it, diffuse descriptions include fatty liver, liver cirrhosis, polycystic liver disease, nonspecific diffuse liver disease, viral hepatitis, liver dysplasia, postoperative change of the liver and the like, and focal lesions include liver cancer, cyst, hemangioma, echinococcosis, abscess, liver metastasis and the like. The diffuse and focal descriptions are each made into a separate structuring unit, namely a CDE (Common Data Element), and all imaging description attributes of a CDE are labeled with RadLex or SNOMED codes.
In this step, the basic information of the imaging structured report, the corresponding tissue/organ information and codes, and the pathological feature category information and codes described by the structuring unit to which the report belongs are acquired. A pathological feature may also be a physiological feature. For example, the imaging structured report is a general report for abdominal organs using MR examination, the acquired examination site is the pancreas, and the structuring unit to which the report belongs is pancreatic focal lesions.
Further, before step S10, the method further includes:
A coding relation tree is constructed from the codes, and the physiological/pathological features described under each tissue site are expressed in the tree so as to determine which CDE types can occur under that site. For example, referring to fig. 2, the organ structure has lower-level contents such as position, number, distribution and surrounding invasion; surrounding invasion is a pathological feature of the tissue site, so for a CDE whose type is surrounding invasion and whose tissue/organ is the pancreas, the corpus corresponding to that CDE can be used in the pancreas section.
A corpus is built for the CDEs: corpus entries are extracted from each structuring unit that describes a tissue site or a physiological/pathological feature of the tissue site, or entries that describe the tissue site and its physiological/pathological features are entered directly, yielding a corpus lookup table.
For example, referring to fig. 3, the CDE shown is peripancreatic invasion, which has two attribute codes, pancreas and surrounding invasion. According to the logic built into the structured report, corpus contents such as lesion invasion, common bile duct, duodenum, stomach and spleen can be extracted, and when labels are extracted from the content in fig. 3, words such as lesion invasion and common bile duct have higher priority.
In this embodiment, attribute codes representing medical meanings are also assigned to the corpus entries. Entries extracted from a specific structuring unit inherit the codes of that CDE, and once an entry is coded, the contexts in which it is applicable can be found through the coding relation tree. For example, the corpus entries generated from the CDE in fig. 3 carry the codes for pancreas and surrounding invasion.
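The code inheritance described above can be sketched as follows. This is a minimal illustration only: the `CDE` class, the `build_lookup_table` helper and the code strings such as `ORGAN:pancreas` are hypothetical placeholders, not real RadLex or SNOMED identifiers and not the data layout of the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CDE:
    """A structuring unit with its attribute codes and extracted corpus terms."""
    name: str
    codes: frozenset  # placeholder code labels (assumption, not real RadLex/SNOMED IDs)
    terms: tuple      # corpus terms extracted from this unit


def build_lookup_table(cdes):
    """Map each attribute code to the terms that inherit it from their CDE."""
    table = {}
    for cde in cdes:
        for code in cde.codes:
            table.setdefault(code, set()).update(cde.terms)
    return table


# Example modelled on the peripancreatic invasion CDE of fig. 3.
peripancreatic = CDE(
    name="peripancreatic invasion",
    codes=frozenset({"ORGAN:pancreas", "FEATURE:surrounding_invasion"}),
    terms=("lesion invasion", "common bile duct", "duodenum", "stomach", "spleen"),
)
lookup_table = build_lookup_table([peripancreatic])
```

Indexing the terms by code rather than by CDE name is one possible design choice; it lets any unit that shares a code reuse the same terms.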
Optionally, in this step, the determining a pathological feature according to the examination site includes:
obtaining the site code of the examination site, and matching the site code with a prestored coding relation tree to obtain the pathological features,
wherein the coding relation tree stores the correspondence between different site codes and their pathological features, and the pathological features are queried from the tree according to the site code of the examination site. For example, the pathological features include the RADS classification of the examination site, postoperative change, developmental variation, and other items describing properties of the examination site such as size, shape and substance.
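As a rough sketch of this lookup, the coding relation tree can be modelled as a nested dictionary keyed by site code. The site codes, the nesting and the function name `pathological_features` are assumptions made for illustration; the feature names are taken from the examples in this description.

```python
# Hypothetical coding relation tree: site code -> category -> feature names.
CODE_RELATION_TREE = {
    "SITE:liver": {
        "diffuse": ["fatty liver", "liver cirrhosis", "polycystic liver disease"],
        "focal": ["liver cancer", "cyst", "hemangioma"],
    },
    "SITE:pancreas": {
        "general": ["RADS classification", "postoperative change", "developmental variation"],
        "properties": ["size", "shape", "substance"],
    },
}


def pathological_features(site_code, tree=CODE_RELATION_TREE):
    """Return every feature stored under a site code, walking the tree depth first."""
    def leaves(node):
        if isinstance(node, dict):
            for child in node.values():
                yield from leaves(child)
        else:
            yield from node

    if site_code not in tree:
        return []  # unknown site: no features rather than an error
    return list(leaves(tree[site_code]))


pancreas_features = pathological_features("SITE:pancreas")
```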
Step S20, free text in the imaging structured report is obtained, and a structured corpus is determined according to the free text and the pathological features;
Here, the free text is the doctor's supplementary description of the imaging findings in the imaging structured report. When the structured report is designed, an edit box is added to each tissue/organ or to each structuring unit describing physiological/pathological features, as required by the design, so that the doctor can enter supplementary imaging findings; the entered content is the free text;
for example, referring to fig. 4, a CDE describing pancreatic focal lesions is shown. The CDE has attribute codes such as focal feature and pancreas, and an "other" edit box is added to the CDE for entering supplementary imaging findings; the free text is obtained from the content of this edit box.
In this embodiment, the positions at which free text can be added are layered within the imaging findings description. Taking the liver as an example, a row of free text boxes is added under the liver, in which the diagnostician can add types of imaging findings that are not in the CDE list. Such types are rarely used and need not be made into structured form and placed in the interface, but in a small number of cases some doctors still consider them clinically significant and need to describe them. The content of such a free text box is therefore an independent type of imaging description of the liver, possibly with a small number of lower-level attribute descriptions, and the concepts/synonyms/grammatical structures it contains are limited to the CDE-type corpus under that organ. Whether it is analyzed by natural language processing (NLP) or by manual training, the content can easily be broken down into CDE-like labels. For free-text content that occurs frequently, this embodiment prompts the structured report designer to implement it as a CDE.
In this embodiment, a free text box is also provided under each CDE subordinate to a tissue/organ, so that the diagnostician can add a supplementary morphological description for that CDE. For example, in the free text box under the CDE for postoperative change of the liver, the diagnostician may enter "connect and cut the scope at S7, S8". The content of this free text box is limited to the supplementary description corpus under the CDE and, whether analyzed by NLP or by manual training, can easily be broken down into labels of the CDE. For subordinate attributes that occur frequently and are not included in the existing CDE, this embodiment prompts the structured report designer to add them as attributes of the CDE.
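The prompt to the report designer could, for instance, be driven by a simple frequency count over labels already extracted from free text. The sketch below assumes such labels are available as strings; the function name `suggest_new_attributes` and the `min_count` threshold are hypothetical and are not specified by this embodiment.

```python
from collections import Counter


def suggest_new_attributes(free_text_labels, existing_attributes, min_count=3):
    """Return frequent free-text labels that the CDE does not yet define.

    min_count is an illustrative threshold, not a value given in the source.
    """
    counts = Counter(label.strip().lower() for label in free_text_labels)
    known = {attr.strip().lower() for attr in existing_attributes}
    return [
        label
        for label, n in counts.most_common()
        if n >= min_count and label not in known
    ]


# Labels gathered from many reports (made-up sample data).
labels = ["resection extent"] * 3 + ["size"] * 5 + ["calcification"]
suggestions = suggest_new_attributes(labels, existing_attributes=["size", "shape"])
```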
Step S30, querying a corpus according to the examination site to obtain a natural language corpus;
The natural language corpus obtained by the query is a personalized NLP (natural language processing) library preset for the examination site;
Step S40, performing corpus analysis on the diagnosis conclusion text in the imaging structured report against the structured corpus and the natural language corpus in sequence, and extracting a report label of the imaging structured report according to the corpus analysis result;
Here, by performing corpus analysis on the diagnosis conclusion text in the imaging structured report against the structured corpus and the natural language corpus in sequence, the report labels in the diagnosis conclusion text can be effectively extracted from the corpus analysis result. In this embodiment, a diagnosis conclusion text box is provided under the imaging diagnosis section for the diagnostician to fill in the diagnosis conclusion text. Taking CT diagnosis of liver cancer as an example, the diagnostician may describe fatty liver, liver cirrhosis, cysts, multiple small liver cancers and other details in the imaging findings, but the diagnosis will most likely mention only small liver cancer and liver cirrhosis and ignore the other, less important findings. Thus, where the imaging findings use a structured description, the content of the diagnosis is most likely a subset of the imaging description content types, which bounds the corpus content of the diagnosis conclusion text. In this case, whether analyzed by NLP or by manual training, the text can easily be broken down into standard concepts and mapped to RadLex/SNOMED diagnostic codes.
Optionally, in this step, after performing corpus analysis on the diagnostic conclusion text in the imaging structured report and the structured corpus and the natural language corpus sequentially, the method further includes:
obtaining a locally pre-stored general corpus (general NLP library), and performing corpus analysis on the diagnosis conclusion text against the general corpus.
In this step, the NLP result is constrained according to a structured corpus (CDE corpus), and the concepts/synonyms/grammatical structures contained in the corpus have higher result priorities, for example:
1. When the examination site is the pancreas, the best-matching personalized NLP library is the pancreas NLP library;
2. The CDE corpus determined in step S20 is acquired;
3. NLP analysis is performed, wherein each position in the diagnosis conclusion text is matched first against the content of the CDE corpus, then against the content of the pancreas NLP library, and finally against the content of the general NLP library, which improves the accuracy of label extraction from the diagnosis conclusion text.
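The three-tier priority above can be illustrated with a small matcher. This is a simplified sketch, not the NLP engine of the embodiment: it uses plain case-insensitive substring matching with no word-boundary handling, takes the longest term within the first tier that matches at a position, and the function name `extract_labels` and the sample terms are assumptions.

```python
def extract_labels(text, cde_corpus, site_library, general_library):
    """Scan the text left to right, trying the CDE corpus first, then the
    site-specific library, then the general library at every position."""
    tiers = [
        ("cde", cde_corpus),
        ("site", site_library),
        ("general", general_library),
    ]
    text = text.lower()
    labels = []
    pos = 0
    while pos < len(text):
        match = None
        for tier_name, terms in tiers:
            # Longest non-empty term of this tier that starts at the current position.
            candidates = [t.lower() for t in terms if t and text.startswith(t.lower(), pos)]
            if candidates:
                match = (max(candidates, key=len), tier_name)
                break  # a higher-priority tier matched; skip the lower tiers
        if match:
            labels.append(match)
            pos += len(match[0])
        else:
            pos += 1
    return labels


conclusion = "Lesion invasion of the common bile duct with dilatation"
labels = extract_labels(
    conclusion,
    cde_corpus={"lesion invasion", "common bile duct"},
    site_library={"pancreatic duct", "bile duct"},
    general_library={"lesion", "dilatation"},
)
```

Because the CDE corpus is consulted first, "lesion invasion" is labeled from the CDE tier even though the general library also contains "lesion".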
Further, in this embodiment, the basic information of the imaging structured report and the corresponding tissue/organ information may be used to match a personalized NLP dictionary library. Personalized NLP dictionary libraries are trained separately according to differences in examination purpose, examination technique, examination site, tissue/organ and the like, which improves NLP accuracy and quality for a specific scenario.
In this embodiment, labels are extracted from the diagnosis conclusion text on the basis of the structured imaging findings and the labeled imaging findings (the labels extracted from the free text). The analysis range is bounded by the imaging findings description, which improves label extraction accuracy, and the NLP matching results under the current tissue/organ are constrained by the corpus, which further improves the NLP recognition rate and recognition quality.
The free text box for describing complex additional information is designed around the CDE modules of the structured report, so reporting requirements more complex than those of existing structured reports can be met. The vocabulary and grammar range of NLP analysis can be limited according to the features of the adjacent CDE, and NLP is used to convert the diagnosis conclusion text into codes. This also serves research and teaching needs, greatly broadens the applicability of structured reports, and allows the structured design to be refined continuously on the basis of the NLP analysis, so that structured reports can develop sustainably.
In this embodiment, the pathological features are determined from the examination site, so the structured corpus can be determined automatically from the free text and the pathological features, and the corpus query by examination site effectively determines the natural language corpus corresponding to the imaging structured report. The diagnosis conclusion text is analyzed against the structured corpus and then the natural language corpus, and the report labels in the diagnosis conclusion text are extracted from the analysis results. Because the free text content is combined with the imaging findings description of the report to obtain the structured corpus, and label extraction on the diagnosis conclusion text then also draws on the natural language corpus, label extraction accuracy is improved.
Example 2
Referring to fig. 5, a flowchart of an extraction method of an image structured report label according to a second embodiment of the present invention is provided, and the embodiment is used for further refining step S20 in the first embodiment, and includes the steps of:
Step S21, carrying out structural unit query according to the pathological characteristics to obtain a first structural unit, and querying the structural unit corresponding to the imaging structural report to obtain a second structural unit;
Here, the structuring units are queried by pathological feature to obtain the structuring unit corresponding to the pathological features associated with the examination site, which is the first structuring unit; and the structuring unit corresponding to the imaging structured report is queried to obtain the second structuring unit, to which the current imaging structured report belongs;
Step S22, respectively obtaining the corpus corresponding to the first structuring unit and the second structuring unit to obtain a first corpus and a second corpus;
Here, the structuring identifiers of the first structuring unit and the second structuring unit are each matched against the corpus lookup table to obtain the corpus corresponding to each unit, giving the first corpus and the second corpus;
Step S23, acquiring the text position of the free text in the imaging structured report, and inquiring a corpus according to the text position to obtain a third corpus;
Here, the text position of the free text in the imaging structured report is obtained, and a corpus query is performed according to that position to find the corpus applicable to the free text, giving the third corpus. Optionally, in this step, performing the corpus query according to the text position to obtain the third corpus includes:
Obtaining a structural identifier of the second structural unit, and obtaining a paragraph tag and a title tag in the text position;
matching the structured identification, the paragraph label and the title label with a prestored corpus lookup table to obtain a first sub-corpus;
Here, the structured identifier, paragraph tag and title tag are matched against the corpus lookup table to query the corpus applicable to supplementary imaging findings at the specific tissue/organ. The free text may describe an independent type of imaging description under the examination site; for example, with the pancreas as the "organ structure", corpus entries generated from CDEs for specific disease types, such as RADS classification, postoperative change and developmental variation, are more likely to appear;
acquiring an association structuring unit of the first structuring unit, and matching the association structuring unit with the corpus lookup table to obtain a second sub-corpus;
Here, the associated structuring unit is matched against the corpus lookup table to query the corpus applicable to supplementary imaging findings at the structuring unit for the pathological feature. For example, for the free-text edit box in the pancreatic focal lesion CDE, the CDE has the focal feature attribute code, so the corpus of CDEs of types such as size, shape, substance, position, number and surrounding invasion may be used.
And generating the third corpus according to the first sub-corpus and the second sub-corpus.
Further, in this step, after the generating the third corpus according to the first sub-corpus and the second sub-corpus, the method further includes:
Obtaining an imaging description type of the diagnosis conclusion text, and matching the imaging description type with the corpus lookup table to obtain a third sub-corpus;
Adding the third sub-corpus to the third corpus;
Here, the imaging description type of the diagnosis conclusion text is obtained and matched against the corpus lookup table to query the corpus applicable to the diagnosis conclusion text. The diagnosis conclusion text is most likely a subset of the imaging description types, so all CDE corpora associated with it can be used as constraints.
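The assembly of the third corpus from its three sub-corpora can be sketched as below. The layout of the corpus lookup table is an assumption made for illustration: a tuple key of (structured identifier, paragraph tag, title tag) for the position-based lookup, and plain string keys for associated structuring units and imaging description types. The function name `build_third_corpus` and all sample entries are hypothetical.

```python
def build_third_corpus(lookup_table, unit_id, paragraph_tag, title_tag,
                       associated_units, description_types=()):
    """Combine the position-based, association-based and conclusion-based
    sub-corpora into one set of terms."""
    # First sub-corpus: keyed by where the free text sits in the report.
    first_sub = set(lookup_table.get((unit_id, paragraph_tag, title_tag), ()))

    # Second sub-corpus: terms of units associated with the first structuring unit.
    second_sub = set()
    for unit in associated_units:
        second_sub |= set(lookup_table.get(unit, ()))

    # Third sub-corpus (optional): terms for the imaging description types
    # found in the diagnosis conclusion text.
    third_sub = set()
    for kind in description_types:
        third_sub |= set(lookup_table.get(kind, ()))

    return first_sub | second_sub | third_sub


sample_table = {
    ("pancreatic_focal_lesion", "imaging_findings", "pancreas"): [
        "RADS classification", "postoperative change",
    ],
    "size": ["diameter", "long axis"],
    "surrounding invasion": ["lesion invasion", "common bile duct"],
    "focal lesion": ["mass", "nodule"],
}
third_corpus = build_third_corpus(
    sample_table,
    unit_id="pancreatic_focal_lesion",
    paragraph_tag="imaging_findings",
    title_tag="pancreas",
    associated_units=["size", "surrounding invasion"],
    description_types=["focal lesion"],
)
```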
Step S24, the structured corpus is generated according to the first corpus, the second corpus and the third corpus;
The first corpus, the second corpus and the third corpus are combined to obtain the structured corpus.
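One simple way to combine the three corpora is an order-preserving, de-duplicating merge, sketched here. The normalization (trimming and lowercasing) and the function name `merge_corpora` are illustrative assumptions; the source only states that the corpora are combined.

```python
def merge_corpora(*corpora):
    """Merge corpora in the order given, dropping duplicate and empty terms.

    Earlier corpora keep their position, so terms of the first corpus come first.
    """
    merged = {}
    for corpus in corpora:
        for term in corpus:
            key = term.strip().lower()
            if key:
                merged.setdefault(key, None)  # dict preserves insertion order
    return list(merged)


structured_corpus = merge_corpora(
    ["lesion invasion", "common bile duct"],  # first corpus
    ["Common bile duct", "duodenum"],         # second corpus
    ["size", "duodenum ", ""],                # third corpus
)
```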
In this embodiment, the structuring units are queried by pathological feature to obtain the first structuring unit, which corresponds to the pathological features associated with the examination site, and the structuring unit corresponding to the imaging structured report is queried to obtain the second structuring unit, to which the current report belongs. The structuring identifiers of the first and second structuring units are each matched against the corpus lookup table to obtain the first corpus and the second corpus. The text position of the free text in the imaging structured report is then obtained and used for a corpus query that finds the corpus applicable to the free text, giving the third corpus. The structured corpus is obtained by combining the first corpus, the second corpus and the third corpus.
Example 3
Referring to fig. 6, a structural diagram of an imaging structured report label extraction system 100 according to a third embodiment of the present invention is shown. The system includes a feature determination module 10, a corpus determination module 11, and a label extraction module 12, wherein:
The feature determination module 10 is configured to obtain an examination site in the imaging structured report, and determine pathological features according to the examination site.
The feature determination module 10 is further configured to obtain the site code of the examination site, and match the site code with a prestored coding relation tree to obtain the pathological features, where the coding relation tree stores the correspondence between different site codes and their pathological features.
The corpus determination module 11 is configured to obtain free text in the imaging structured report, determine a structured corpus according to the free text and the pathological features, and perform a corpus query according to the examination site to obtain a natural language corpus, where the free text is a doctor's supplementary description of the imaging findings in the imaging structured report.
The corpus determining module 11 is further configured to perform a structural unit query according to the pathological feature to obtain a first structural unit, and query a structural unit corresponding to the imaging structural report to obtain a second structural unit;
respectively obtaining corpus corresponding to the first structuring unit and the second structuring unit to obtain a first corpus and a second corpus;
acquiring a text position of the free text in the imaging structured report, and inquiring a corpus according to the text position to obtain a third corpus;
and generating the structured corpus according to the first corpus, the second corpus and the third corpus.
Optionally, the corpus determining module 11 is further configured to obtain a structural identifier of the second structural unit, and obtain a paragraph tag and a title tag in the text position;
matching the structured identification, the paragraph label and the title label with a prestored corpus lookup table to obtain a first sub-corpus;
acquiring an association structuring unit of the first structuring unit, and matching the association structuring unit with the corpus lookup table to obtain a second sub-corpus;
and generating the third corpus according to the first sub-corpus and the second sub-corpus.
Further, the corpus determining module 11 is further configured to obtain an imagewise description type of the diagnosis conclusion text, and match the imagewise description type with the corpus lookup table to obtain a third sub-corpus;
The third sub-corpus is added to the third corpus.
The label extraction module 12 is configured to sequentially perform corpus analysis on the diagnosis conclusion text in the imaging structured report and the structured corpus and the natural language corpus, and extract a report label of the imaging structured report according to the corpus analysis result.
The label extraction module 12 is further configured to obtain a locally pre-stored general corpus, and perform corpus analysis on the diagnosis conclusion text against the general corpus.
According to this embodiment, the pathological features are determined from the examination site, so the structured corpus can be determined automatically from the free text and the pathological features, and the corpus query by examination site effectively determines the natural language corpus corresponding to the imaging structured report. The diagnosis conclusion text is analyzed against the structured corpus and then the natural language corpus, and the report labels in the diagnosis conclusion text are extracted from the analysis results. Because the free text content is combined with the imaging findings description of the report to obtain the structured corpus, and label extraction on the diagnosis conclusion text then also draws on the natural language corpus, label extraction accuracy is improved.
Example 4
Fig. 7 is a block diagram of a terminal device 2 according to a fourth embodiment of the present application. As shown in fig. 7, the terminal device 2 of this embodiment comprises a processor 20, a memory 21 and a computer program 22, e.g. a program of an imaging structured report label extraction method, stored in said memory 21 and executable on said processor 20. The steps of the various embodiments of the imaging structured report label extraction methods described above are implemented by processor 20 when executing the computer program 22.
Illustratively, the computer program 22 may be partitioned into one or more modules that are stored in the memory 21 and executed by the processor 20 to complete the present application. The one or more modules may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program 22 in the terminal device 2. The terminal device may include, but is not limited to, a processor 20, a memory 21.
The processor 20 may be a central processing unit (CPU), another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 21 may be an internal storage unit of the terminal device 2, such as a hard disk or a memory of the terminal device 2. The memory 21 may also be an external storage device of the terminal device 2, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card or the like provided on the terminal device 2. Further, the memory 21 may include both an internal storage unit and an external storage device of the terminal device 2. The memory 21 is used for storing the computer program as well as other programs and data required by the terminal device. The memory 21 may also be used for temporarily storing data that has been output or is to be output.
In addition, each functional module in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. The computer readable storage medium may be nonvolatile or volatile. Based on this understanding, the present application may implement all or part of the flow of the methods of the above embodiments by means of a computer program that instructs the related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of each method embodiment described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer readable storage medium may include any entity or device capable of carrying computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer readable storage medium may be increased or reduced as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable storage medium does not include electrical carrier signals and telecommunication signals.
The foregoing embodiments are merely illustrative of the technical solutions of the present application, and not restrictive, and although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that modifications may still be made to the technical solutions described in the foregoing embodiments or equivalent substitutions of some technical features thereof, and that such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.