
CN115458110B - Method, system, terminal and storage medium for extracting labels from structured radiology reports - Google Patents

Method, system, terminal and storage medium for extracting labels from structured radiology reports

Info

Publication number
CN115458110B
Authority
CN
China
Prior art keywords
corpus
imaging
structured
report
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210972696.9A
Other languages
Chinese (zh)
Other versions
CN115458110A (en)
Inventor
盛若凡
周建军
梁冬云
岳新
张虽虽
秦菊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Hospital Zhongshan Hospital Fudan University
Beijing Smarttree Medical Technology Co Ltd
Original Assignee
Xiamen Hospital Zhongshan Hospital Fudan University
Beijing Smarttree Medical Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Hospital Zhongshan Hospital Fudan University and Beijing Smarttree Medical Technology Co Ltd
Priority to CN202210972696.9A
Publication of CN115458110A
Application granted
Publication of CN115458110B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00 ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/387 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G16H30/40 ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/60 ICT specially adapted for the handling or processing of medical references relating to pathologies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The present invention provides a method, system, terminal, and storage medium for extracting labels from structured imaging reports. The method comprises: determining pathological characteristics based on the examination site in the structured imaging report; obtaining free text in the structured imaging report and determining a structured corpus based on the free text and the pathological characteristics; performing a corpus query based on the examination site to obtain a natural language corpus; performing corpus analysis on the diagnostic conclusion text, the structured corpus, and the natural language corpus in sequence, and extracting report labels from the structured imaging report based on the corpus analysis results. The present invention combines the content of the free text with the description of the imaging manifestations in the structured imaging report to obtain a structured corpus, and then extracts labels from the diagnostic conclusion text in combination with the natural language corpus, thereby improving the accuracy of label extraction.

Description

Method, system, terminal and storage medium for extracting labels from structured imaging reports
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a method, system, terminal, and storage medium for extracting labels from structured imaging reports.
Background
Structured imaging reports label their descriptions of imaging findings, which makes them widely useful for research, teaching, and similar purposes. In practice, however, not all content of an imaging report can be fully labeled, and two common scenarios still require free text. First, some types of imaging findings are discovered incidentally or are not covered by the current structured sections; a diagnostician who wants to describe them can only do so in text. Second, in diagnostic scenarios that lack expert consensus, the diagnosis cannot be generated automatically from the imaging description, so the diagnostician must still enter the diagnostic conclusion text manually. Extracting labels from the diagnostic conclusion text of a structured imaging report has therefore received increasing attention as a way to improve the labeling accuracy of such reports.
In existing approaches, labels are generally extracted from the diagnostic conclusion text of a structured imaging report based on manual experience, which results in low label extraction accuracy.
Disclosure of Invention
Embodiments of the present invention aim to provide a method, system, terminal, and storage medium for extracting labels from structured imaging reports, in order to solve the problem of low label extraction accuracy in existing approaches.
An embodiment of the present invention is implemented as a method for extracting labels from structured imaging reports, the method comprising:
acquiring an examination site in a structured imaging report, and determining pathological features according to the examination site;
acquiring free text in the structured imaging report, and determining a structured corpus according to the free text and the pathological features, wherein the free text is a doctor's supplementary description of the imaging findings in the structured imaging report;
performing a corpus query according to the examination site to obtain a natural language corpus;
and performing corpus analysis on the diagnostic conclusion text in the structured imaging report against the structured corpus and the natural language corpus in sequence, and extracting report labels of the structured imaging report according to the corpus analysis result.
Further, determining the structured corpus according to the free text and the pathological features comprises:
performing a structured unit query according to the pathological features to obtain a first structured unit, and querying the structured unit corresponding to the structured imaging report to obtain a second structured unit;
obtaining the corpora corresponding to the first structured unit and the second structured unit, respectively, to obtain a first corpus and a second corpus;
acquiring the text position of the free text in the structured imaging report, and performing a corpus query according to the text position to obtain a third corpus;
and generating the structured corpus according to the first corpus, the second corpus, and the third corpus.
Further, performing the corpus query according to the text position to obtain the third corpus comprises:
obtaining the structured identifier of the second structured unit, and obtaining the paragraph tag and the title tag at the text position;
matching the structured identifier, the paragraph tag, and the title tag against a pre-stored corpus lookup table to obtain a first sub-corpus;
acquiring the associated structured unit of the first structured unit, and matching the associated structured unit against the corpus lookup table to obtain a second sub-corpus;
and generating the third corpus according to the first sub-corpus and the second sub-corpus.
Further, after the third corpus is generated according to the first sub-corpus and the second sub-corpus, the method further comprises:
obtaining the imaging description type of the diagnostic conclusion text, and matching the imaging description type against the corpus lookup table to obtain a third sub-corpus;
and adding the third sub-corpus to the third corpus.
Further, determining the pathological features according to the examination site comprises:
obtaining the site code of the examination site, and matching the site code against a pre-stored coding relationship tree to obtain the pathological features, wherein the coding relationship tree stores the correspondence between different site codes and their pathological features.
Further, after the corpus analysis of the diagnostic conclusion text in the structured imaging report against the structured corpus and the natural language corpus in sequence, the method further comprises:
acquiring a locally pre-stored general corpus, and performing corpus analysis on the diagnostic conclusion text against the general corpus.
Another object of the embodiments of the present invention is to provide a system for extracting labels from structured imaging reports, the system comprising:
a feature determination module, configured to acquire an examination site in a structured imaging report and determine pathological features according to the examination site;
a corpus determination module, configured to acquire free text in the structured imaging report, determine a structured corpus according to the free text and the pathological features, and perform a corpus query according to the examination site to obtain a natural language corpus, wherein the free text is a doctor's supplementary description of the imaging findings in the structured imaging report;
and a label extraction module, configured to perform corpus analysis on the diagnostic conclusion text in the structured imaging report against the structured corpus and the natural language corpus in sequence, and extract report labels of the structured imaging report according to the corpus analysis result.
Further, the corpus determination module is further configured to:
perform a structured unit query according to the pathological features to obtain a first structured unit, and query the structured unit corresponding to the structured imaging report to obtain a second structured unit;
obtain the corpora corresponding to the first structured unit and the second structured unit, respectively, to obtain a first corpus and a second corpus;
acquire the text position of the free text in the structured imaging report, and perform a corpus query according to the text position to obtain a third corpus;
and generate the structured corpus according to the first corpus, the second corpus, and the third corpus.
A further object of the embodiments of the present invention is to provide a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
A further object of the embodiments of the present invention is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
According to the embodiments of the present invention, the pathological features are determined from the examination site, and the structured corpus can be determined automatically from the free text and the pathological features. Querying the corpus by examination site effectively determines the natural language corpus corresponding to the structured imaging report. The diagnostic conclusion text is then analyzed against the structured corpus and the natural language corpus in sequence, and the report labels in the diagnostic conclusion text can be effectively extracted from the corpus analysis result. Because the content of the free text is combined with the imaging findings description of the structured imaging report to obtain the structured corpus, and labels are then extracted from the diagnostic conclusion text with the help of the natural language corpus, label extraction accuracy is improved.
Drawings
FIG. 1 is a flowchart of the method for extracting labels from structured imaging reports according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of the coding relationship tree according to the first embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the peripancreatic lesion invasion CDE according to the first embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the pancreatic focal lesion CDE according to the first embodiment of the present invention;
FIG. 5 is a flowchart of the method for extracting labels from structured imaging reports according to a second embodiment of the present invention;
FIG. 6 is a schematic structural diagram of the system for extracting labels from structured imaging reports according to a third embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the terminal device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The technical solution of the invention is illustrated below through specific embodiments.
Example 1
FIG. 1 is a flowchart of the method for extracting labels from structured imaging reports according to the first embodiment of the present invention. The method may be applied to any terminal device or system and includes the following steps:
Step S10, acquiring the examination site in the structured imaging report, and determining pathological features according to the examination site;
The underlying structure of the imaging findings section of a structured imaging report can be divided into two layers: an examination site (tissue/organ) layer, and beneath it a layer describing specific physiological/pathological features. For example, the liver is described at the organ layer. Its physiological/pathological features include diffuse lesions, such as fatty liver, liver cirrhosis, polycystic liver disease, nonspecific diffuse liver disease, viral hepatitis, liver dysplasia, and postoperative changes of the liver, and focal lesions, such as liver cancer, cysts, hemangioma, echinococcosis, abscess, and liver metastases. The diffuse and focal descriptions are each built as an independent structured unit, a CDE (Common Data Element), and all imaging description attributes of a CDE are labeled with RadLex or SNOMED codes.
In this step, the basic information of the structured imaging report, the corresponding tissue/organ information and codes, and the pathological feature category information and codes to be described by the structured unit to which the report belongs are acquired. A pathological feature may also be a physiological feature. For example, the structured imaging report is a general report for the abdominal organs produced with an MR examination technique, the acquired examination site is the pancreas, and the structured unit to which the report belongs is pancreatic focal lesions.
Further, before step S10, the method further includes:
constructing a coding relationship tree from the codes, and expressing the physiological/pathological features described under each tissue site based on the tree, in order to determine which CDE types can occur under that site. For example, referring to FIG. 2, the organ structure has lower-level contents such as position, number, distribution, and surrounding invasion. Surrounding invasion is a pathological feature of the tissue site; if a CDE has the type surrounding invasion and the tissue/organ pancreas, the corpus corresponding to that CDE can be used for the pancreas;
building the CDE corpora: corpus entries are extracted from each structured unit used to describe a tissue site or its physiological/pathological features, or entries that can describe the tissue site and its physiological/pathological features are entered directly, yielding the corpus lookup table.
For example, FIG. 3 shows the peripancreatic lesion invasion CDE, which has two attribute codes, pancreas and surrounding invasion. Following the logic built into the structured report, corpus entries such as lesion invasion, common bile duct, duodenum, stomach, and spleen can be extracted. When labels are extracted from the content in FIG. 3, terms such as lesion invasion and common bile duct have higher priority.
In this embodiment, attribute codes that carry medical meaning are also assigned to the corpus entries. Entries extracted from a specific structured unit naturally inherit the codes of that CDE, and once entries are coded, the corpus applicable to a given context can be found through the coding relationship tree. The corpus generated from the CDE in FIG. 3 carries the codes for pancreas, surrounding invasion, and so on.
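The code inheritance described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names, the `applicable_entries` helper, and the code strings are hypothetical placeholders and are not real RadLex or SNOMED identifiers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CorpusEntry:
    term: str
    codes: frozenset  # attribute codes inherited from the originating CDE


@dataclass
class CDE:
    name: str
    codes: frozenset
    terms: list = field(default_factory=list)

    def corpus(self):
        # Every term extracted from this CDE inherits the CDE's own codes.
        return [CorpusEntry(term, self.codes) for term in self.terms]


def applicable_entries(entries, required_codes):
    """Return the entries whose codes include all of the required codes."""
    required = frozenset(required_codes)
    return [e for e in entries if required <= e.codes]


# Placeholder codes standing in for the "pancreas" and "surrounding invasion" attributes.
peripancreatic = CDE(
    name="peripancreatic lesion invasion",
    codes=frozenset({"PANCREAS", "SURROUNDING_INVASION"}),
    terms=["lesion invasion", "common bile duct", "duodenum", "stomach", "spleen"],
)
entries = peripancreatic.corpus()
pancreas_terms = [e.term for e in applicable_entries(entries, {"PANCREAS"})]
```

Filtering by a code set is one simple way to find the entries applicable to a site; the source resolves applicability through the coding relationship tree, which this sketch does not model.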
Optionally, in this step, determining the pathological features according to the examination site includes:
obtaining the site code of the examination site, and matching the site code against a pre-stored coding relationship tree to obtain the pathological features,
wherein the coding relationship tree stores the correspondence between different site codes and their pathological features, and the pathological features are looked up in the tree by the site code of the examination site. For example, the pathological features include the RADS classification describing the examination site, postoperative change, developmental variation, and other items describing properties of the examination site, such as size, shape, and substance.
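A lookup of this kind might be sketched as below. The `Node` class, the code strings, and the tree contents are assumptions made for illustration; the source does not specify how the coding relationship tree is stored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    code: str
    label: str
    children: List["Node"] = field(default_factory=list)


def find(node: Node, code: str) -> Optional[Node]:
    """Depth-first search for the node carrying the given code."""
    if node.code == code:
        return node
    for child in node.children:
        hit = find(child, code)
        if hit is not None:
            return hit
    return None


def pathological_features(tree: Node, site_code: str) -> List[str]:
    """Return the feature labels stored directly under the examination site."""
    site = find(tree, site_code)
    return [] if site is None else [child.label for child in site.children]


# Hypothetical tree fragment; codes are placeholders, not RadLex/SNOMED codes.
tree = Node("ABDOMEN", "abdomen", [
    Node("PANCREAS", "pancreas", [
        Node("RADS", "RADS classification"),
        Node("POSTOP", "postoperative change"),
        Node("DEV_VAR", "developmental variation"),
    ]),
    Node("LIVER", "liver", [Node("LIVER_POSTOP", "postoperative change")]),
])

features = pathological_features(tree, "PANCREAS")
```

An unknown site code yields an empty list here, which leaves the fallback behaviour to the caller.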
Step S20, acquiring the free text in the structured imaging report, and determining a structured corpus according to the free text and the pathological features;
The free text is the doctor's supplementary description of the imaging findings in the structured imaging report. When the structured report is designed, an edit box is added to each tissue/organ, or to each structured unit describing physiological/pathological features, that requires one, so that the doctor can enter a supplementary description of the imaging findings; this input is the free text;
For example, FIG. 4 shows a CDE describing pancreatic focal lesions. The CDE has attribute codes such as focal feature and pancreas, and an "Other" edit box is added to it for entering supplementary imaging findings. The free text is obtained by reading the content of this "Other" edit box.
In this embodiment, the positions at which free text can be added to the imaging description are arranged hierarchically. Taking the liver as an example, a row of free text boxes is added below the liver, which the diagnostician can use to add types of imaging findings that are not in the CDE list. These types are rarely used and need not be built as structured forms in the interface, but in a small number of cases some doctors still consider them clinically significant and need to describe them. The content of such a free text box is therefore an independent type of imaging description for the liver, possibly with a small number of lower-level attribute descriptions. The concepts, synonyms, and grammatical structures it contains are limited to the CDE-type corpora under that organ, so whether analyzed by NLP (Natural Language Processing) or through manual training, the content can easily be decomposed into CDE-like labels. Free-text finding types that occur frequently also serve as a prompt to implement them as regular CDE structures.
In this embodiment, a free text box is also provided under each CDE subordinate to a tissue/organ, so that the diagnostician can add supplementary morphological descriptions for that CDE. For example, in the free text box under the liver / postoperative change CDE, the diagnostician might enter "resection range at S7, S8". The content of such a box is limited to the supplementary description corpus under that CDE, so whether analyzed by NLP or through manual training, it can easily be decomposed into labels of that CDE. For subordinate attributes that occur frequently but are not included in the existing CDE, this embodiment prompts the structured report designer to add them as attributes of the CDE.
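One way to surface frequently occurring free-text attributes for the report designer is a simple frequency count, sketched below. The function name, the candidate term list, the substring matching, and the `min_count` threshold are all assumptions; the source does not say how frequency is measured.

```python
from collections import Counter


def attributes_to_promote(free_texts, known_attributes, candidate_terms, min_count=3):
    """Return candidate terms that are not yet CDE attributes but recur in free text."""
    counts = Counter()
    for text in free_texts:
        for term in candidate_terms:
            # Simple substring test; a real system would use its NLP matcher here.
            if term not in known_attributes and term in text:
                counts[term] += 1
    return [term for term, n in counts.most_common() if n >= min_count]


# Hypothetical free-text entries collected under a postoperative-change CDE.
free_texts = [
    "resection range at S7, S8",
    "resection range at S5",
    "resection range at S2, S3",
    "drain in place",
]
suggestions = attributes_to_promote(
    free_texts,
    known_attributes={"drain"},
    candidate_terms=["resection range", "drain"],
)
```

Terms returned by such a count would only be suggestions; deciding whether to add an attribute remains with the structured report designer.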
Step S30, performing a corpus query according to the examination site to obtain a natural language corpus;
The natural language corpus returned by the query is a personalized NLP (natural language processing) library preset for the examination site;
Step S40, performing corpus analysis on the diagnostic conclusion text in the structured imaging report against the structured corpus and the natural language corpus in sequence, and extracting the report labels of the structured imaging report according to the corpus analysis result;
By performing corpus analysis on the diagnostic conclusion text against the structured corpus and the natural language corpus in sequence, the report labels in the diagnostic conclusion text can be effectively extracted from the corpus analysis result. In this embodiment, a diagnostic conclusion text box is provided under the imaging diagnosis field for the diagnostician to enter the diagnostic conclusion text. Taking a CT diagnosis of liver cancer as an example, the diagnostician may describe fatty liver, liver cirrhosis, cysts, multiple small liver cancers, and other details in the imaging findings section, but the diagnosis will most likely mention only the small liver cancers and liver cirrhosis and ignore the other, less important imaging findings. Thus, when the imaging findings use a structured description, the diagnostic text is most likely a subset of the finding types in the imaging description, which bounds the corpus content of the diagnostic conclusion text. In this case, whether analyzed by NLP or through manual training, the text can easily be decomposed into standard concepts and mapped to RadLex/SNOMED diagnostic codes.
Optionally, in this step, after the corpus analysis of the diagnostic conclusion text against the structured corpus and the natural language corpus in sequence, the method further includes:
acquiring a locally pre-stored general corpus (the general NLP library), and performing corpus analysis on the diagnostic conclusion text against the general corpus.
In this step, the NLP results are constrained by the structured corpus (the CDE corpus): the concepts, synonyms, and grammatical structures contained in that corpus are given higher priority in the results. For example:
1. When the examination site is the pancreas, the best-matching personalized NLP library is the pancreas NLP library;
2. The CDE corpus determined in step S20 is acquired;
3. NLP analysis is performed: at each position in the diagnostic conclusion text, content in the CDE corpus is matched first, then content in the pancreas NLP library, and finally content in the general NLP library, which improves the accuracy of label extraction from the diagnostic conclusion text.
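The three-tier priority in the steps above can be sketched with a dictionary-based matcher. This is an illustrative simplification under stated assumptions: the source does not specify the NLP technique, so plain longest-prefix string matching stands in for it, and the lexicon contents, label names, and `extract_labels` function are hypothetical.

```python
def extract_labels(text, tiers):
    """Scan the text once; at each position try the tiers in priority order.

    tiers: list of (tier_name, {term: label}) pairs, highest priority first.
    Returns a list of (term, label, tier_name) tuples in text order.
    """
    labels = []
    i = 0
    while i < len(text):
        for name, lexicon in tiers:
            # Longest term of this tier that starts at position i, if any.
            candidates = [t for t in lexicon if text.startswith(t, i)]
            if candidates:
                term = max(candidates, key=len)
                labels.append((term, lexicon[term], name))
                i += len(term)
                break
        else:
            i += 1  # nothing matched here in any tier; move on
    return labels


# Hypothetical lexicons; labels are placeholders, not RadLex/SNOMED codes.
cde_corpus = {"lesion invasion": "SURROUNDING_INVASION", "common bile duct": "COMMON_BILE_DUCT"}
pancreas_library = {"pancreatic head": "PANCREATIC_HEAD", "mass": "MASS"}
general_library = {"mass": "GENERIC_MASS", "cyst": "CYST"}

tiers = [("cde", cde_corpus), ("pancreas", pancreas_library), ("general", general_library)]
text = "pancreatic head mass and cyst with lesion invasion of the common bile duct"
labels = extract_labels(text, tiers)
```

In this example "mass" is labeled from the pancreas library rather than the general library because the higher-priority tier is consulted first, while "cyst" falls through to the general library. The matcher ignores word boundaries, which suits text without spaces but would need refinement for English.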
Further, in this embodiment, the basic information of the structured imaging report and the corresponding tissue/organ information may be used to match a personalized NLP dictionary library. Personalized NLP dictionary libraries are trained separately for different examination purposes, examination techniques, examination sites, tissues/organs, and so on, which improves NLP accuracy and quality for a specific scenario.
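Selecting a personalized library from report metadata might look like the sketch below. The selection rule (pick the library with the most matching traits and no contradicting ones, otherwise fall back to a general library) is an assumption, as are the library names and metadata keys.

```python
def select_library(report_meta, libraries, default="general"):
    """Pick the most specific library whose traits all agree with the report."""
    best_name, best_score = default, 0
    for name, traits in libraries.items():
        # Skip libraries with any trait that contradicts the report metadata.
        if any(report_meta.get(key) != value for key, value in traits.items()):
            continue
        if len(traits) > best_score:
            best_name, best_score = name, len(traits)
    return best_name


# Hypothetical registry of personalized NLP dictionary libraries.
libraries = {
    "pancreas": {"site": "pancreas"},
    "pancreas_mr": {"site": "pancreas", "technique": "MR"},
    "liver_ct": {"site": "liver", "technique": "CT"},
}

chosen = select_library({"site": "pancreas", "technique": "MR"}, libraries)
```

Counting matched traits favors the library trained for the narrowest scenario, which is one plausible reading of "personalized"; other scoring rules are equally possible.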
In this embodiment, labels are extracted from the diagnostic conclusion text on the basis of the structured imaging findings and the labeled imaging findings (the labels extracted from the free text). Bounding the analysis by the imaging description improves label extraction accuracy, and constraining the NLP matching results under the current tissue/organ by the corpus further improves the NLP recognition rate and recognition quality.
Free text boxes for describing complex additional information are designed around the CDE modules in the structured report, which supports reporting needs more complex than those of existing structured reports. The vocabulary and grammar scope of NLP analysis can be limited according to the features of the adjacent CDE, and NLP is used to convert the diagnostic conclusion text into codes. This also meets the needs of research and teaching and greatly broadens the application range of structured reports; in addition, the structured design can be refined continuously based on the NLP analysis, so that structured reports can keep evolving.
In this embodiment, the pathological features are determined from the examination site, and the structured corpus is determined automatically from the free text and the pathological features. Querying the corpus by examination site effectively determines the natural language corpus corresponding to the structured imaging report. The diagnostic conclusion text is analyzed against the structured corpus and the natural language corpus in sequence, and the report labels in the diagnostic conclusion text can be effectively extracted from the corpus analysis result. Because the content of the free text is combined with the imaging findings description of the structured imaging report to obtain the structured corpus, and labels are then extracted from the diagnostic conclusion text with the help of the natural language corpus, label extraction accuracy is improved.
Example 2
FIG. 5 is a flowchart of the method for extracting labels from structured imaging reports according to the second embodiment of the present invention. This embodiment further refines step S20 of the first embodiment and includes the following steps:
Step S21, performing a structured unit query according to the pathological features to obtain a first structured unit, and querying the structured unit corresponding to the structured imaging report to obtain a second structured unit;
Here, the structured unit query based on the pathological features returns the structured units corresponding to the pathological features associated with the examination site, giving the first structured unit; querying the structured unit corresponding to the structured imaging report gives the second structured unit, to which the current report belongs;
Step S22, obtaining the corpora corresponding to the first structured unit and the second structured unit, respectively, to obtain a first corpus and a second corpus;
Here, the structured identifiers of the first structured unit and the second structured unit are each matched against the corpus lookup table to obtain the corresponding corpora, giving the first corpus and the second corpus;
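Steps S21 and S22 amount to a keyed lookup, sketched below under the assumption that the corpus lookup table maps a structured identifier to a list of corpus terms. The identifiers, table contents, and `corpus_for` helper are hypothetical.

```python
# Hypothetical corpus lookup table: structured identifier -> corpus terms.
CORPUS_LOOKUP = {
    "CDE_PANCREAS_SURROUNDING_INVASION": [
        "lesion invasion", "common bile duct", "duodenum", "stomach", "spleen",
    ],
    "CDE_PANCREAS_FOCAL_LESION": ["size", "shape", "position", "number"],
}


def corpus_for(unit_ids, lookup):
    """Collect corpus terms for the given units, keeping order and dropping duplicates."""
    seen, terms = set(), []
    for unit_id in unit_ids:
        for term in lookup.get(unit_id, []):  # unknown identifiers contribute nothing
            if term not in seen:
                seen.add(term)
                terms.append(term)
    return terms


# First structured unit: found via the pathological features of the examination site.
first_corpus = corpus_for(["CDE_PANCREAS_SURROUNDING_INVASION"], CORPUS_LOOKUP)
# Second structured unit: the unit to which the current report belongs.
second_corpus = corpus_for(["CDE_PANCREAS_FOCAL_LESION"], CORPUS_LOOKUP)
```

Accepting a list of identifiers lets the same helper serve the case where several structured units match the pathological features.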
Step S23, acquiring the text position of the free text in the imaging structured report, and inquiring a corpus according to the text position to obtain a third corpus;
The method includes the steps of obtaining a text position of a free text in an imaging structured report, and inquiring a corpus according to the text position to inquire a corpus applicable to the free text to obtain a third corpus, and optionally, in the step, inquiring the corpus according to the text position to obtain the third corpus, wherein the steps include:
obtaining the structured identifier of the second structuring unit, and obtaining the paragraph tag and the title tag at the text position;
matching the structured identifier, the paragraph tag and the title tag against a pre-stored corpus lookup table to obtain a first sub-corpus;
Here, matching the structured identifier, the paragraph tag and the title tag against the corpus lookup table finds the corpus applicable to supplementary imaging findings for the specific tissue or organ. Free text at this level may describe an independent type of imaging description under the examination site, for example with the pancreas as the "organ structure"; CDEs derived from the CDEs of a specific disease are more likely to appear here, for example RADS classification, postoperative changes, developmental variants, and the like;
acquiring the associated structuring unit of the first structuring unit, and matching the associated structuring unit against the corpus lookup table to obtain a second sub-corpus;
Here, matching the associated structuring unit against the corpus lookup table finds the corpus applicable to supplementary imaging findings at the structuring unit of the pathological features. For example, for a free-text edit box in the pancreatic focal lesion CDE, which carries attribute coding of the focal lesion features, the corpora of CDEs such as size, shape, substance, position, number and surrounding invasion may be used;
generating the third corpus from the first sub-corpus and the second sub-corpus.
Further, in this step, after the third corpus is generated from the first sub-corpus and the second sub-corpus, the method further includes:
obtaining the imaging description type of the diagnosis conclusion text, and matching the imaging description type against the corpus lookup table to obtain a third sub-corpus;
adding the third sub-corpus to the third corpus;
Here, obtaining the imaging description type of the diagnosis conclusion text and matching it against the corpus lookup table finds the corpus applicable to the diagnosis conclusion text. The diagnosis conclusion text is, with high probability, a subset of the imaging description types, so all CDE corpora associated with the diagnosis conclusion text can be used as constraints.
Step S24: generate the structured corpus from the first corpus, the second corpus and the third corpus;
Here, the first corpus, the second corpus and the third corpus are combined to obtain the structured corpus.
In this embodiment, querying structuring units by the pathological features yields the first structuring unit, which corresponds to the pathological features associated with the examination site, and querying the structuring unit corresponding to the imaging structured report yields the second structuring unit, to which the current report belongs. The structured identifiers of the two structuring units are each matched against the corpus lookup table to obtain the first corpus and the second corpus. The text position of the free text in the imaging structured report is then acquired, and a corpus query by that text position finds the corpus applicable to the free text, yielding the third corpus. Combining the first, second and third corpora yields the structured corpus.
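Steps S21 to S24 can be sketched as a series of table lookups followed by a merge. The sketch below rests on several assumptions that the text does not specify: the corpus lookup table is modelled as a dictionary keyed by tuples of identifiers and tags, a corpus is a set of phrases, and "combining" is a set union. All names (`lookup`, `build_structured_corpus`, the identifiers and phrases in `table`) are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical corpus lookup table: a key tuple maps to a set of corpus entries.
CorpusTable = Dict[Tuple[str, ...], FrozenSet[str]]


def lookup(table: CorpusTable, *key: str) -> Set[str]:
    """Return the corpus entries stored under `key`, or an empty set."""
    return set(table.get(tuple(key), frozenset()))


def build_structured_corpus(
    table: CorpusTable,
    first_unit: str,
    second_unit: str,
    associated_units: Iterable[str],
    paragraph_tag: str,
    title_tag: str,
) -> Set[str]:
    # S22: corpora of the first and second structuring units.
    first_corpus = lookup(table, first_unit)
    second_corpus = lookup(table, second_unit)
    # S23: first sub-corpus from (identifier, paragraph tag, title tag);
    # second sub-corpus from the units associated with the first unit.
    first_sub = lookup(table, second_unit, paragraph_tag, title_tag)
    second_sub: Set[str] = set()
    for unit in associated_units:
        second_sub |= lookup(table, unit)
    third_corpus = first_sub | second_sub
    # S24: combine the three corpora (assumed here to be a plain set union).
    return first_corpus | second_corpus | third_corpus


# Illustrative table; every identifier and phrase below is made up.
table: CorpusTable = {
    ("pancreas.focal_lesion",): frozenset({"focal lesion"}),
    ("pancreas.report",): frozenset({"pancreas"}),
    ("pancreas.report", "findings", "pancreas"): frozenset({"postoperative change"}),
    ("lesion.size",): frozenset({"diameter"}),
}

structured_corpus = build_structured_corpus(
    table,
    first_unit="pancreas.focal_lesion",
    second_unit="pancreas.report",
    associated_units=["lesion.size", "lesion.shape"],
    paragraph_tag="findings",
    title_tag="pancreas",
)
```

A missing key (here `lesion.shape`) contributes nothing rather than raising an error, which is one possible way to handle structuring units that have no registered corpus.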
Example III
Referring to Fig. 6, a structural diagram of an imaging structured report label extraction system 100 according to a third embodiment of the present invention is provided. The system includes a feature determination module 10, a corpus determination module 11, and a label extraction module 12, wherein:
The feature determination module 10 is configured to obtain the examination site in the imaging structured report and determine pathological features according to the examination site.
The feature determination module 10 is further configured to obtain the site code of the examination site and match the site code against a pre-stored code relationship tree to obtain the pathological features, where the code relationship tree stores the correspondence between different site codes and their pathological features.
The corpus determination module 11 is configured to obtain the free text in the imaging structured report, determine a structured corpus according to the free text and the pathological features, and perform a corpus query according to the examination site to obtain a natural language corpus, where the free text is the doctor's supplementary description of the imaging findings in the imaging structured report.
The corpus determination module 11 is further configured to perform a structuring unit query according to the pathological features to obtain a first structuring unit, and query the structuring unit corresponding to the imaging structured report to obtain a second structuring unit;
obtain the corpora corresponding to the first structuring unit and the second structuring unit, respectively, to obtain a first corpus and a second corpus;
acquire the text position of the free text in the imaging structured report, and perform a corpus query according to the text position to obtain a third corpus;
and generate the structured corpus from the first corpus, the second corpus and the third corpus.
Optionally, the corpus determination module 11 is further configured to obtain the structured identifier of the second structuring unit, and obtain the paragraph tag and the title tag at the text position;
match the structured identifier, the paragraph tag and the title tag against a pre-stored corpus lookup table to obtain a first sub-corpus;
acquire the associated structuring unit of the first structuring unit, and match the associated structuring unit against the corpus lookup table to obtain a second sub-corpus;
and generate the third corpus from the first sub-corpus and the second sub-corpus.
Further, the corpus determination module 11 is further configured to obtain the imaging description type of the diagnosis conclusion text, and match the imaging description type against the corpus lookup table to obtain a third sub-corpus;
and add the third sub-corpus to the third corpus.
The label extraction module 12 is configured to perform corpus analysis on the diagnosis conclusion text in the imaging structured report against the structured corpus and the natural language corpus in sequence, and extract the report label of the imaging structured report according to the corpus analysis results.
The label extraction module 12 is further configured to obtain a locally pre-stored general corpus and perform corpus analysis on the diagnosis conclusion text against the general corpus.
In this embodiment, the pathological features are determined from the examination site, and the structured corpus is determined automatically from the free text and the pathological features. A corpus query by examination site identifies the natural language corpus corresponding to the imaging structured report. The diagnosis conclusion text is then analyzed against the structured corpus and the natural language corpus in sequence, and the report labels in the diagnosis conclusion text are extracted from the analysis results. Because the structured corpus combines the content of the free text with the imaging findings described in the imaging structured report, and label extraction on the diagnosis conclusion text additionally draws on the natural language corpus, label extraction accuracy is improved.
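The feature determination module's matching of a site code against the code relationship tree can be illustrated with a small tree search. This is only a sketch under assumptions: the text says the tree stores correspondences between site codes and pathological features but does not describe its layout, so the node structure `CodeNode`, the function `find_features`, and the codes and features below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CodeNode:
    """One node of a hypothetical code relationship tree."""
    code: str
    features: List[str] = field(default_factory=list)
    children: List["CodeNode"] = field(default_factory=list)


def find_features(root: CodeNode, site_code: str) -> Optional[List[str]]:
    """Depth-first search for `site_code`; return that node's features.

    Returns None when the code is not present in the tree.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        if node.code == site_code:
            return node.features
        stack.extend(node.children)
    return None


# Illustrative tree; the codes and feature names are made up.
tree = CodeNode(
    code="ABD",
    children=[
        CodeNode(code="ABD.PAN", features=["focal lesion", "diffuse lesion"]),
        CodeNode(code="ABD.LIV", features=["focal lesion"]),
    ],
)

pancreas_features = find_features(tree, "ABD.PAN")
```

Returning `None` for an unknown code lets the caller distinguish "site not in the tree" from "site known but without pathological features" (an empty list).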
Example IV
Fig. 7 is a block diagram of a terminal device 2 according to a fourth embodiment of the present application. As shown in Fig. 7, the terminal device 2 of this embodiment comprises a processor 20, a memory 21, and a computer program 22 (for example, a program implementing the imaging structured report label extraction method) stored in the memory 21 and executable on the processor 20. When executing the computer program 22, the processor 20 implements the steps of the embodiments of the imaging structured report label extraction method described above.
Illustratively, the computer program 22 may be partitioned into one or more modules that are stored in the memory 21 and executed by the processor 20 to carry out the present application. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which describe the execution of the computer program 22 in the terminal device 2. The terminal device may include, but is not limited to, the processor 20 and the memory 21.
The processor 20 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 21 may be an internal storage unit of the terminal device 2, such as a hard disk or a memory of the terminal device 2. The memory 21 may also be an external storage device of the terminal device 2, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, a flash card or the like provided on the terminal device 2. Further, the memory 21 may include both an internal storage unit and an external storage device of the terminal device 2. The memory 21 is used for storing the computer program as well as other programs and data required by the terminal device. The memory 21 may also be used for temporarily storing data that has been output or is to be output.
In addition, each functional module in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium, which may be nonvolatile or volatile. Based on this understanding, all or part of the flow of the method embodiments above may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of each method embodiment described above are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable storage medium may include any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and the like. It should be noted that the content of the computer-readable storage medium may be added to or reduced as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable storage medium does not include electrical carrier signals and telecommunication signals.
The foregoing embodiments are merely illustrative of the technical solutions of the present application, and not restrictive, and although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that modifications may still be made to the technical solutions described in the foregoing embodiments or equivalent substitutions of some technical features thereof, and that such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (7)

1. An imaging structured report label extraction method, comprising:
acquiring an examination site in an imaging structured report, and determining pathological features according to the examination site;
acquiring free text in the imaging structured report, and determining a structured corpus according to the free text and the pathological features, wherein the free text is a doctor's supplementary description of the imaging findings in the imaging structured report;
performing a corpus query according to the examination site to obtain a natural language corpus;
performing corpus analysis on the diagnosis conclusion text in the imaging structured report against the structured corpus and the natural language corpus in sequence, and extracting a report label of the imaging structured report according to the corpus analysis results;
wherein the determining a structured corpus according to the free text and the pathological features comprises:
performing a structuring unit query according to the pathological features to obtain a first structuring unit, and querying the structuring unit corresponding to the imaging structured report to obtain a second structuring unit;
obtaining the corpora corresponding to the first structuring unit and the second structuring unit, respectively, to obtain a first corpus and a second corpus;
acquiring a text position of the free text in the imaging structured report, and performing a corpus query according to the text position to obtain a third corpus;
generating the structured corpus from the first corpus, the second corpus and the third corpus;
wherein the performing a corpus query according to the text position to obtain a third corpus comprises:
obtaining a structured identifier of the second structuring unit, and obtaining a paragraph tag and a title tag at the text position;
matching the structured identifier, the paragraph tag and the title tag against a pre-stored corpus lookup table to obtain a first sub-corpus;
acquiring an associated structuring unit of the first structuring unit, and matching the associated structuring unit against the corpus lookup table to obtain a second sub-corpus;
and generating the third corpus from the first sub-corpus and the second sub-corpus.
2. The imaging structured report label extraction method according to claim 1, wherein after the generating the third corpus from the first sub-corpus and the second sub-corpus, the method further comprises:
obtaining an imaging description type of the diagnosis conclusion text, and matching the imaging description type against the corpus lookup table to obtain a third sub-corpus;
adding the third sub-corpus to the third corpus.
3. The imaging structured report label extraction method according to claim 1, wherein the determining pathological features according to the examination site comprises:
obtaining a site code of the examination site, and matching the site code against a pre-stored code relationship tree to obtain the pathological features, wherein the code relationship tree stores the correspondence between different site codes and their pathological features.
4. The imaging structured report label extraction method according to any one of claims 1 to 3, wherein after the performing corpus analysis on the diagnosis conclusion text in the imaging structured report against the structured corpus and the natural language corpus in sequence, the method further comprises:
obtaining a locally pre-stored general corpus, and performing corpus analysis on the diagnosis conclusion text against the general corpus.
5. An imaging structured report label extraction system, the system comprising:
a feature determination module, configured to acquire an examination site in an imaging structured report and determine pathological features according to the examination site;
a corpus determination module, configured to obtain free text in the imaging structured report, determine a structured corpus according to the free text and the pathological features, and perform a corpus query according to the examination site to obtain a natural language corpus, wherein the free text is a doctor's supplementary description of the imaging findings in the imaging structured report;
a label extraction module, configured to perform corpus analysis on the diagnosis conclusion text in the imaging structured report against the structured corpus and the natural language corpus in sequence, and extract a report label of the imaging structured report according to the corpus analysis results;
wherein the corpus determination module is further configured to perform a structuring unit query according to the pathological features to obtain a first structuring unit, and query the structuring unit corresponding to the imaging structured report to obtain a second structuring unit;
obtain the corpora corresponding to the first structuring unit and the second structuring unit, respectively, to obtain a first corpus and a second corpus;
acquire a text position of the free text in the imaging structured report, and perform a corpus query according to the text position to obtain a third corpus;
and generate the structured corpus from the first corpus, the second corpus and the third corpus;
and wherein the corpus determination module is further configured to obtain a structured identifier of the second structuring unit, and obtain a paragraph tag and a title tag at the text position;
match the structured identifier, the paragraph tag and the title tag against a pre-stored corpus lookup table to obtain a first sub-corpus;
acquire an associated structuring unit of the first structuring unit, and match the associated structuring unit against the corpus lookup table to obtain a second sub-corpus;
and generate the third corpus from the first sub-corpus and the second sub-corpus.
6. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 4 when the computer program is executed.
7. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 4.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210972696.9A CN115458110B (en) 2022-08-15 2022-08-15 Method, system, terminal and storage medium for extracting labels from structured radiology reports

Publications (2)

Publication Number Publication Date
CN115458110A CN115458110A (en) 2022-12-09
CN115458110B CN115458110B (en) 2025-09-26

Family

ID=84299818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210972696.9A Active CN115458110B (en) 2022-08-15 2022-08-15 Method, system, terminal and storage medium for extracting labels from structured radiology reports

Country Status (1)

Country Link
CN (1) CN115458110B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880294B (en) * 2023-02-22 2023-06-13 广州高通影像技术有限公司 Integrated processing method and system based on endoscope image
CN117409921B (en) * 2023-12-15 2024-03-12 万里云医疗信息科技(北京)有限公司 Disease conclusion determination method, device and storage medium
CN119601181A (en) * 2024-11-01 2025-03-11 安徽影联云享医疗科技有限公司 A structured label generation method integrating medical images and text reports

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103294764A (en) * 2012-02-29 2013-09-11 国际商业机器公司 Method and system for extracting information from electronic documents
CN106383853A (en) * 2016-08-30 2017-02-08 刘勇 Realization method and system for electronic medical record post-structuring and auxiliary diagnosis

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9575994B2 (en) * 2011-02-11 2017-02-21 Siemens Aktiengesellschaft Methods and devices for data retrieval
EP3753025B1 (en) * 2018-02-16 2025-04-09 Google LLC Automated extraction of structured labels from medical text using deep convolutional networks and use thereof to train a computer vision model
CN111522901B (en) * 2020-03-18 2023-10-20 大箴(杭州)科技有限公司 Method and device for processing address information in text
CN112786162B (en) * 2020-12-28 2024-06-21 北京赛迈特锐医疗科技有限公司 System and method for designing structured report template based on sub-template semantic association
CN113127601B (en) * 2021-04-22 2024-06-21 北京赛迈特锐医疗科技有限公司 Method and device for labeling free text
CN114566245A (en) * 2022-01-25 2022-05-31 复旦大学附属中山医院厦门医院 Information display method, system, terminal device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant