
TWI705459B - De-identification method and system thereof, method of generating templet data - Google Patents

Publication number
TWI705459B
Authority
TW
Taiwan
Prior art keywords
text
data
processing unit
sample
image
Prior art date
Application number
TW108107944A
Other languages
Chinese (zh)
Other versions
TW202034347A (en)
Inventor
陳琦文
吳良贊
黃威達
Original Assignee
睿傳數據股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 睿傳數據股份有限公司
Priority to TW108107944A (patent TWI705459B)
Priority to CN201910214546.XA (patent CN111667415A)
Publication of TW202034347A
Application granted
Publication of TWI705459B

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/90: Dynamic range modification of images or parts thereof
    • G06T5/94: Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

A de-identification method executed by a processor, the method including: receiving sample image data including a sample image, the sample image including text; recognizing the text of the sample image using OCR; choosing, from the recognized text, text that matches a predetermined condition; generating de-identification template data that indicates a de-identification area corresponding to the text matching the predetermined condition; saving the generated de-identification template data; receiving target image data including a target image; choosing one of a plurality of saved de-identification template data; de-identifying the content of the target image that corresponds to the de-identification area of the chosen de-identification template data; and generating output image data having the de-identified target image.

Description

De-identification method and system, and method for generating template data

The present invention relates to a method for de-identifying images, and in particular to a method that automatically creates templates for de-identifying images.

In many industries, data may be shared by the originating institution with another institution for purposes such as outsourced tasks or academic research. This sharing can expose content that must be kept confidential to unauthorized institutions. For example, the Health Insurance Portability and Accountability Act (HIPAA) requires that patients' personal information not be disclosed, yet medical image data usually contains personal data such as the patient's name, ID number, and medical record number. When an academic institution conducts research on medical image data received from a medical institution, it also obtains the patients' personal data, so confidential personal data is exposed to an unauthorized academic institution. Medical institutions must therefore de-identify medical image data before sharing it with academic institutions. In some cases, medical images are de-identified by identifying and masking the content manually, which is labor-intensive and inefficient.

The present disclosure provides a de-identification method, comprising: a processing unit receiving sample data having a sample image, the sample image having text; the processing unit recognizing the text in the sample image using optical character recognition (OCR); the processing unit selecting, from the text recognized in the sample image, text that meets a predetermined condition; the processing unit generating template data that indicates a de-identification area corresponding to the text that meets the predetermined condition, and storing the generated template data; the processing unit receiving target image data having a target image; the processing unit selecting one of a plurality of stored template data; the processing unit de-identifying the content of the target image that corresponds to the de-identification area of the selected template data; and the processing unit generating output image data having the de-identified target image.

The present disclosure also provides a method for generating template data, comprising: a processing unit receiving sample data having a sample image, the sample image having text; the processing unit recognizing the text in the sample image using optical character recognition (OCR); the processing unit selecting, from the text recognized in the sample image, text that meets a predetermined condition; and the processing unit generating template data that indicates a de-identification area corresponding to the text that meets the predetermined condition.
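
The template-generation steps can be illustrated with a minimal Python sketch. It assumes the OCR step has already produced recognized text with pixel bounding boxes, and it models the predetermined condition as an arbitrary predicate on the recognized text. The names `OcrWord`, `TemplateData`, and `generate_template`, as well as the sample coordinates, are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (left, top, right, bottom) in pixels; a hypothetical box convention.
Box = Tuple[int, int, int, int]


@dataclass
class OcrWord:
    """One piece of text recognized by OCR, with its location in the sample image."""
    text: str
    box: Box


@dataclass
class TemplateData:
    """Template data reduced to the de-identification areas it indicates."""
    regions: List[Box]


def generate_template(words: List[OcrWord],
                      matches: Callable[[str], bool]) -> TemplateData:
    """Keep the boxes of recognized text that meets the predetermined condition."""
    return TemplateData(regions=[w.box for w in words if matches(w.text)])


# Hypothetical OCR output for one sample image.
words = [
    OcrWord("CHEN CHI WEN", (10, 5, 150, 25)),
    OcrWord("Philips", (300, 5, 380, 25)),
]
# Assumed condition: the text equals a value known to be sensitive.
sensitive = {"CHEN CHI WEN"}
template = generate_template(words, lambda t: t in sensitive)
print(template.regions)
```

Only the box of the patient name ends up in the template; the manufacturer name is left outside any de-identification area.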

The present disclosure provides a de-identification system including a storage module and a processing unit. The storage module stores a plurality of template data. The processing unit is in data connection with the storage module. The processing unit is configured to receive sample data having a sample image, the sample image having text; to recognize the text in the sample image using optical character recognition (OCR); to select, from the text recognized in the sample image, text that meets a predetermined condition; to generate template data that indicates a de-identification area corresponding to the text that meets the predetermined condition, and to store the generated template data in the storage module; to receive target image data having a target image; to select one of the plurality of template data stored in the storage module; to de-identify the content of the target image that corresponds to the de-identification area of the selected template data; and to generate output image data having the de-identified target image.

As described above, the processing unit generates template data from the sample data and uses the template data to de-identify the target image and generate output image data, which can effectively improve efficiency.

110: de-identification system

111: communication module

112: storage module

113: translation module

114: processing unit

121: device

122: device

123: device

130: communication network

D, D': template data

T, T': preset tags

S21~S24: procedures

S51~S55: procedures

S521~S522: procedures

310: target image

311: text

410: de-identified target image

411: mask

610, 710, 810: sample images

611: image information

612, 612', 712', 812': text

913, 1003, 1103: reference areas

914, 1004, 1104: comparison areas

1201: template image

1202: de-identification area

S131~S139: procedures

So that the features described above can be understood in detail, a more specific description of the present disclosure, briefly summarized above, is provided with reference to embodiments, some of which are illustrated in the accompanying drawings. Note, however, that the accompanying drawings illustrate only typical embodiments and are therefore not to be regarded as limiting the scope of the disclosure, which may admit other equally effective embodiments.

Figure 1 shows a component block diagram and an exemplary operating environment of a de-identification system according to some embodiments of the present disclosure; Figure 2 shows a flowchart of a de-identification method implemented by some exemplary de-identification systems according to the present disclosure; Figure 3 shows a target image according to some embodiments of the present disclosure; Figure 4 shows a de-identified target image according to some embodiments of the present disclosure; Figure 5 shows a flowchart of a method for generating a de-identification template implemented by some exemplary de-identification systems according to the present disclosure; Figures 6 to 8 respectively show three sample images according to some embodiments of the present disclosure; Figures 9 to 11 respectively show three sample images in which de-identification areas have been defined according to some embodiments of the present disclosure; Figure 12 shows a de-identification area according to some embodiments of the present disclosure; and Figure 13 shows a flowchart of a de-identification method implemented by some exemplary de-identification systems according to the present disclosure.

The following description refers to the accompanying drawings to describe the present disclosure more fully. The drawings show exemplary embodiments of the present disclosure. However, the present disclosure can be implemented in many different forms and should not be construed as limited to the exemplary embodiments set forth herein. These exemplary embodiments are provided so that the disclosure is thorough and complete and fully conveys its scope to those skilled in the art. Like reference numerals indicate the same or similar elements.

The terms used herein are only for the purpose of describing specific exemplary embodiments and are not intended to limit the present disclosure. As used herein, unless the context clearly indicates otherwise, the singular forms "a", "an", and "the" are intended to include the plural forms as well. In addition, when used herein, "comprises" and/or "comprising", "includes" and/or "including", or "has" and/or "having" specify the presence of stated features, regions, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, regions, integers, steps, operations, elements, components, and/or groups thereof.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which this disclosure belongs. In addition, unless expressly defined herein, terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the related art and the present disclosure, and are not to be interpreted in an idealized or overly formal sense.

Exemplary embodiments are described below with reference to the accompanying drawings. Note that the elements depicted in the drawings are not necessarily drawn to scale, and that the same or similar elements are given the same or similar reference numerals or similar technical terms.

Figure 1 shows an exemplary operating scenario of a de-identification system 110 according to some embodiments of the present disclosure, including a component block diagram of the de-identification system 110 and the devices 121, 122, and 123 that are in data connection with it. In other operating environments, the number of devices connected to the de-identification system 110 is not limited to what is shown in Figure 1.

In this exemplary operating scenario, devices 121, 122, and 123 are image generating devices belonging to Hospital A, Hospital B, and Hospital C, respectively. Device 121 is a tomography device. Devices 121, 122, and 123 are configured to generate image data having a file format definition, such as image data conforming to the Digital Imaging and Communications in Medicine (DICOM) standard. In this scenario, devices 121, 122, and 123 are respectively configured to generate DICOM-compliant image data from computed tomography, computed radiography, and echocardiography.

DICOM is a standard that specifies how medical image data is processed, stored, printed, and transmitted; it includes a file format definition and a communication protocol. Image data conforming to the DICOM standard must be transmitted over a communication network that supports the TCP/IP protocol. DICOM-compliant image data has format definitions such as pixel data and attribute information. The pixel data describes the value of each pixel and makes up an image. The attribute information has a plurality of tags and a plurality of attribute values respectively corresponding to the tags. A tag contains a group number and an element number; for example, in the tag (0010,0010), the first 0010 is the group number and the second 0010 is the element number. Each tag corresponds to exactly one attribute value. Each attribute value describes one kind of information, such as the patient's name (e.g. "Huang Xiaoming"), the patient ID, the medical institution (e.g. "National Taiwan University Hospital"), the instrument manufacturer (e.g. "Philips"), the device model (e.g. "EPIQ 5"), the image category (e.g. "US"), the image size (e.g. "640 x 480"), or the image type (e.g. "Secondary"). For example, the attribute value corresponding to the tag "(0010,0010)" is "Huang Xiaoming", which describes the patient's name.
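
As a minimal sketch of the tag structure described above, the following Python models a tag as a (group number, element number) pair and the attribute information as a mapping from tags to attribute values. It uses only the standard library rather than a DICOM toolkit; `parse_tag` and `format_tag` are hypothetical helper names, and the "EPIQ 5" entry under (0008,1090) is an illustrative pairing assumed here.

```python
from typing import Dict, Tuple

# A tag is a pair of 16-bit numbers: (group number, element number).
Tag = Tuple[int, int]


def parse_tag(text: str) -> Tag:
    """Parse a tag written as '(gggg,eeee)' with hexadecimal fields."""
    group, element = text.strip().strip("()").split(",")
    return int(group, 16), int(element, 16)


def format_tag(tag: Tag) -> str:
    """Format a tag back into the '(gggg,eeee)' notation."""
    return "({:04X},{:04X})".format(*tag)


# Attribute information: each tag maps to exactly one attribute value.
attributes: Dict[Tag, str] = {
    parse_tag("(0010,0010)"): "Huang Xiaoming",  # patient's name
    parse_tag("(0008,1090)"): "EPIQ 5",          # device model (illustrative)
}

patient_name_tag = parse_tag("(0010,0010)")
print(format_tag(patient_name_tag), attributes[patient_name_tag])
```

Because a dictionary key is unique, the mapping mirrors the rule that each tag corresponds to exactly one attribute value.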

In this embodiment, the de-identification system 110 includes a storage module 112, a translation module 113, and a processing unit 114.

The storage module 112 may include one or more storage devices and is configured to store a plurality of template data D, D' and a plurality of preset tags T, T'. In this embodiment, each template data D (D') has at least one de-identification tag, at least one template attribute value, the format of a template image having a specific format, and at least one de-identification area on the template image. In this embodiment, each de-identification area is located at a fixed position in the template image. For example, template data D' indicates one de-identification area (e.g. de-identification area 1202) and one de-identification tag (e.g. "(0010,0010)"), and has three template attribute values: "(ONIS,ONIS25)" (device model), "430" (resolution along the image length), and "600" (resolution along the image width).

Figure 12 shows, by way of example, a template image 1201 and a de-identification area 1202 of template data D' in some embodiments.

As another, simpler example, each template attribute value in template data D describes one of the following: 1. medical institution (e.g. National Taiwan University Hospital), 2. instrument manufacturer (e.g. Philips), 3. device model (e.g. EPIQ 5), 4. image category (e.g. US), 5. image size (e.g. 640 x 480), 6. image type (e.g. Secondary). The de-identification areas indicated by template data D are a plurality of masking positions in the image.

When the attribute values of a medical image are determined to be the same as attribute values 1 to 6 of template data D, the de-identification areas indicated by template data D can be used directly to de-identify the medical image.

In some cases, not all text in a medical image needs to be de-identified. For example, if a medical image has text corresponding to the patient's name and text corresponding to the name of the imaging equipment manufacturer, only the text corresponding to the patient's name needs to be de-identified. Therefore, when template data D, D' is generated, the text to be de-identified can be selected. The preset tags T, T' correspond to the text that needs to be de-identified and can be used in the method of generating template data described later. For example, the preset tag T' is (0010,0010), which corresponds to the patient's name. In other cases, a medical image may have text corresponding to the patient's name and text corresponding to the patient's body temperature.

The translation module 113 may be an electronic module, such as a server, that includes the necessary hardware, software, or firmware. The translation module 113 is configured to translate text into a predetermined language. In this embodiment, the predetermined language is, for example, English, but is not limited thereto. In this embodiment, text includes letters, words, numbers, and similar forms.

The processing unit 114 is in data connection with the storage module 112 and the translation module 113. The processing unit 114 may be an electronic module, such as a server, that includes one or more pieces of hardware, software, or firmware. Such servers may be arranged centrally or as a distributed cluster. In other embodiments, the processing unit 114 may be a single computer or a high-performance computer. The processing unit 114 can generate template data D (D') from DICOM-compliant image data received from the devices 121, 122, and 123 through a communication module (not shown), and can use the template data D (D') to de-identify DICOM-compliant image data received from the devices 121, 122, and 123 through the communication module.

The communication module may be an electronic module that includes the necessary hardware, software, or firmware and is configured to establish a data connection with the devices 121, 122, and 123 over a suitable communication network. The suitable communication network may include at least one of wired and wireless media. The communication module can receive data from the devices 121, 122, and 123 via the communication network. In this embodiment, the communication module supports the TCP/IP protocol and can therefore receive DICOM-compliant image data from the devices 121, 122, and 123. In other embodiments, the communication module may be omitted, and the image data generated by the devices 121, 122, and 123 may be stored in the storage module 112 by other routes for the processing unit 114 to read.

Figure 2 shows a flowchart of a de-identification method implemented by some exemplary de-identification systems according to the present disclosure.

In procedure S21, the processing unit 114 receives, through the communication module 111, target image data having a file format definition from the device 121. In this embodiment, the target image data is, for example, image data conforming to the DICOM standard. The target image data includes attribute information and one or more target images, and the attribute information includes a plurality of tags and a plurality of attribute values respectively corresponding to the tags. In this embodiment, the attribute values in the target image data relate at least to the model of the device that generated the target image and to the image size of the one or more target images. In general, the attribute values of the target image data also relate to the patient's personal information.

To aid understanding, an exemplary usage scenario is used below to explain procedures S22 to S24. In this scenario, the target image data has only one target image (for example, the target image 310 shown in Figure 3), and its attribute information has four attribute values and four tags. The attribute value "(ONIS,ONIS25)" indicates the model of the device that generated the target image data and corresponds to the tag "(0008,1090)". The attribute value "430" indicates the resolution of the target image along its width and corresponds to the tag "(0028,0010)". The attribute value "600" indicates the resolution of the target image along its length and corresponds to the tag "(0028,0011)". The attribute value "CHEN CHI WEN" indicates the patient's name and corresponds to the tag "(0010,0010)".

Figure 3 shows an exemplary target image 310 according to the present disclosure. The target image 310 has text 311 and other text; the text 311 is "CHEN CHI WEN".

In procedure S22, the processing unit 114 selects one of the template data D, D' stored in the storage module 112 (as shown in Figure 1) according to the attribute values of the target image data. Specifically, the template attribute values of the selected template data D (D') match the attribute values of the target image data. In the usage scenario above, the processing unit 114 finds that the attribute values "430", "600", and "(ONIS,ONIS25)" of the target image data are identical to the three template attribute values of template data D' ("430", "600", "(ONIS,ONIS25)"), determines that the template attribute values of D' match the attribute values of the target image data, and therefore selects template data D'. The format of the target image in the target image data thus matches the format of the template image of template data D', so the de-identification area 1202 indicated by the selected template data D' substantially corresponds to the text 311 of the target image 310. In some embodiments, template data D may also record that certain image pixels contain no personal data.

As another example, different from the usage scenario above, suppose the template attribute values of template data D contain the following information: 1. medical institution (e.g. National Taiwan University Hospital), 2. instrument manufacturer (e.g. Philips), 3. device model (e.g. EPIQ 5), 4. image category (e.g. US), 5. image size (e.g. 640 x 480), 6. image type (e.g. Secondary). In S22, the processing unit 114 selects template data D when it determines that the attribute values of the target image data are the same as template attribute values 1 to 6 of template data D.
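
The selection in procedure S22 can be sketched in Python as an exact comparison of attribute values. This is a minimal illustration under stated assumptions: tags are kept as plain strings, a template matches when every one of its template attribute values equals the target's value for the same tag, and the first matching template wins. The `Template` class, `select_template`, and the second template's values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Template:
    """Template data reduced to a name and its template attribute values."""
    name: str
    attribute_values: Dict[str, str]  # tag -> required attribute value


def select_template(image_attrs: Dict[str, str],
                    templates: List[Template]) -> Optional[Template]:
    """Return the first template whose attribute values all equal the target's."""
    for template in templates:
        if all(image_attrs.get(tag) == value
               for tag, value in template.attribute_values.items()):
            return template
    return None  # no stored template matches this target image data


# Attribute values of the target image data in the usage scenario.
target_attrs = {
    "(0008,1090)": "(ONIS,ONIS25)",
    "(0028,0010)": "430",
    "(0028,0011)": "600",
    "(0010,0010)": "CHEN CHI WEN",
}
templates = [
    # Illustrative non-matching template; its values are assumed.
    Template("D", {"(0008,1090)": "EPIQ 5",
                   "(0028,0010)": "640", "(0028,0011)": "480"}),
    Template("D'", {"(0008,1090)": "(ONIS,ONIS25)",
                    "(0028,0010)": "430", "(0028,0011)": "600"}),
]
selected = select_template(target_attrs, templates)
print(selected.name if selected else "no match")
```

Attribute values that no template mentions, such as the patient's name, do not affect the match; only the template attribute values are compared.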

In procedure S23, the processing unit 114 uses the selected template data D (D') to de-identify the content of the target image that corresponds to the de-identification area of the selected template data D (D'). In this embodiment, the processing unit 114 de-identifies the content within the de-identification area by masking that content or part of it. In other embodiments, the processing unit 114 may instead blur the content within the de-identification area or part of it, remove it, or replace it with other content. In some cases, the processing unit 114 covers the information within the de-identification area with black or with the same color as the surrounding part of the image, or processes it as a mosaic.
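
The masking variant of procedure S23 can be sketched as follows. This is a minimal illustration, assuming a grayscale image stored as nested lists and a rectangular de-identification area given as (left, top, right, bottom) with exclusive right and bottom edges; `mask_region` and the box convention are hypothetical. The pixels are overwritten in a copy of the image, so the result is a single image and not an overlay.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of grayscale pixel values
Box = Tuple[int, int, int, int]  # left, top, right, bottom (right/bottom exclusive)


def mask_region(image: Image, box: Box, fill: int = 0) -> Image:
    """Return a copy of the image with every pixel inside the box set to fill."""
    left, top, right, bottom = box
    out = [row[:] for row in image]  # copy so the input image stays unchanged
    for y in range(max(top, 0), min(bottom, len(out))):
        row = out[y]
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = fill
    return out


# A 4 x 6 image filled with the value 9, masked in black over a 2 x 3 area.
image = [[9] * 6 for _ in range(4)]
masked = mask_region(image, (1, 1, 4, 3))
for row in masked:
    print(row)
```

Passing a `fill` equal to the surrounding background value would correspond to covering the area with the color of the neighboring part of the image.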

In the target image data, the attribute values, in addition to the target image itself, can reveal sensitive patient information. Therefore, in this embodiment, in step S23 the processing unit 114 de-identifies at least one of the attribute values of the target image data in addition to de-identifying the target image. In other words, the de-identification in step S23 has two parts: de-identification of the pixel data and de-identification of the attributes (the DICOM tags). In some cases, the tag corresponding to each de-identified attribute value of the target image data matches the at least one de-identification tag of the template data D (D') selected by the processing unit 114. In the use case described above, of the tags "(0008,1090)", "(0028,0010)", "(0028,0011)", and "(0010,0010)" in the target image data, only "(0010,0010)" matches the de-identification tag "(0010,0010)" of the selected template data D', so the processing unit 114 de-identifies the attribute value "CHEN CHI WEN" corresponding to "(0010,0010)".

In this embodiment, the processing unit 114 is configured to de-identify an attribute value by deleting all or part of its content. In other embodiments, the processing unit 114 may be configured to replace all or part of the content of the attribute value. In still other embodiments, the processing unit 114 may be configured to use a dedicated conversion logic to convert all or part of the content of the attribute value into a different set of symbols, for example converting a patient ID into a string of digits, so that the original content of the attribute value can be recovered with the same logic when needed. In yet other embodiments, attribute values may be de-identified globally: the attribute values to be de-identified are specified directly and all image data is processed accordingly, so the template data need not specify which attribute values are to be de-identified for which images.
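The attribute part of step S23 can be sketched as follows. The sketch assumes that the attributes are held in a plain dictionary keyed by tag strings such as "(0010,0010)". The function name, the `mode` parameter, the replacement string, and the device model value "MODEL-X" are hypothetical. Only the delete and replace variants are shown; the reversible conversion is left out.

```python
from typing import Dict, Iterable


def deidentify_attributes(
    attributes: Dict[str, str],
    deid_tags: Iterable[str],
    mode: str = "delete",
    replacement: str = "ANONYMIZED",
) -> Dict[str, str]:
    """Return a copy of `attributes` with the values of `deid_tags` de-identified."""
    result = dict(attributes)  # leave the caller's dictionary unchanged
    for tag in deid_tags:
        if tag not in result:
            continue  # only tags present in the target data are affected
        if mode == "delete":
            result[tag] = ""  # delete the content of the attribute value
        elif mode == "replace":
            result[tag] = replacement  # substitute other content
        else:
            raise ValueError(f"unsupported mode: {mode}")
    return result


# Example with the tags from the use case; "MODEL-X" is a made-up value.
target = {"(0008,1090)": "MODEL-X", "(0010,0010)": "CHEN CHI WEN"}
cleaned = deidentify_attributes(target, ["(0010,0010)"])
```

For the global variant, the same function would be called with a fixed tag list for every file instead of a list taken from the selected template data.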

Finally, in step S24, the processing unit 114 generates output image data from the de-identified target image data. The output image data contains the de-identified target image and the de-identified attribute values. Note that the de-identified target image is a single-layer image; it is not a template (mask) image and the target image stacked as separate layers.

FIG. 4 shows an exemplary de-identified target image 410 according to the present disclosure, in which the patient's name (that is, the content within the de-identification area) has been covered by a mask 411.

It is worth noting that target image data from a given device model usually share a similar image resolution and layout. By using the de-identification area indicated by the selected template data D (D'), the processing unit 114 can therefore de-identify specific text in a large number of target images quickly and effectively.

By contrast, suppose the foregoing de-identification method is not used and another method is used instead, for example running optical character recognition (OCR) on every target image to recognize all of the text on the medical image and then masking all of it. The time required is then positively correlated with the image size, at about 2 to 5 seconds per image. When a large number of medical images must be de-identified, the working time is long: taking one million medical images processed in a single day as an example, a single thread needs about 800 hours (roughly 2.9 seconds per image on average), which significantly increases the cost in time and computing resources.
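The workload estimate above is simple arithmetic and can be checked with a short Python sketch. The function name is hypothetical, and the only inputs are the figures quoted in the text (one million images, 2 to 5 seconds per image).

```python
def single_thread_hours(image_count: int, seconds_per_image: float) -> float:
    """Hours one thread needs to process `image_count` images sequentially."""
    return image_count * seconds_per_image / 3600.0


# Bounds for the quoted 2 to 5 seconds per image, for one million images.
low = single_thread_hours(1_000_000, 2.0)
high = single_thread_hours(1_000_000, 5.0)
```

With these inputs the bounds come to roughly 556 and 1,389 hours, so the 800-hour figure sits inside the range implied by the per-image time.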

FIG. 5 shows a flowchart of a method for generating template data implemented by some exemplary de-identification systems according to the present disclosure. The following description uses the generation of the template data D' as an example.

First, in step S51, the processing unit 114 receives sample data having a defined file format. In this embodiment, the sample data is, for example, image data conforming to the DICOM standard. The sample data has one or more sample images and attribute information. The attribute information includes a plurality of tags and a plurality of attribute values respectively corresponding to the tags. Each sample image has image information and text. To aid understanding, an exemplary use case is used below to explain the subsequent steps S52 to S55. In this example, the sample image data has only three sample images, and the sample data has an attribute value "HUANG WEI DA" (the patient name) and a corresponding tag "(0010,0010)".

FIGS. 6 to 8 show some exemplary sample images 610, 710, and 810 according to the present disclosure.

FIG. 6 shows, by way of example, the image information 611 and the text 612 and 612' of the sample image 610. The image information 611 is an ultrasound image, the text 612 relates to the patient's body temperature, and the text 612' is the patient's name.

Next, in step S52, the processing unit 114 executes a predetermined procedure on each sample image. The predetermined procedure includes steps S521 and S522.

In step S521, the processing unit 114 uses optical character recognition (OCR) to recognize the text in the sample image. Taking the sample image 610 of FIG. 6 as an example, the text 612 and 612' recognized by the processing unit 114 includes, for example, the strings "37.0C", "<37.0C", and "HUANG WEI DA".

In step S522, the processing unit 114 selects, from the text recognized in the sample image, the text that meets a predetermined condition. In some cases not every string needs to be de-identified: "HUANG WEI DA" needs to be de-identified, for example, whereas "37.0C" and "<37.0C" do not. Step S522 selects, from all of the text recognized by OCR, the text that needs to be de-identified (that is, the text that meets the predetermined condition). In this embodiment, the processing unit 114 determines whether each piece of text meets the predetermined condition by the following procedure. For ease of explanation, the description below uses the text 612' ("HUANG WEI DA") as an example.

First, the processing unit 114 determines whether the text matches one of the attribute values of the sample data. Continuing the example, the processing unit 114 compares the text against the attribute values of the sample data and determines whether any attribute value is identical or similar to the text 612' ("HUANG WEI DA").

Next, when the processing unit 114 determines that the text matches one of the attribute values of the sample data, it determines whether the tag corresponding to the matching attribute value is identical to one of the preset tags T, T' stored in the storage module 112. Continuing the example, the processing unit 114 determines that the attribute value "HUANG WEI DA" of the sample data is indeed identical to the text 612' ("HUANG WEI DA"), and therefore checks whether any preset tag T (T') is identical to the tag "(0010,0010)" corresponding to that attribute value.

Then, when the processing unit 114 determines that the tag corresponding to the matching attribute value is identical to one of the preset tags T, T', it determines that the text meets the predetermined condition. Continuing the example, the processing unit 114 determines that the preset tag T' ("(0010,0010)") is identical to the tag "(0010,0010)" corresponding to the attribute value "HUANG WEI DA", and therefore determines that the text 612' ("HUANG WEI DA") meets the predetermined condition.
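The two-stage check described in the preceding three paragraphs can be sketched in Python. The sketch assumes the attributes are a dictionary from tag string to value. It also simplifies "identical or similar" to equality after case and whitespace normalization, which is an assumption of this example and not the matching rule of the disclosure. All function names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def normalize(text: str) -> str:
    """Uppercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.upper().split())


def matching_tag(text: str, attributes: Dict[str, str]) -> Optional[str]:
    """Stage 1: return the tag whose attribute value matches the OCR text, if any."""
    wanted = normalize(text)
    for tag, value in attributes.items():
        if value and normalize(value) == wanted:
            return tag
    return None


def meets_condition(text: str, attributes: Dict[str, str],
                    preset_tags: Iterable[str]) -> bool:
    """Stage 2: the matched tag must also be one of the preset tags."""
    tag = matching_tag(text, attributes)
    return tag is not None and tag in set(preset_tags)


sample_attributes = {"(0010,0010)": "HUANG WEI DA"}
preset = ["(0010,0010)"]
selected = [t for t in ["37.0C", "<37.0C", "HUANG WEI DA"]
            if meets_condition(t, sample_attributes, preset)]
```

A fuzzy comparison (for example an edit-distance threshold) could replace the equality test in `matching_tag` to tolerate OCR errors; the threshold would have to be chosen for the data at hand.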

In this embodiment, the processing unit 114 also uses the translation module 113 to translate into a predetermined language any text recognized in the sample image that does not meet the predetermined condition, determines whether the translated text meets the predetermined condition, and selects the translated text that does. In this embodiment, the predetermined language is, for example, English, but is not limited thereto. For example, if the text recognized by the processing unit 114 is in a language other than English, such as Chinese, step S522 may determine that the Chinese text does not meet the predetermined condition. The processing unit 114 then uses the translation module 113 to translate the Chinese text into English, determines again whether the translated text meets the predetermined condition, and selects the translated text that does. This reduces the chance of misjudgment caused by language differences.
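The translation fallback can be sketched as a second pass over the text that failed the first check. In this sketch `is_match` stands for the predetermined-condition test and `translate` stands for the translation module 113. Both are passed in as plain callables, and the lookup table used as a translator is a stand-in invented for the example. Returning the original OCR string together with its translated form is also an assumption, made so that the position of the original text stays traceable.

```python
from typing import Callable, List, Tuple


def select_with_translation(
    texts: List[str],
    is_match: Callable[[str], bool],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (original, matched_form) pairs for texts that meet the condition."""
    selected = []
    for text in texts:
        if is_match(text):
            selected.append((text, text))  # matched without translation
            continue
        translated = translate(text)  # retry in the predetermined language
        if is_match(translated):
            selected.append((text, translated))
    return selected


# Stand-in translator: a lookup table with a placeholder key (hypothetical).
table = {"NAME_IN_CHINESE": "HUANG WEI DA"}
pairs = select_with_translation(
    ["37.0C", "NAME_IN_CHINESE"],
    is_match=lambda t: t == "HUANG WEI DA",
    translate=lambda t: table.get(t, t),
)
```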

The manner of determining whether text meets the predetermined condition is not limited to this embodiment. Although the attribute value corresponding to the tag (0010,0010) in a DICOM file should be the patient's name, in some cases its content varies with the source hospital or instrument: it may hold a Chinese name, a romanized name, a medical record number, or a national ID number, or it may be left empty. In other embodiments, therefore, additional data from a more reliable source (for example, the National Health Insurance Administration) can be used to obtain the patient's real name and other personal data. When the real name obtained is a Chinese name, it is converted into romanized names under various romanization systems so that it can be compared against the names in the text recognized by OCR, and the comparison result is then used to create the mask. Taking the National Health Insurance Administration as an example, the additional data can be the data submitted together with the images (for example, DICOM files).

Referring to FIGS. 6 to 8, in the foregoing example, at this stage the processing unit 114 determines that the sample images 610, 710, and 810 each contain text 612', 712', and 812', respectively, that meets the predetermined condition.

Next, in step S53, the processing unit 114 generates template data D (D') that indicates a de-identification area corresponding to the text that meets the predetermined condition, and stores the template data D (D') in the storage module 112. In this embodiment, the processing unit 114 determines the extent of the de-identification area from whichever of the qualifying texts recognized in the sample images occupies the largest area, so that information of most lengths can be masked effectively. The details are as follows.

Referring to FIGS. 9 to 11, the processing unit 114 first defines reference areas in the sample images, each reference area covering the area where a text that meets the predetermined condition is located. For example, the processing unit 114 defines a reference area 913 in the sample image 610, covering the area where the text 612' ("HUANG WEI DA") is located; a reference area 1003 in the sample image 710, covering the area where the text 712' ("CHEN CHI WEN") is located; and a reference area 1103 in the sample image 810, covering the area where the text 812' ("ALICE WANG") is located.

Next, referring to FIGS. 9 to 11, the processing unit 114 treats the reference areas that are located at the same position in their respective sample images as a group of comparison areas. Continuing the example, because the reference areas 913, 1003, and 1103 are all located at the same position in the sample images 610, 710, and 810, the three reference areas 913, 1003, and 1103 are treated as three comparison areas 914, 1004, and 1104, respectively.

Next, the processing unit 114 determines the extent of the de-identification area corresponding to the comparison areas from whichever comparison area covers the largest area. Among the comparison areas 914, 1004, and 1104, the comparison area 914 occupies the largest area. In FIG. 12, therefore, the de-identification area 1202 corresponds to the comparison areas 914, 1004, and 1104, and its size is determined by the area occupied by the comparison area 914, so that the de-identification area 1202 can effectively mask names of most lengths.
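Choosing the de-identification area from a group of comparison areas can be sketched as picking the box with the largest area. The rectangle format (left, top, width, height), the function names, and the pixel coordinates below are all hypothetical; the disclosure does not give coordinates for the areas 914, 1004, and 1104.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height); hypothetical layout


def box_area(box: Box) -> int:
    """Area of a rectangle in pixels."""
    return box[2] * box[3]


def deidentification_area(comparison_areas: List[Box]) -> Box:
    """Return the comparison area that occupies the largest area."""
    if not comparison_areas:
        raise ValueError("at least one comparison area is required")
    return max(comparison_areas, key=box_area)


# Made-up boxes standing in for comparison areas 914, 1004, and 1104.
areas = [(10, 5, 120, 12), (10, 5, 110, 12), (10, 5, 100, 12)]
template_area = deidentification_area(areas)
```

With a single sample image the list has one entry and that entry is returned, which is the single-image case. A more conservative alternative would take the union of all comparison areas; that is a different rule from the one described here.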

In addition, when the sample image data received by the processing unit 114 contains only one sample image, only one comparison area is obtained for each reference area, and that comparison area is by definition the one occupying the largest area.

FIG. 13 shows a flowchart of a de-identification method implemented by some exemplary de-identification systems (for example, the de-identification system 110) according to the present disclosure.

S131: A processing unit (for example, the processing unit 114) receives sample data having a sample image, the sample image having text.

S132: The processing unit uses optical character recognition (OCR) to recognize the text in the sample image.

S133: The processing unit selects, from the text recognized in the sample image, the text that meets a predetermined condition.

S134: The processing unit generates template data indicating a de-identification area that corresponds to the text meeting the predetermined condition, and stores the template data in a storage module (for example, the storage module 112).

S135: The processing unit receives target image data having a target image.

S136: The processing unit selects one of a plurality of template data stored in the storage module.

S137: The processing unit de-identifies the content of the target image that corresponds to the de-identification area of the selected template data.

S138: The processing unit generates output image data having the de-identified target image.
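Step S136 can be sketched as a lookup that matches template attribute values (for example a device model) against the attribute values of the target image data. The dictionary layout, the key `template_attribute_values`, the rule that every template attribute value must appear in the target data, and the model names are all assumptions of this example.

```python
from typing import Dict, List, Optional


def select_template(templates: List[dict],
                    target_attributes: Dict[str, str]) -> Optional[dict]:
    """Return the first template whose attribute values all occur in the target data."""
    target_values = set(target_attributes.values())
    for template in templates:
        wanted = template["template_attribute_values"]
        if wanted and all(value in target_values for value in wanted):
            return template
    return None  # no stored template fits this target image data


# Hypothetical stored templates and target attributes.
stored = [
    {"name": "D", "template_attribute_values": ["MODEL-Y"]},
    {"name": "D'", "template_attribute_values": ["MODEL-X"]},
]
target = {"(0008,1090)": "MODEL-X", "(0010,0010)": "CHEN CHI WEN"}
chosen = select_template(stored, target)
```

When no template matches, a caller might fall back to generating a new template from sample data as in steps S131 to S134; that fallback is a design choice of this sketch and is not stated in the flowchart.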

In summary, in the above embodiments, the processing unit 114 can not only use OCR to generate the corresponding template data D (D') from the received sample image data and the preset tags T (T'), but can also quickly de-identify received target image data using the template data D, D' already stored in the storage module 112. This effectively reduces the time and computing resources required to de-identify image data.

The foregoing describes only embodiments of the present invention and does not limit the scope in which the invention may be practiced. All simple equivalent changes and modifications made in accordance with the claims and the specification of the present invention remain within the scope covered by this patent.

S131~S138: steps

Claims (13)

1. A de-identification method, comprising: receiving, by a processing unit, sample data having a sample image, the sample image having text; recognizing, by the processing unit, the text in the sample image using optical character recognition (OCR); selecting, by the processing unit, from the text recognized in the sample image, text that meets a predetermined condition; generating, by the processing unit, template data indicating a de-identification area that corresponds to the text meeting the predetermined condition, and storing the generated template data; receiving, by the processing unit, target image data having a target image; selecting, by the processing unit, one of a plurality of stored template data; de-identifying, by the processing unit, the content of the target image that corresponds to the de-identification area of the selected template data; and generating, by the processing unit, output image data having the de-identified target image; wherein the sample data includes a plurality of sample images, and the processing unit determines the extent of the de-identification area from whichever of the texts recognized in the sample images and meeting the predetermined condition occupies the largest area.
2. The de-identification method of claim 1, wherein: the sample data further has a plurality of tags and a plurality of attribute values respectively corresponding to the tags; and the processing unit determines whether the text meets the predetermined condition by: determining whether the text matches one of the attribute values of the sample data; upon determining that the text matches one of the attribute values of the sample data, determining whether the tag corresponding to the matching attribute value is identical to one of a plurality of preset tags; and upon determining that the tag corresponding to the matching attribute value is identical to one of the preset tags, determining that the text meets the predetermined condition.

3. The de-identification method of claim 1, further comprising: translating, by a translation module in data connection with the processing unit, into a predetermined language the text recognized in the sample image that does not meet the predetermined condition; and determining, by the processing unit, whether the translated text meets the predetermined condition.
4. The de-identification method of claim 1, wherein: each template data has at least one template attribute value; the target image data further has at least one attribute value; and when the processing unit selects one of the plurality of template data stored in the storage module, the at least one template attribute value of the selected template data matches at least one attribute value of the target image data.

5. The de-identification method of claim 4, wherein the information described by each template attribute value includes one of a medical institution, an instrument manufacturer, a device model, an image category, an image size, and an image type.

6. A method of generating template data, comprising: receiving, by a processing unit, sample data having a sample image, the sample image having text; recognizing, by the processing unit, the text in the sample image using optical character recognition (OCR); selecting, by the processing unit, from the text recognized in the sample image, text that meets a predetermined condition; and generating, by the processing unit, template data indicating a de-identification area that corresponds to the text meeting the predetermined condition; wherein the sample data includes a plurality of sample images, and the processing unit determines the extent of the de-identification area from whichever of the texts recognized in the sample images and meeting the predetermined condition occupies the largest area.
7. The method of generating template data of claim 6, wherein: the sample data further has a plurality of tags and a plurality of attribute values respectively corresponding to the tags; and the processing unit determines whether the text meets the predetermined condition by: determining whether the text matches one of the attribute values of the sample data; upon determining that the text matches one of the attribute values of the sample data, determining whether the tag corresponding to the matching attribute value is identical to one of a plurality of preset tags; and upon determining that the tag corresponding to the matching attribute value is identical to one of the preset tags, determining that the text meets the predetermined condition.

8. The method of generating template data of claim 6, further comprising: translating, by a translation module in data connection with the processing unit, into a predetermined language the text recognized in the sample image that does not meet the predetermined condition; and determining, by the processing unit, whether the translated text meets the predetermined condition.
9. A de-identification system, comprising: a storage module storing a plurality of template data; and a processing unit in data connection with the storage module; wherein the processing unit is configured to receive sample data having a sample image, the sample image having text; the processing unit is configured to recognize the text in the sample image using optical character recognition (OCR); the processing unit is configured to select, from the text recognized in the sample image, text that meets a predetermined condition; the processing unit is configured to generate template data indicating a de-identification area that corresponds to the text meeting the predetermined condition, and to store the generated template data in the storage module; the processing unit is configured to receive target image data having a target image; the processing unit is configured to select one of the plurality of template data stored in the storage module; the processing unit is configured to de-identify the content of the target image that corresponds to the de-identification area of the selected template data; and the processing unit is configured to generate output image data having the de-identified target image; wherein the sample data includes a plurality of sample images, and the processing unit is configured to determine the extent of the de-identification area from whichever of the texts recognized in the sample images and meeting the predetermined condition occupies the largest area.

10. The de-identification system of claim 9, wherein: the storage module further stores a plurality of preset tags; the sample data further has a plurality of tags and a plurality of attribute values respectively corresponding to the tags; and the processing unit is configured to determine whether the text meets the predetermined condition by: determining whether the text matches one of the attribute values of the sample data; upon determining that the text matches one of the attribute values of the sample data, determining whether the tag corresponding to the matching attribute value is identical to one of the plurality of preset tags; and upon determining that the tag corresponding to the matching attribute value is identical to one of the preset tags, determining that the text meets the predetermined condition.

11. The de-identification system of claim 9, further comprising: a translation module in data connection with the processing unit and configured to translate into a predetermined language the text recognized in the sample image that does not meet the predetermined condition; wherein the processing unit is configured to determine whether the translated text meets the predetermined condition.
12. The de-identification system of claim 9, wherein: each template data has at least one template attribute value; the target image data further has at least one attribute value; and when the processing unit selects one of the plurality of template data stored in the storage module, the at least one template attribute value of the selected template data matches at least one attribute value of the target image data.

13. The de-identification system of claim 12, wherein the information described by each template attribute value includes one of a medical institution, an instrument manufacturer, a device model, an image category, an image size, and an image type.
TW108107944A 2019-03-08 2019-03-08 De-identification method and system thereof, method of generating templet data TWI705459B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
TW108107944A TWI705459B (en) 2019-03-08 2019-03-08 De-identification method and system thereof, method of generating templet data
CN201910214546.XA CN111667415A (en) 2019-03-08 2019-03-20 De-identification method and system and method for generating template data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW108107944A TWI705459B (en) 2019-03-08 2019-03-08 De-identification method and system thereof, method of generating templet data

Publications (2)

Publication Number Publication Date
TW202034347A TW202034347A (en) 2020-09-16
TWI705459B true TWI705459B (en) 2020-09-21

Family

ID=72382203

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108107944A TWI705459B (en) 2019-03-08 2019-03-08 De-identification method and system thereof, method of generating templet data

Country Status (2)

Country Link
CN (1) CN111667415A (en)
TW (1) TWI705459B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080080009A1 (en) * 2006-09-28 2008-04-03 Fujitsu Limited Electronic watermark embedding apparatus and electronic watermark detection apparatus
CN101166260A (en) * 2007-09-12 2008-04-23 华为技术有限公司 Method and device for image encoding and watermark removal
CN103488630A (en) * 2013-09-29 2014-01-01 小米科技有限责任公司 Method, device and terminal for processing picture
TWM550424U (en) * 2017-06-30 2017-10-11 萬能學校財團法人萬能科技大學 Smart reading assistance device with translation function

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101389005A (en) * 2007-09-11 2009-03-18 华为技术有限公司 A method and device for blocking a specific position of an image
CN104021350B (en) * 2014-05-13 2016-07-06 小米科技有限责任公司 Privacy information hidden method and device
US20160307063A1 (en) * 2015-04-16 2016-10-20 Synaptive Medical (Barbados) Inc. Dicom de-identification system and method
CN106022142B (en) * 2016-05-04 2019-12-10 泰康保险集团股份有限公司 Image privacy information processing method and device
CN106131360A (en) * 2016-06-15 2016-11-16 珠海市魅族科技有限公司 Image data sending method and device

Also Published As

Publication number Publication date
TW202034347A (en) 2020-09-16
CN111667415A (en) 2020-09-15

Similar Documents

Publication Publication Date Title
US11602302B1 (en) Machine learning based non-invasive diagnosis of thyroid disease
US8200505B2 (en) System and method for creating and rendering DICOM structured clinical reporting via the internet
US8688476B2 (en) Interoperability tools and procedures to aggregate and consolidate lab test results
US20140006926A1 (en) Systems and methods for natural language processing to provide smart links in radiology reports
US20090048866A1 (en) Rules-Based System For Routing Evidence and Recommendation Information to Patients and Physicians By a Specialist Based on Mining Report Text
US20140215301A1 (en) Document template auto discovery
US12530860B2 (en) Systems and methods for using AI to identify regions of interest in medical images
US9372916B2 (en) Document template auto discovery
EP4511762A1 (en) Machine learning for data anonymization
US10395405B2 (en) Removing identifying information from image data on computing devices using markers
WO2022269504A1 (en) System and method for privacy risk assessment and mitigatory recommendation
US20190027149A1 (en) Documentation tag processing system
US7840041B2 (en) Device for converting medical image data
TWI705459B (en) De-identification method and system thereof, method of generating templet data
US20070140538A1 (en) Method for processing unenhanced medical images
US10114808B2 (en) Conflict resolution of originally paper based data entry
KR102572802B1 (en) Server for supporting automation and unification of malfunction reception, and system
US20220375071A1 (en) Systems and methods to process electronic images to categorize intra-slide specimen tissue type
TWM585395U (en) System for processing insurance claims using long-short term memory model of deep learning
JP2010250406A (en) Medical image processing apparatus and program
KR102410848B1 (en) De-identification method of electronic apparatus for de-identifying personal identification information in images
JP2010257276A (en) Medical image capturing device and program
US20160350483A1 (en) Concepts for extracting lab data
US20250209202A1 (en) Computer system and methods for augmenting a graphical user interface
KR102893937B1 (en) Apparatus and method for interpreting medical record sheet