TWI705459B - De-identification method and system thereof, method of generating templet data - Google Patents
- Publication number
- TWI705459B
- Authority
- TW
- Taiwan
- Prior art keywords
- text
- data
- processing unit
- sample
- image
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/94—Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Character Discrimination (AREA)
Abstract
Description
The present invention relates to a method for de-identifying images, and in particular to a method that automatically builds templates for de-identifying images.
In many industries, data may be shared from the originating institution with another institution for purposes such as task outsourcing or academic research. As a result, content that must be kept confidential may be exposed to unauthorized parties. For example, the Health Insurance Portability and Accountability Act (HIPAA) requires that a patient's personal information not be disclosed, yet medical image data usually contains such information (for example, name, national ID number, and medical record number). An academic institution conducting research on medical image data received from a medical institution therefore also obtains the patients' personal data, which exposes confidential personal data to an unauthorized institution. For this reason, a medical institution must de-identify medical image data before sharing it with an academic institution. In some cases, medical images are de-identified by manually identifying and masking the sensitive content, which is labor-intensive and inefficient.
The present disclosure provides a de-identification method comprising: a processing unit receiving sample data having a sample image, the sample image containing text; the processing unit recognizing the text in the sample image using optical character recognition (OCR); the processing unit selecting, from the text recognized in the sample image, the text that meets a predetermined condition; the processing unit generating template data that indicates a de-identification region corresponding to the text that meets the predetermined condition, and storing the generated template data; the processing unit receiving target image data having a target image; the processing unit selecting one of a plurality of stored template data; the processing unit de-identifying the content of the target image that corresponds to the de-identification region of the selected template data; and the processing unit generating output image data having the de-identified target image.
The present disclosure also provides a method of generating template data, comprising: a processing unit receiving sample data having a sample image, the sample image containing text; the processing unit recognizing the text in the sample image using optical character recognition (OCR); the processing unit selecting, from the text recognized in the sample image, the text that meets a predetermined condition; and the processing unit generating template data that indicates a de-identification region corresponding to the text that meets the predetermined condition.
The present disclosure further provides a de-identification system comprising a storage module and a processing unit. The storage module stores a plurality of template data. The processing unit is in data connection with the storage module. The processing unit is configured to: receive sample data having a sample image, the sample image containing text; recognize the text in the sample image using optical character recognition (OCR); select, from the text recognized in the sample image, the text that meets a predetermined condition; generate template data that indicates a de-identification region corresponding to the text that meets the predetermined condition, and store the generated template data in the storage module; receive target image data having a target image; select one of the plurality of template data stored in the storage module; de-identify the content of the target image that corresponds to the de-identification region of the selected template data; and generate output image data having the de-identified target image.
As described above, the processing unit generates template data from the sample data and uses the template data to de-identify the target image and generate output image data, which effectively improves efficiency.
110: de-identification system
111: communication module
112: storage module
113: translation module
114: processing unit
121: device
122: device
123: device
130: communication network
D, D': template data
T, T': preset tags
S21~S24: steps
S51~S55: steps
S521~S522: steps
310: target image
311: text
410: de-identified target image
411: mask
610, 710, 810: sample images
611: image information
612, 612', 712', 812': text
913, 1003, 1103: reference regions
914, 1004, 1104: comparison regions
1201: template image
1202: de-identification region
S131~S139: steps
So that the features described above can be understood in detail, a more particular description of the present disclosure, briefly summarized above, is given with reference to embodiments, some of which are illustrated in the accompanying drawings. Note, however, that the accompanying drawings illustrate only typical embodiments and are therefore not to be considered limiting of the scope of the disclosure, which may admit other equally effective embodiments.
FIG. 1 shows a component block diagram and an exemplary operating environment of a de-identification system according to some embodiments of the present disclosure;
FIG. 2 shows a flowchart of a de-identification method implemented by some exemplary de-identification systems according to the present disclosure;
FIG. 3 shows a target image according to some embodiments of the present disclosure;
FIG. 4 shows a de-identified target image according to some embodiments of the present disclosure;
FIG. 5 shows a flowchart of a method for generating a de-identification template implemented by some exemplary de-identification systems according to the present disclosure;
FIGS. 6 to 8 respectively show three sample images according to some embodiments of the present disclosure;
FIGS. 9 to 11 respectively show three sample images in which de-identification regions have been defined, according to some embodiments of the present disclosure;
FIG. 12 shows a de-identification region according to some embodiments of the present disclosure; and
FIG. 13 shows a flowchart of a de-identification method implemented by some exemplary de-identification systems according to the present disclosure.
The following description refers to the accompanying drawings to describe the present disclosure more fully. The drawings show exemplary embodiments of the present disclosure. The present disclosure may, however, be embodied in many different forms and should not be construed as limited to the exemplary embodiments set forth herein. These exemplary embodiments are provided so that this disclosure is thorough and complete and fully conveys its scope to those skilled in the art. Like reference numerals denote the same or similar elements.
The terminology used herein is for the purpose of describing particular exemplary embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, the terms "comprises" and/or "comprising", "includes" and/or "including", or "has" and/or "having", when used herein, specify the presence of stated features, regions, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, regions, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. In addition, unless expressly so defined herein, terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the relevant art and in the present disclosure, and should not be interpreted in an idealized or overly formal sense.
Exemplary embodiments are described below with reference to the accompanying drawings. Note that the elements depicted in the drawings are not necessarily drawn to scale, and that the same or similar elements are denoted by the same or similar reference numerals or by similar technical terms.
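FIG. 1 shows an exemplary operating scenario of a de-identification system 110 according to some embodiments of the present disclosure, including a component block diagram of the de-identification system 110 and devices 121, 122, and 123 in data connection with the de-identification system 110. In other operating environments, the number of devices connected to the de-identification system 110 is not limited to what is shown in FIG. 1.
In the exemplary operating scenario, the devices 121, 122, and 123 are image generating apparatuses belonging to Hospital A, Hospital B, and Hospital C, respectively. The device 121 is a tomography device. The devices 121, 122, and 123 are configured to generate image data having a file format definition, such as image data conforming to the Digital Imaging and Communications in Medicine (DICOM) standard. In the exemplary operating scenario, the devices 121, 122, and 123 are respectively configured to generate DICOM-compliant image data for computed tomography, computed radiography, and echocardiography.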
DICOM is a standard that specifies how medical image data is handled, stored, printed, and transmitted; it includes a file format definition and a communication protocol. DICOM-compliant image data must be transmitted over a communication network that supports TCP/IP. DICOM-compliant image data has format definitions such as pixel data and image attribute information. The pixel data describes the value of each pixel and makes up an image. The attribute information has a plurality of tags and a plurality of attribute values respectively corresponding to the tags. A tag contains a group number and an element number; in the tag (0010,0010), for example, the first 0010 is the group number and the second 0010 is the element number. Each tag uniquely corresponds to one attribute value. Each attribute value describes one piece of information, which may be the patient name (e.g., "Huang Xiaoming"), patient ID, medical institution (e.g., "National Taiwan University Hospital"), equipment manufacturer (e.g., "Philips"), device model (e.g., "EPIQ 5"), image category (e.g., "US"), image size (e.g., "640 x 480"), image type (e.g., "Secondary"), and so on. For example, the attribute value corresponding to the tag "(0010,0010)" is "Huang Xiaoming", which describes the patient name.
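The tag-to-value relationship described above can be sketched in a few lines of Python. This is only an illustration, assuming the attribute information is held in a plain dictionary keyed by (group, element) pairs; it does not use any DICOM library, and the names `attributes` and `format_tag` are hypothetical.

```python
# Attribute information modeled as a mapping from (group, element) to a value.
# The sample values are the ones used in the description; the dictionary
# layout itself is an assumption made for illustration.
attributes = {
    (0x0010, 0x0010): "Huang Xiaoming",  # patient name
    (0x0008, 0x1090): "EPIQ 5",          # device model
}


def format_tag(tag):
    """Render a (group, element) pair in the (gggg,eeee) notation."""
    group, element = tag
    return f"({group:04X},{element:04X})"


patient_name_tag = (0x0010, 0x0010)
print(format_tag(patient_name_tag), "->", attributes[patient_name_tag])
```

Keeping the group and element numbers as a pair mirrors the statement that each tag uniquely corresponds to one attribute value.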
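In this embodiment, the de-identification system 110 includes a storage module 112, a translation module 113, and a processing unit 114.
The storage module 112 may include one or more storage devices and is configured to store a plurality of template data D, D' and a plurality of preset tags T, T'. In this embodiment, each template data D (D') has at least one de-identification tag, at least one template attribute value, the format of a template image having a specific format, and at least one de-identification region on the template image. In this embodiment, each de-identification region is located at a fixed position in the template image. For example, the template data D' indicates one de-identification region (e.g., the de-identification region 1202) and one de-identification tag (e.g., "(0010,0010)"), and has three template attribute values: "(ONIS,ONIS25)" (device model), "430" (resolution along the image length), and "600" (resolution along the image width).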
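One possible in-memory layout for such template data is sketched below. The class and field names are hypothetical, keying the template attribute values by tag is an assumption, and the region coordinates are placeholders, since the description does not give the position of region 1202.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Axis-aligned rectangle in pixel coordinates (assumed representation)."""
    x: int
    y: int
    width: int
    height: int


@dataclass
class TemplateData:
    deid_tags: list            # tags whose attribute values are de-identified
    template_attributes: dict  # tag -> template attribute value used for matching
    regions: list = field(default_factory=list)  # de-identification regions


# Template data D' from the example; the coordinates are placeholders.
template_d_prime = TemplateData(
    deid_tags=["(0010,0010)"],
    template_attributes={
        "(0008,1090)": "(ONIS,ONIS25)",
        "(0028,0010)": "430",
        "(0028,0011)": "600",
    },
    regions=[Region(x=10, y=10, width=200, height=24)],
)
```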
FIG. 12 shows, by way of example, a template image 1201 and a de-identification region 1202 of the template data D' in some embodiments.
As another, simpler example, each template attribute value contained in the template data D describes one of the following: (1) medical institution (e.g., National Taiwan University Hospital); (2) equipment manufacturer (e.g., Philips); (3) device model (e.g., EPIQ 5); (4) image category (e.g., US); (5) image size (e.g., 640 x 480); (6) image type (e.g., Secondary). The de-identification region indicated by the template data D consists of a plurality of masking positions in the image.
When the attribute values of a medical image are determined to be the same as template attribute values 1 to 6 of the template data D, the medical image can be de-identified directly using the de-identification region indicated by the template data D.
In some cases, not all of the text in a medical image needs to be de-identified. For example, a medical image may contain text corresponding to the patient's name and text corresponding to the name of the imaging equipment manufacturer; only the former needs to be de-identified. When generating the template data D, D', the text that needs to be de-identified can therefore be selected. The preset tags T, T' correspond to the text that needs to be de-identified and can be used in the method of generating template data described later. For example, the preset tag T' is (0010,0010), which corresponds to the patient's name. In other cases, a medical image may contain text corresponding to the patient's name and text corresponding to the patient's body temperature.
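The translation module 113 may be an electronic module, such as a server, that includes the necessary hardware, software, or firmware. The translation module 113 is configured to translate text into a predetermined language. In this embodiment, the predetermined language is, for example, English, but is not limited thereto. In this embodiment, text includes letters, words, numbers, and similar forms.
The processing unit 114 is in data connection with the storage module 112 and the translation module 113. The processing unit 114 may be an electronic module, such as a server, that includes one or more of hardware, software, or firmware. Such servers may be arranged in a centralized configuration or as a distributed cluster. In other embodiments, the processing unit 114 may be a single computer or a high-performance computer. The processing unit 114 can generate template data D (D') from DICOM-compliant image data received from the devices 121, 122, and 123 through a communication module (not shown), and can use the template data D (D') to de-identify DICOM-compliant image data received from the devices 121, 122, and 123 through the communication module.
The communication module may be an electronic module that includes the necessary hardware, software, or firmware and is configured to establish data connections with the devices 121, 122, and 123 over a suitable communication network. The suitable communication network may include at least one of wired and wireless media. The communication module can receive data from the devices 121, 122, and 123 via the communication network. In this embodiment, the communication module supports TCP/IP and can therefore receive DICOM-compliant image data from the devices 121, 122, and 123. In other embodiments, the communication module may be omitted, and the image data generated by the devices 121, 122, and 123 may be stored in the storage module 112 via another path for the processing unit 114 to read.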
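FIG. 2 shows a flowchart of a de-identification method implemented by some exemplary de-identification systems according to the present disclosure.
In step S21, the processing unit 114 receives, through the communication module 111, target image data having a file format definition from the device 121. In this embodiment, the target image data is, for example, DICOM-compliant image data. The target image data includes attribute information and one or more target images; the attribute information includes a plurality of tags and a plurality of attribute values respectively corresponding to the tags. In this embodiment, the attribute values in the target image data relate at least to the model of the device that generated the target image and to the image size of the one or more target images; in general, the attribute values of the target image data also relate to the patient's personal information.
To aid understanding, an exemplary usage scenario is used below to explain steps S22 to S24. In this scenario, the target image data has only one target image (e.g., the target image 310 shown in FIG. 3), and its attribute information has four attribute values and four tags: the attribute value "(ONIS,ONIS25)" indicates the model of the device that generated the target image data and corresponds to the tag "(0008,1090)"; the attribute value "430" indicates the resolution of the target image along its width and corresponds to the tag "(0028,0010)"; the attribute value "600" indicates the resolution of the target image along its length and corresponds to the tag "(0028,0011)"; and the attribute value "CHEN CHI WEN" indicates the patient's name and corresponds to the tag "(0010,0010)".
FIG. 3 shows an exemplary target image 310 according to the present disclosure. The target image 310 contains text 311 and other text; the text 311 reads "CHEN CHI WEN".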
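In step S22, the processing unit 114 selects, according to the attribute values of the target image data, one of the template data D, D' stored in the storage module 112 (as shown in FIG. 1).
Specifically, the template attribute values of the template data D (D') selected by the processing unit 114 match the attribute values of the target image data. Continuing the usage scenario above, the processing unit 114 finds that the attribute values "430", "600", and "(ONIS,ONIS25)" of the target image data are respectively identical to the three template attribute values of the template data D' ("430", "600", and "(ONIS,ONIS25)"), determines that the template attribute values of the template data D' match the attribute values of the target image data, and therefore selects the template data D'. The format of the target image in the target image data thus matches the format of the template image of the template data D', so the de-identification region 1202 indicated by the selected template data D' substantially corresponds to the text 311 of the target image 310. In some embodiments, certain template data D also records that certain image pixels contain no personal data.
As another example, different from the usage scenario above, suppose the template attribute values of the template data D contain the following information: (1) medical institution (e.g., National Taiwan University Hospital); (2) equipment manufacturer (e.g., Philips); (3) device model (e.g., EPIQ 5); (4) image category (e.g., US); (5) image size (e.g., 640 x 480); (6) image type (e.g., Secondary). In step S22, the processing unit 114 selects the template data D when it determines that the attribute values of the target image data are identical to template attribute values 1 to 6 of the template data D.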
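The matching in step S22 can be sketched as follows. This is a minimal illustration, assuming each template stores its template attribute values keyed by tag and that matching means exact equality on every stored value; the function name `select_template` and the dictionary layout are hypothetical.

```python
def select_template(target_attributes, templates):
    """Return the first template whose attribute values all match the target.

    target_attributes: dict mapping tag -> attribute value of the target data.
    templates: list of dicts, each with a "template_attributes" dict.
    Returns None when no template matches.
    """
    for template in templates:
        wanted = template["template_attributes"]
        if all(target_attributes.get(tag) == value for tag, value in wanted.items()):
            return template
    return None


templates = [
    {"name": "D", "template_attributes": {"(0008,1090)": "EPIQ 5"}},
    {"name": "D'", "template_attributes": {
        "(0008,1090)": "(ONIS,ONIS25)",
        "(0028,0010)": "430",
        "(0028,0011)": "600",
    }},
]
target_attributes = {
    "(0008,1090)": "(ONIS,ONIS25)",
    "(0028,0010)": "430",
    "(0028,0011)": "600",
    "(0010,0010)": "CHEN CHI WEN",
}
selected = select_template(target_attributes, templates)
print(selected["name"])  # prints: D'
```

Because selection is a dictionary comparison rather than image analysis, its cost does not depend on the size of the image.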
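In step S23, the processing unit 114 uses the selected template data D (D') to de-identify the content of the target image that corresponds to the de-identification region of the selected template data D (D'). In this embodiment, the processing unit 114 is configured to de-identify the content within the de-identification region by masking that content or part of it. In other embodiments, the processing unit 114 may instead be configured to blur the content within the de-identification region or part of it, to remove it, or to replace it with other content. In some cases, the processing unit 114 masks the information within the de-identification region with black or with the same color as the surrounding part of the image, or applies a mosaic.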
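Masking a rectangular de-identification region can be illustrated on a small grid of pixel values. This sketch assumes a grayscale image stored as a list of rows and a region given as x, y, width, and height; the function name `mask_region` is hypothetical, and a real system would operate on the DICOM pixel data instead.

```python
def mask_region(pixels, x, y, width, height, fill=0):
    """Return a copy of a 2D pixel grid with one rectangle overwritten by `fill`."""
    masked = [row[:] for row in pixels]
    # Clamp the rectangle to the image bounds before overwriting.
    for r in range(max(y, 0), min(y + height, len(masked))):
        for c in range(max(x, 0), min(x + width, len(masked[r]))):
            masked[r][c] = fill
    return masked


image = [[9] * 6 for _ in range(4)]
out = mask_region(image, x=1, y=1, width=3, height=2)
for row in out:
    print(row)
```

The pixel values are overwritten directly, so the result is a single-layer image rather than a mask drawn on a separate layer.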
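In the target image data, the attribute values, like the target image itself, can reveal sensitive patient information. Therefore, in this embodiment, in step S23 the processing unit 114 de-identifies at least one of the attribute values of the target image data in addition to de-identifying the target image.
In other words, the de-identification in step S23 has two parts: de-identification of the pixel data and de-identification of the attributes (the DICOM tags). In some cases, the tag corresponding to each de-identified attribute value of the target image data matches the at least one de-identification tag of the template data D (D') selected by the processing unit 114. In the usage scenario above, among the tags "(0008,1090)", "(0028,0010)", "(0028,0011)", and "(0010,0010)" of the target image data, only "(0010,0010)" matches the de-identification tag "(0010,0010)" of the selected template data D'; the processing unit 114 therefore de-identifies the attribute value "CHEN CHI WEN" corresponding to "(0010,0010)".
In this embodiment, the processing unit 114 is configured to de-identify an attribute value by deleting its content or part of it. In other embodiments, the processing unit 114 may be configured to replace the content of the attribute value or part of it. In other embodiments, the processing unit 114 may be configured to use a specific computation to convert the content of the attribute value or part of it into a different set of symbols, for example converting a patient ID into a string of digits, so that the original content can be recovered with the same computation when needed. In still other embodiments, attribute values may be de-identified globally: the attribute values to be de-identified are specified directly and all image data is processed accordingly, so the template data need not specify which attribute values are to be de-identified for which images.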
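The deletion and replacement variants of attribute de-identification can be sketched as below. This assumes the attribute information is a dictionary keyed by tag and that "deleting the content" means emptying the value; the function name, the `mode` parameter, and the default replacement string are hypothetical, and the reversible conversion mentioned above is not covered.

```python
def deidentify_attributes(attributes, deid_tags, mode="delete", replacement="ANONYMOUS"):
    """Return a copy of `attributes` with the values of `deid_tags` emptied or replaced."""
    result = dict(attributes)
    for tag in deid_tags:
        if tag not in result:
            continue
        if mode == "delete":
            result[tag] = ""
        elif mode == "replace":
            result[tag] = replacement
        else:
            raise ValueError(f"unknown mode: {mode}")
    return result


attributes = {"(0008,1090)": "(ONIS,ONIS25)", "(0010,0010)": "CHEN CHI WEN"}
cleaned = deidentify_attributes(attributes, ["(0010,0010)"])
print(cleaned)
```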
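Finally, in step S24, the processing unit 114 generates output image data from the de-identified target image data; the output image data has the de-identified target image and the de-identified attribute values. Note that the de-identified target image is a single-layer image; it is not a template image and a target image stacked as separate layers.
FIG. 4 shows an exemplary de-identified target image 410 according to the present disclosure, in which the patient's name (that is, the content within the de-identification region) has been covered by a mask 411.
It is worth noting that target image data from a given device model usually has a similar image resolution and layout. Using the de-identification region indicated by the selected template data D (D'), the processing unit 114 can therefore de-identify specific text in a large number of target images quickly and effectively.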
By contrast, if a different de-identification method is used instead, for example applying optical character recognition (OCR) to every target image to recognize all text on the medical image and then masking all of it, the time required is positively correlated with image size, at roughly 2 to 5 seconds per image. When a large number of medical images must be de-identified, the working time is long: processing one million medical images in a single day, for example, would take a single thread about 800 hours, which significantly increases the cost in time and computing resources.
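As a rough consistency check on these figures (a back-of-the-envelope calculation that uses only the numbers stated above), 800 hours for one million images corresponds to an average of about 2.9 seconds per image, which lies within the stated range of 2 to 5 seconds:

```latex
\frac{800\ \text{h} \times 3600\ \text{s/h}}{10^{6}\ \text{images}} = 2.88\ \text{s per image}
```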
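FIG. 5 shows a flowchart of a method for generating template data implemented by some exemplary de-identification systems according to the present disclosure. The generation of the template data D' is used as an example below.
First, in step S51, the processing unit 114 receives sample data having a file format definition. In this embodiment, the sample data is, for example, DICOM-compliant image data. The sample data has one or more sample images and attribute information. The attribute information includes a plurality of tags and a plurality of attribute values respectively corresponding to the tags. Each sample image has image information and text. To aid understanding, an exemplary usage scenario is used below to explain steps S52 to S55; in this example, the sample data has only three sample images, and it has an attribute value "HUANG WEI DA" (the patient name) with the corresponding tag "(0010,0010)".
FIGS. 6 to 8 show exemplary sample images 610, 710, and 810 according to the present disclosure.
FIG. 6 shows, by way of example, the image information 611 and the text 612, 612' of the sample image 610. The image information 611 is an ultrasound image, the text 612 relates to the patient's body temperature, and the text 612' is the patient's name.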
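Next, in step S52, the processing unit 114 performs a predetermined procedure on each sample image. The predetermined procedure includes steps S521 and S522.
In step S521, the processing unit 114 uses optical character recognition (OCR) to recognize the text in the sample image. Taking the sample image 610 of FIG. 6 as an example, the text 612, 612' recognized by the processing unit 114 includes, for example, the strings "37.0C", "<37.0C", and "HUANG WEI DA".
In step S522, the processing unit 114 selects, from the text recognized in the sample image, the text that meets a predetermined condition. In some cases not every string needs to be de-identified: "HUANG WEI DA" does, whereas "37.0C" and "<37.0C" do not. Step S522 picks out, from all the text recognized by OCR, the text that needs to be de-identified (that is, the text that meets the predetermined condition). In this embodiment, the processing unit 114 determines whether each piece of text meets the predetermined condition as follows. For convenience, the determination is explained using the text 612' ("HUANG WEI DA") as an example.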
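First, the processing unit 114 determines whether the text matches one of the attribute values of the sample data. Continuing the example, the processing unit 114 checks the attribute values of the sample data and determines whether any of them is identical or similar to the text 612' ("HUANG WEI DA").
Next, when the processing unit 114 determines that the text matches one of the attribute values of the sample data, it determines whether the tag corresponding to the matching attribute value is identical to one of the preset tags T, T' stored in the storage module 112. Continuing the example, the processing unit 114 determines that the attribute value "HUANG WEI DA" of the sample data is indeed identical to the text 612' ("HUANG WEI DA"), and therefore determines whether any preset tag T (T') is identical to the tag corresponding to that attribute value ("(0010,0010)").
Then, when the processing unit 114 determines that the tag corresponding to the matching attribute value is identical to one of the preset tags T, T', it determines that the text meets the predetermined condition. Continuing the example, the processing unit 114 determines that the preset tag T' ("(0010,0010)") is identical to the tag corresponding to the attribute value "HUANG WEI DA" ("(0010,0010)"), and therefore determines that the text 612' ("HUANG WEI DA") meets the predetermined condition.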
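The two-stage check above (text matches an attribute value, and that value's tag is a preset tag) can be sketched as a single predicate. The description allows "identical or similar" matches; this sketch substitutes a case-insensitive, whitespace-normalized equality test as a stand-in, and the function name `meets_condition` is hypothetical.

```python
def meets_condition(text, attributes, preset_tags):
    """Return True if `text` matches an attribute value whose tag is a preset tag."""
    normalized = " ".join(text.split()).upper()
    for tag, value in attributes.items():
        candidate = " ".join(str(value).split()).upper()
        if candidate == normalized and tag in preset_tags:
            return True
    return False


attributes = {"(0010,0010)": "HUANG WEI DA", "(0008,1090)": "(ONIS,ONIS25)"}
preset_tags = {"(0010,0010)"}
ocr_texts = ["37.0C", "<37.0C", "HUANG WEI DA"]
selected = [t for t in ocr_texts if meets_condition(t, attributes, preset_tags)]
print(selected)  # prints: ['HUANG WEI DA']
```

Text that matches an attribute value whose tag is not a preset tag, such as the device model here, is left unselected.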
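In this embodiment, the processing unit 114 also uses the translation module 113 to translate, into the predetermined language, the text recognized in the sample image that does not meet the predetermined condition, determines whether the translated text meets the predetermined condition, and selects the translated text that does. In this embodiment, the predetermined language is, for example, English, but is not limited thereto. In other exemplary cases, if the text recognized by the processing unit 114 is in a language other than English, such as Chinese, step S522 may conclude that the Chinese text does not meet the predetermined condition. The processing unit 114 then uses the translation module 113 to translate the Chinese text into English, determines again whether the translated text meets the predetermined condition, and selects the translated text that does. This reduces the chance of misjudgment caused by language differences.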
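The translate-then-recheck fallback can be sketched as follows. The translation module is represented by a caller-supplied function, and the lookup table used in the example is a stub with a made-up entry; returning the original (untranslated) text is a design choice of this sketch so that the text can still be located in the image, and is not something the description prescribes.

```python
def select_with_translation(texts, is_sensitive, translate):
    """Select texts that meet the condition directly or after translation."""
    selected = []
    for text in texts:
        # `translate` is only called when the direct check fails.
        if is_sensitive(text) or is_sensitive(translate(text)):
            selected.append(text)
    return selected


# Stub standing in for the translation module; the entry is hypothetical.
stub_dictionary = {"黃偉達": "HUANG WEI DA"}


def translate(text):
    return stub_dictionary.get(text, text)


def is_sensitive(text):
    return text.upper() == "HUANG WEI DA"


result = select_with_translation(["黃偉達", "37.0C"], is_sensitive, translate)
print(result)
```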
The way of determining whether text meets the predetermined condition is not limited to this embodiment. Although the attribute value corresponding to the tag (0010,0010) in a DICOM file should be the patient's name, in practice its content may vary with the source hospital or instrument: it may hold a Chinese name, a romanized name, a medical record number, or a national ID number, or it may be left empty. In other embodiments, therefore, additional data from a more reliable source (for example, the National Health Insurance Administration) can be used to obtain the patient's real name and other personal data. When the real name obtained is a Chinese name, it is converted into romanized names using various romanization systems so that it can be compared against the names in the text recognized by OCR; the comparison result is then used to build the mask. Taking the National Health Insurance Administration as an example, the additional data may be data submitted together with the images (for example, DICOM files).
Referring to FIGS. 6 to 8, in the foregoing example the processing unit 114 determines at this stage that the sample images 610, 710, and 810 each have one piece of text that meets the predetermined condition: 612', 712', and 812', respectively.
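Next, in step S53, the processing unit 114 generates template data D (D') that indicates a de-identification region corresponding to the text that meets the predetermined condition, and stores the template data D (D') in the storage module 112. In this embodiment, the processing unit 114 determines the extent of the de-identification region from whichever text recognized in the sample images and meeting the predetermined condition occupies the largest area, so that information of most lengths can be effectively masked. The details are as follows.
Referring to FIGS. 9 to 11, the processing unit 114 first defines reference regions in the sample images, each reference region covering the area occupied by a corresponding piece of text that meets the predetermined condition. For example, the processing unit 114 defines in the sample image 610 a reference region 913 covering the area of the text 612' ("HUANG WEI DA"), in the sample image 710 a reference region 1003 covering the area of the text 712' ("CHEN CHI WEN"), and in the sample image 810 a reference region 1103 covering the area of the text 812' ("ALICE WANG").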
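Next, referring to FIGS. 9 to 11, the processing unit 114 takes the reference regions that are located at the same position in each sample image as a group of comparison regions. Continuing the example, because the reference regions 913, 1003, and 1103 are all located at the same position in their respective sample images 610, 710, and 810, the three reference regions 913, 1003, and 1103 are taken as three comparison regions 914, 1004, and 1104, respectively.
The processing unit 114 then determines the extent of the de-identification region corresponding to the comparison regions from whichever comparison region covers the largest area. Among the comparison regions 914, 1004, and 1104, the comparison region 914 occupies the largest area. In FIG. 12, therefore, the de-identification region 1202 corresponds to the comparison regions 914, 1004, and 1104, and its size is determined by the area occupied by the comparison region 914, so that the de-identification region 1202 can effectively mask names of most lengths.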
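Choosing the comparison region that covers the largest area can be sketched as below. The regions are assumed to be (x, y, width, height) tuples, and the coordinates are placeholders chosen so that the first region, standing in for comparison region 914, is the largest; the function name `largest_region` is hypothetical.

```python
def largest_region(regions):
    """Return the (x, y, width, height) region covering the largest area."""
    return max(regions, key=lambda r: r[2] * r[3])


comparison_regions = [
    (20, 10, 120, 16),  # stands in for comparison region 914
    (20, 10, 110, 16),  # stands in for comparison region 1004
    (20, 10, 100, 16),  # stands in for comparison region 1104
]
deid_region = largest_region(comparison_regions)
print(deid_region)  # prints: (20, 10, 120, 16)
```

Taking the largest observed region rather than an average makes the template more likely to cover longer names that appear at the same position.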
Note that in other cases, when the sample data received by the processing unit 114 contains only one sample image, only one comparison region is obtained for each reference region, and that comparison region is the one occupying the largest area.
FIG. 13 shows a flowchart of a de-identification method implemented by some exemplary de-identification systems (for example, the de-identification system 110) according to the present disclosure.
S131: A processing unit (for example, the processing unit 114) receives sample data having a sample image, the sample image containing text.
S132: The processing unit recognizes the text in the sample image using optical character recognition (OCR).
S133: The processing unit selects, from the text recognized in the sample image, the text that meets a predetermined condition.
S134: The processing unit generates template data that indicates a de-identification region corresponding to the text that meets the predetermined condition, and stores the template data in a storage module (for example, the storage module 112).
S135: The processing unit receives target image data having a target image.
S136: The processing unit selects one of a plurality of template data stored in the storage module.
S137: The processing unit de-identifies the content of the target image that corresponds to the de-identification region of the selected template data.
S138: The processing unit generates output image data having the de-identified target image.
In summary, in the above embodiments the processing unit 114 can not only use OCR to generate the corresponding template data D (D') from the received sample data and the preset tags T (T'), but can also quickly de-identify received target image data using the template data D, D' already stored in the storage module 112. This effectively reduces the time and computing resources needed to de-identify image data.
The foregoing is merely a description of embodiments of the present invention and shall not limit the scope of its implementation; all simple equivalent changes and modifications made in accordance with the claims and the specification of the present invention remain within the scope covered by this patent.
S131~S138: steps
Claims (13)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW108107944A TWI705459B (en) | 2019-03-08 | 2019-03-08 | De-identification method and system thereof, method of generating templet data |
| CN201910214546.XA CN111667415A (en) | 2019-03-08 | 2019-03-20 | De-identification method and system and method for generating template data |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW108107944A TWI705459B (en) | 2019-03-08 | 2019-03-08 | De-identification method and system thereof, method of generating templet data |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202034347A TW202034347A (en) | 2020-09-16 |
| TWI705459B TWI705459B (en) | 2020-09-21 |
Family
ID=72382203
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW108107944A TWI705459B (en) | 2019-03-08 | 2019-03-08 | De-identification method and system thereof, method of generating templet data |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN111667415A (en) |
| TW (1) | TWI705459B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080080009A1 (en) * | 2006-09-28 | 2008-04-03 | Fujitsu Limited | Electronic watermark embedding apparatus and electronic watermark detection apparatus |
| CN101166260A (en) * | 2007-09-12 | 2008-04-23 | 华为技术有限公司 | Method and device for image encoding and watermark removal |
| CN103488630A (en) * | 2013-09-29 | 2014-01-01 | 小米科技有限责任公司 | Method, device and terminal for processing picture |
| TWM550424U (en) * | 2017-06-30 | 2017-10-11 | 萬能學校財團法人萬能科技大學 | Smart reading assistance device with translation function |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101389005A (en) * | 2007-09-11 | 2009-03-18 | 华为技术有限公司 | A method and device for blocking a specific position of an image |
| CN104021350B (en) * | 2014-05-13 | 2016-07-06 | 小米科技有限责任公司 | Privacy information hidden method and device |
| US20160307063A1 (en) * | 2015-04-16 | 2016-10-20 | Synaptive Medical (Barbados) Inc. | Dicom de-identification system and method |
| CN106022142B (en) * | 2016-05-04 | 2019-12-10 | 泰康保险集团股份有限公司 | Image privacy information processing method and device |
| CN106131360A (en) * | 2016-06-15 | 2016-11-16 | 珠海市魅族科技有限公司 | Image data sending method and device |
2019
- 2019-03-08 TW TW108107944A patent/TWI705459B/en active
- 2019-03-20 CN CN201910214546.XA patent/CN111667415A/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| TW202034347A (en) | 2020-09-16 |
| CN111667415A (en) | 2020-09-15 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11602302B1 (en) | | Machine learning based non-invasive diagnosis of thyroid disease |
| US8200505B2 (en) | | System and method for creating and rendering DICOM structured clinical reporting via the internet |
| US8688476B2 (en) | | Interoperability tools and procedures to aggregate and consolidate lab test results |
| US20140006926A1 (en) | | Systems and methods for natural language processing to provide smart links in radiology reports |
| US20090048866A1 (en) | | Rules-Based System For Routing Evidence and Recommendation Information to Patients and Physicians By a Specialist Based on Mining Report Text |
| US20140215301A1 (en) | | Document template auto discovery |
| US12530860B2 (en) | | Systems and methods for using AI to identify regions of interest in medical images |
| US9372916B2 (en) | | Document template auto discovery |
| EP4511762A1 (en) | | Machine learning for data anonymization |
| US10395405B2 (en) | | Removing identifying information from image data on computing devices using markers |
| WO2022269504A1 (en) | | System and method for privacy risk assessment and mitigatory recommendation |
| US20190027149A1 (en) | | Documentation tag processing system |
| US7840041B2 (en) | | Device for converting medical image data |
| TWI705459B (en) | | De-identification method and system thereof, method of generating templet data |
| US20070140538A1 (en) | | Method for processing unenhanced medical images |
| US10114808B2 (en) | | Conflict resolution of originally paper based data entry |
| KR102572802B1 (en) | | Server for supporting automation and unification of malfunction reception, and system |
| US20220375071A1 (en) | | Systems and methods to process electronic images to categorize intra-slide specimen tissue type |
| TWM585395U (en) | | System for processing insurance claims using long-short term memory model of deep learning |
| JP2010250406A (en) | | Medical image processing apparatus and program |
| KR102410848B1 (en) | | De-identification method of electronic apparatus for de-identifying personal identification information in images |
| JP2010257276A (en) | | Medical image capturing device and program |
| US20160350483A1 (en) | | Concepts for extracting lab data |
| US20250209202A1 (en) | | Computer system and methods for augmenting a graphical user interface |
| KR102893937B1 (en) | | Apparatus and method for interprenting medical record sheet |