TWI815411B - Methods and non-transitory computer storage media of extracting linguistic patterns and summarizing pathology report - Google Patents

Info

Publication number
TWI815411B
Authority
TW
Taiwan
Prior art keywords
semantic
term
computing device
processor
patterns
Prior art date
Application number
TW111115383A
Other languages
Chinese (zh)
Other versions
TW202343470A (en)
Inventor
陳震宇
張詠淳
蕭世欣
Original Assignee
臺北醫學大學
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 臺北醫學大學
Priority to TW111115383A
Application granted
Publication of TWI815411B
Publication of TW202343470A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Peptides Or Proteins (AREA)

Abstract

Disclosed are methods, and non-transitory computer storage media, for extracting linguistic patterns from a pathology report and for summarizing a pathology report. The present disclosure provides a method of extracting key linguistic patterns from a pathology report. The method comprises: determining a confidence degree and a support degree between a linguistic term and a next linguistic term based on co-occurrences of the linguistic term and the next linguistic term; generating a set of candidate linguistic terms; generating a first set of linguistic patterns by performing random walks on the set of candidate linguistic terms; and determining the key linguistic patterns by removing redundant linguistic patterns from the first set of linguistic patterns.

Description

Methods for extracting semantic patterns and summarizing pathology reports, and non-transitory computer storage media

The present invention relates to a method of processing a pathology report. Specifically, it relates to a method of extracting key semantic patterns from a pathology report, a method of summarizing a pathology report, and non-transitory computer storage media therefor. The invention further relates to a method of determining the similarity between pathology reports.

A patient's pathology report, especially that of a cancer patient, contains a large amount of complicated and tedious information. Attending surgeons and physicians may spend a great deal of time working out a patient's condition from it. Computers can help reduce the time spent and thereby increase overall efficiency.

The present invention can analyze a pathology report. A pathology report may contain a diagnosis determined by examining cells and tissue under a microscope. The report may be that of a lung cancer patient. Important information can be summarized from a complicated and tedious pathology report. This information may include features in six categories: basic pathological description, tumor characteristics, histological description, immunohistochemistry (IHC) information, genetic test results, and pathological TNM (tumor, lymph node, and metastasis) stage. The invention can further summarize multiple pathology reports of one patient. The invention can further provide a function for searching the data of a large number of patients, and the search results can serve as a reference for surgeons and physicians.

One embodiment of the present invention provides a method of extracting key semantic patterns from a pathology report. The method includes: determining a confidence and a support between a semantic term and a next semantic term based on the co-occurrence of the semantic term and the next semantic term; generating a set of candidate semantic terms; generating a first set of semantic patterns by performing random walks on the set of candidate semantic terms; and determining the key semantic patterns by removing redundant semantic patterns from the first set of semantic patterns. In the pathology report, the semantic term appears before the next semantic term. The confidence between a candidate semantic term and the corresponding next candidate semantic term is equal to or greater than a confidence threshold. The support between the candidate semantic term and the corresponding next candidate semantic term is equal to or greater than a support threshold.

Another embodiment of the present invention provides a method of summarizing a pathology report. The method includes obtaining a plurality of pathological features from the pathology report based on key semantic patterns. The key semantic patterns are generated according to any of the methods or operations of the present invention.

A further embodiment of the present invention provides a non-transitory computer storage medium having program instructions stored thereon. When executed by a processor, the program instructions cause a set of operations according to any of the methods of the invention to be performed.

The following disclosure provides many different embodiments or examples for implementing different features of the provided subject matter. Specific examples of operations, components, and configurations are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. For example, in the description, a first operation performed before or after a second operation may include embodiments in which the first and second operations are performed together, and may also include embodiments in which additional operations are performed between the first and second operations. Likewise, in the following description, forming a first feature over, on, or in a second feature may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which additional features are formed between the first and second features so that the first and second features are not in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.

For ease of description, temporally relative terms such as "before," "prior to," "after," "subsequent to," and the like may be used herein to describe the relationship of one operation or feature to another operation or feature, as illustrated in the figures. The temporally relative terms are intended to encompass different sequences of the operations depicted in the figures. Similarly, spatially relative terms such as "beneath," "below," "lower," "above," "upper," and the like may be used herein to describe the relationship of one element or feature to another element or feature, as illustrated in the figures. The spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. The apparatus may be otherwise oriented (rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein may be interpreted accordingly. Connection-relative terms such as "connect," "connected," "connection," "couple," "coupled," "in communication," and the like may be used herein to describe an operative connection, coupling, or link between two elements or features. The connection-relative terms are intended to encompass different connections, couplings, or links of devices or components. Devices or components may be connected, coupled, or linked to one another directly or indirectly, for example through another set of components. Devices or components may be connected, coupled, or linked to one another by wire and/or wirelessly.

As used herein, the singular terms "a," "an," and "the" may include plural referents unless the context clearly dictates otherwise. For example, a reference to a device may include multiple devices unless the context clearly dictates otherwise. The terms "comprise" and "include" may indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof. The term "and/or" may include any and all combinations of one or more of the listed items.

In addition, amounts, ratios, and other numerical values are sometimes presented herein in a range format. Such a range format is used for convenience and brevity and should be understood flexibly to include not only the numerical values explicitly specified as the limits of a range, but also all individual values and sub-ranges encompassed within that range, as if each value and sub-range were explicitly specified.

The nature and use of the embodiments are discussed in detail below. It should be appreciated, however, that the present invention provides many applicable inventive concepts that can be embodied in a wide variety of specific contexts. The specific embodiments discussed merely illustrate specific ways to embody and use the invention and do not limit its scope.

To summarize important pathological features from a pathology report (for example, a pathology report for lung cancer), the present invention provides a method named PR2Sum (Pathology Report Summary). In some embodiments of the present invention, 50 important pathological features in six categories are selected, filtered, or defined from a pathology report. Table 1 lists the 50 exemplary pathological features. The six categories may include: basic description, findings (of the tumor(s)), histology (of the tumor(s)), IHC information, genetic testing (results), and TNM stage. In a further embodiment, the data of a patient's report(s) may be represented by the 50 pathological features shown in Table 1.

Table 1: Exemplary pathological features by category

  • Basic description: organ, Bx site, sampling method, diagnosis
  • Findings: maximum size, tumor size, closest margin, lymphovascular invasion, VPI, tumor focus
  • Histology: histologic type, histologic grade
  • IHC: CK7, TTF-1, Napsin A, CK20, P40, CDX2, P63, P16, cytokeratin (AE1/AE3), vimentin, PAX-8, CD56, chromogranin A, synaptophysin, GATA3, P53, S100, Ki67, EBER
  • Genetic testing: EGFR, ALK, ROS1, BRAF, MET, KRAS, ERBB2, PIK3CA, NRAS, MEK1, NTRK, RET, PDL1
  • TNM stage: version, pT, pN, pM, pStage, N information

When machine learning (or deep learning) algorithms are used, many different feature combinations must be tried during model training to improve performance. The process of generating these feature combinations may be called feature engineering. Feature engineering for machine learning algorithms is expensive: it takes a great deal of time to generate the various feature combinations and to test them. Moreover, the feature combinations obtained from such a costly model may apply only to the current field of study. For example, an entirely new set of feature combinations must be generated for pathology reports associated with another disease. Once the field of study or the disease changes, feature engineering must start over for a new set of feature combinations, and the model must be trained again with that new set. A constructed or trained machine learning model therefore can hardly achieve knowledge sharing. In addition, known machine learning algorithms for similar problems are not interpretable, because they only produce many uninterpretable parameters and probabilities.

Human knowledge can be accumulated. In human thinking, the situation of an article or a question stated in a language can be narrowed down through a few important semantic terms. For example, when a clinician interprets a pathology report and the terms "immunohistochemistry," "immunoreactivity," and "Napsin A" appear together in the report, the clinician will naturally conclude that the corresponding sentence concerns the immunohistochemical staining reaction for Napsin A, because the three terms are strongly correlated. That is, humans can read an article by skimming. If the terms in a pathology report fit a clinician's knowledge framework, the clinician can understand the pathological features described in a sentence or paragraph.

Human perception of a topic is obtained by identifying important entities or related content in order to filter possible candidates. For example, when highly related words such as "immunohistochemistry" and "Napsin A" appear together in one sentence, it is natural to conclude that the sentence most likely describes the patient's reaction to Napsin A immunohistochemical staining. The present invention works much as a human does when skimming a pathology report to capture its main ideas. Furthermore, knowledge acquired from different topics can be accumulated and used to recognize other, new topics.

In view of this, the method of the present invention imitates the perceptual behavior of human understanding. It is a highly automated method of learning, from the raw text of pathology reports, the semantic patterns that characterize the lung cancer domain. A major advantage of the invention is its high precision and its capacity for knowledge accumulation. When a new domain is encountered, the knowledge can be extended to accommodate unknown information by adding new rules.

Unlike machine learning algorithms, the present invention provides a novel approach to natural language understanding. In acquiring pathological features, the invention simulates the behavior of a clinician reading a pathology report (for example, the pathology report of a lung cancer patient). The invention can quickly narrow down the situation of an article or a question by grasping its key points or semantic terms. This works because the gist of the article (or question) is strongly correlated with the adjacent terms, so the gist or question can be identified naturally. The methods or algorithms provided in the present invention can therefore be interpretable.

In addition to acquiring the important features of a pathology report, the present invention also emphasizes flexible matching, which rigid regular expressions cannot perform. The invention therefore has a higher degree of freedom when matching pathological semantic patterns.

The present invention provides a pathological semantic pattern generation algorithm. The algorithm can be used for lung cancer patients. In some embodiments, the algorithm can be used for patients with various cancers, including but not limited to prostate cancer, colorectal cancer, gastric cancer, breast cancer, and cervical cancer. In some embodiments of the present invention, the semantic patterns for identifying pathological features of lung cancer can be generated from the pathology reports of 500 lung cancer patients.

In the present invention, the process of generating pathological semantic patterns for lung cancer can be regarded as a frequent pattern mining problem. A pathological semantic association graph for lung cancer can be constructed from the co-occurrence of terms in pathology reports. The graph describes the strength of the semantic correlation between different terms.

FIG. 1 illustrates a pathological semantic association graph 100 according to some embodiments of the present invention. The graph 100 may be constructed from the co-occurrence of terms in one or more pathology reports of one or more lung cancer patients. In some embodiments, the graph 100 is constructed from terms appearing in multiple pathology reports of multiple patients. The graph 100 can be constructed after computing the frequency of occurrence of each term appearing in the pathology reports and the number of co-occurrences between the different terms appearing in them.
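The counting step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each report has already been split into sentences and tokenized into lists of terms, and it assumes that an ordered co-occurrence means "term a appears anywhere before term b in the same sentence." The function name `count_terms_and_pairs` and the sample terms are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_terms_and_pairs(sentences):
    """Count term occurrences and ordered co-occurrences.

    `sentences` is a list of term lists, each in reading order.
    A pair (a, b) is counted once per sentence when a appears before b.
    (Assumption: the source does not fix the co-occurrence window.)
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in sentences:
        # De-duplicate while keeping first-appearance order.
        ordered = list(dict.fromkeys(terms))
        term_counts.update(ordered)
        # combinations() preserves input order, so a precedes b.
        pair_counts.update(combinations(ordered, 2))
    return term_counts, pair_counts


# Hypothetical, already-tokenized sentences.
sentences = [
    ["immunohistochemistry", "TTF-1", "positive"],
    ["immunohistochemistry", "CK7", "positive"],
    ["immunohistochemistry", "TTF-1", "negative"],
]
term_counts, pair_counts = count_terms_and_pairs(sentences)
```

The two counters are the raw material for the vertices (terms) and directed edges (ordered pairs) of the association graph.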

Because the pathological semantic patterns to be generated (for example, for lung cancer) can be ordered, directed graphs, the present invention constructs a semantic association graph using association rules.

Each vertex in FIG. 1 may represent a different term. For example, the terms (or phrases) S_1 and S_2 are represented as different vertices. Each edge in FIG. 1 is constructed from co-occurring terms (or phrases). For example, the co-occurrence C_12 of the terms S_1 and S_2 is represented as the edge between the corresponding vertices, where the term S_1 appears before the term S_2. In other words, C_12 denotes the co-occurrence in which S_1 appears before the next term S_2. The value of the co-occurrence C_ij indicates the confidence for the terms S_i and S_j. It is a conditional probability: a measure of the probability that S_j appears given that S_i has appeared. In some embodiments, the term S_1 that co-occurs with the term S_2 may be a superordinate term of S_2. The term S_1 that co-occurs with the term S_2 may be a subordinate term of S_2. The confidence for the terms S_i and S_j is defined by Equation (1).

$$C_{ij} = \mathrm{confidence}(S_i \rightarrow S_j) = P(S_j \mid S_i) \tag{1}$$

Support may be defined as "the number of samples of the true response that lie in that class." In the present invention, the support of a term S_i may indicate the frequency of occurrence of S_i. In some embodiments, the support of S_i may indicate the number of occurrences of S_i. The support of a term S_i and the next term S_j may indicate the co-occurrence frequency of S_i and S_j, or the number of co-occurrences of S_i and S_j. The relationship between a confidence and the corresponding supports may be defined by Equation (2).

$$\mathrm{confidence}(S_i \rightarrow S_j) = \frac{\mathrm{support}(S_i, S_j)}{\mathrm{support}(S_i)} \tag{2}$$
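As a small worked illustration of how confidence relates to support, the sketch below assumes the usual association-rule definition: the confidence of "S_i then S_j" is the number of co-occurrences of the ordered pair divided by the number of occurrences of S_i. The function name `confidence` and the counts are hypothetical and not taken from the source.

```python
def confidence(pair_counts, term_counts, s_i, s_j):
    """Estimate P(s_j | s_i) as support(s_i, s_j) / support(s_i).

    Assumption: supports are raw counts, as in the usual
    association-rule formulation.
    """
    support_i = term_counts.get(s_i, 0)
    if support_i == 0:
        return 0.0  # s_i never occurs, so no rule can be formed
    return pair_counts.get((s_i, s_j), 0) / support_i


# Hypothetical counts: "immunohistochemistry" occurs 20 times and is
# followed by "TTF-1" in 12 of those occurrences.
term_counts = {"immunohistochemistry": 20, "TTF-1": 12}
pair_counts = {("immunohistochemistry", "TTF-1"): 12}
c = confidence(pair_counts, term_counts, "immunohistochemistry", "TTF-1")
```

Here `c` is 12/20, so this edge would pass a confidence threshold of 0.3 and a support threshold of 10.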

To make the generated semantic patterns discriminative with respect to pathological features, some embodiments of the present invention retain the terms with higher frequencies. The pathological semantic association graph for lung cancer can be constructed from the co-occurrence counts of the retained frequent terms. In some embodiments of the present invention, the minimum support is set to 10 and the minimum confidence is set to 0.3, in order to avoid generating semantic patterns that are too short.
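The thresholding step can be sketched as a filter over the ordered pair counts. This is an illustrative sketch under assumptions: the graph is stored as a nested dictionary, the support threshold is applied to the pair count, and the names `build_association_graph`, `min_support`, and `min_confidence` are hypothetical.

```python
def build_association_graph(term_counts, pair_counts,
                            min_support=10, min_confidence=0.3):
    """Keep only edges s_i -> s_j that meet both thresholds.

    Returns {s_i: {s_j: confidence}}. Assumption: support of an edge
    is the ordered co-occurrence count of the pair.
    """
    graph = {}
    for (s_i, s_j), pair_support in pair_counts.items():
        if pair_support < min_support:
            continue
        conf = pair_support / term_counts[s_i]
        if conf >= min_confidence:
            graph.setdefault(s_i, {})[s_j] = conf
    return graph


# Hypothetical counts for three terms.
term_counts = {"EGFR": 40, "mutation": 30, "exon": 15}
pair_counts = {
    ("EGFR", "mutation"): 24,  # support 24, confidence 0.6: kept
    ("EGFR", "exon"): 11,      # support 11, confidence 0.275: dropped
    ("mutation", "exon"): 5,   # support 5: dropped
}
graph = build_association_graph(term_counts, pair_counts)
```

Only the edge from "EGFR" to "mutation" survives in this example, because the other two pairs fail one of the thresholds.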

After the pathological semantic association graph for lung cancer is constructed, the terms with higher frequency and better discriminative power are strung together by random walks. The semantic pattern of a pathological feature can be generated from the terms strung together.

A pathological semantic association graph for lung cancer may be defined as $G = (V, E)$, where $|V| = p$ and $|E| = k$. V denotes the set of vertices (for example, the vertices of the terms S_1 to S_6 shown in FIG. 1), E denotes the set of edges (for example, the edges of the co-occurrences C_ij shown in FIG. 1), p denotes the number of elements in V, and k denotes the number of elements in E. A random walk procedure can be performed on the graph G. The random walk procedure may consist of a series of random paths. The value of C_ij indicates the probability of moving from the term S_i to the term S_j. For a term S_n, the set of all adjacent terms may be written as N(S_n). For a term S_n, the sum of the probabilities of moving from S_n to each term in N(S_n) satisfies Equation (3).

$$\sum_{S_m \in N(S_n)} C_{nm} = 1 \tag{3}$$
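A random walk over such a graph can be sketched as below. This is a minimal sketch under stated assumptions rather than the patented procedure: the graph is a nested dictionary of edge weights, `random.choices` normalizes the outgoing weights so that they act as transition probabilities, a walk never revisits a term, and the names `random_walk`, `generate_patterns`, `walks_per_term`, and `max_steps` are hypothetical.

```python
import random


def random_walk(graph, start, max_steps, rng):
    """Walk from `start`, choosing each next term in proportion to
    the weight of the outgoing edge. Stops at a dead end."""
    path = [start]
    current = start
    for _ in range(max_steps):
        neighbors = graph.get(current)
        if not neighbors:
            break
        # Assumption: a pattern does not repeat a term.
        candidates = [t for t in neighbors if t not in path]
        if not candidates:
            break
        weights = [neighbors[t] for t in candidates]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(current)
    return tuple(path)


def generate_patterns(graph, walks_per_term=50, max_steps=6, seed=0):
    """Collect the distinct multi-term paths found by repeated walks."""
    rng = random.Random(seed)
    patterns = set()
    for start in graph:
        for _ in range(walks_per_term):
            path = random_walk(graph, start, max_steps, rng)
            if len(path) > 1:
                patterns.add(path)
    return patterns


# Hypothetical graph: {term: {next_term: weight}}.
graph = {
    "S1": {"S2": 0.6, "S4": 0.4},
    "S4": {"S2": 1.0},
    "S2": {"S3": 1.0},
}
patterns = generate_patterns(graph)
```

Because the walks are random, lower-probability branches such as S1 to S4 are still visited over many repetitions, which is what lets less frequent combinations appear among the candidate patterns.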

By applying a random walk procedure to the pathological semantic association graph G, a probability matrix Pr is obtained that satisfies Equation (4), where X_k denotes the k-th step of the random walk procedure. The sequence of randomly generated vertices is therefore a Markov chain.

$$\Pr(X_{k+1} = S_j \mid X_k = S_i, X_{k-1}, \ldots, X_0) = \Pr(X_{k+1} = S_j \mid X_k = S_i) = C_{ij} \tag{4}$$

According to the results of L. Lovász, the cover time (CT) of a random walk on a graph can be expressed as Equation (5).

Therefore, by applying random walk theory to the pathological semantic association graph for lung cancer, the present invention can search for possible frequent semantic patterns. Besides avoiding the loss of lower-probability combinations, the invention can also generate semantic patterns for different pathological features.

The present invention describes the correlation between frequent terms through the pathological semantic association graph for lung cancer. Then, by applying the random walk procedure to the graph, the terms that appear frequently in pathology reports are strung together and the semantic patterns of pathological features are generated.

However, random walks may generate some redundant semantic patterns, and some further consolidation may be necessary. In some embodiments, the present invention removes any semantic pattern that is completely contained in another semantic pattern, and thereby tries to retain the semantic patterns (or semantic frames) with greater length and better coverage. For example, if the semantic patterns (or semantic frames) "S_1→S_2→S_3" and "S_1→S_4→S_2→S_3→S_5→S_6" are generated, the latter pattern is retained because it contains the former. In other words, if a second semantic pattern is a subset of a first semantic pattern, the second semantic pattern is removed and the first is retained. Likewise, if a first semantic pattern dominates a second semantic pattern, the second is removed and the first is retained. After such patterns are removed, each remaining semantic pattern may be an unordered set of terms/phrases (S_i) or an ordered set of terms/phrases (S_i).
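One reading of this containment rule is sketched below, under the assumption that "contained in" means the shorter pattern occurs as an ordered subsequence (gaps allowed) of a longer one, so that the longer pattern is the one kept. The helper names `is_subsequence` and `remove_redundant` are hypothetical.

```python
def is_subsequence(short, long):
    """True if every term of `short` appears in `long` in the same order."""
    remaining = iter(long)
    # `term in remaining` consumes the iterator up to the match,
    # which enforces the ordering.
    return all(term in remaining for term in short)


def remove_redundant(patterns):
    """Keep only patterns not contained in another kept pattern."""
    # Longest first, so every possible container is seen before
    # the patterns it contains; ties are broken alphabetically.
    unique = sorted(set(patterns), key=lambda p: (-len(p), p))
    kept = []
    for pattern in unique:
        if not any(is_subsequence(pattern, other) for other in kept):
            kept.append(pattern)
    return kept


patterns = [
    ("S1", "S2", "S3"),
    ("S1", "S4", "S2", "S3", "S5", "S6"),
    ("S2", "S7"),
]
key_patterns = remove_redundant(patterns)
```

In this example the three-term pattern is dropped because it is an ordered subsequence of the six-term pattern, while ("S2", "S7") is kept because no longer pattern contains it.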

FIG. 2 is a flowchart of a method 200 according to some embodiments of the present invention. The method 200 can extract key semantic patterns from one or more pathology reports of one or more patients. The method 200 includes operation 201. In operation 201, a confidence and a support between a semantic term and a next semantic term may be determined based on the co-occurrence of the semantic term and the next semantic term. In the one or more pathology reports, the semantic term may appear before the next semantic term.

The method 200 includes operation 203. In operation 203, a set of candidate semantic terms (or phrases) may be generated or selected from the one or more pathology reports. Within the set, the confidence between a candidate semantic term and the corresponding next candidate semantic term is equal to or greater than a confidence threshold, and the support between a candidate semantic term and the corresponding next candidate semantic term is equal to or greater than a support threshold.

The method 200 includes operation 205. In operation 205, a first set of semantic patterns may be generated or selected from the set of candidate semantic terms by performing random walks on the set of candidate semantic terms.

The method 200 includes operation 207. In operation 207, key semantic patterns may be generated or selected from the first set of semantic patterns by removing redundant semantic patterns from the first set. Each key semantic pattern may be an unordered set of terms/phrases or an ordered set of terms/phrases.

The method 200 includes operation 209. In operation 209, a plurality of pathological features may be obtained from the one or more pathology reports based on the key semantic patterns. The plurality of pathological features may include the 50 exemplary pathological features listed in Table 1. The pathological features acquired, extracted, or summarized from a pathology report allow a clinician to quickly understand the situation or condition of the corresponding patient.
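How key semantic patterns might be applied to a report is sketched below. This is a hedged illustration, not the method as claimed: it assumes each pathological feature is associated with one or more ordered patterns, that a pattern matches a sentence when its terms occur in order with arbitrary text in between, and that the first matching sentence is reported. The names `matches`, `extract_features`, and the sample patterns and sentences are hypothetical.

```python
def matches(pattern, sentence):
    """True if the pattern's terms occur in order in the sentence,
    with any amount of text allowed between them."""
    text = sentence.lower()
    position = 0
    for term in pattern:
        index = text.find(term.lower(), position)
        if index < 0:
            return False
        position = index + len(term)
    return True


def extract_features(report_sentences, key_patterns):
    """Map each feature name to the first sentence matching one of
    its patterns. `key_patterns` is {feature: [pattern, ...]}."""
    features = {}
    for sentence in report_sentences:
        for feature, patterns in key_patterns.items():
            if feature in features:
                continue
            if any(matches(p, sentence) for p in patterns):
                features[feature] = sentence
    return features


# Hypothetical patterns and report text.
key_patterns = {
    "TTF-1": [("immunohistochemical", "TTF-1")],
    "EGFR": [("EGFR", "mutation")],
}
report = [
    "Immunohistochemical stains show the tumor cells are positive for TTF-1.",
    "An EGFR exon 19 deletion mutation is detected.",
]
summary = extract_features(report, key_patterns)
```

Allowing gaps between the terms is what distinguishes this kind of matching from a rigid expression that requires the exact wording.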

In some embodiments of the present invention, the support between a semantic term and the next semantic term may be generated or calculated from the number of co-occurrences of the semantic term and the next semantic term. The confidence between a semantic term and the next semantic term may be generated or calculated from the probability that the next semantic term appears given that the semantic term has appeared.

In some embodiments of the present invention, the confidence threshold may be set to 0.3 and the support threshold may be set to 10. The confidence threshold may be selected from the range of 0.2 to 0.5. The support threshold may be selected from the range of 7 to 12.
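The support and confidence described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each report has already been tokenized into a list of semantic terms, that "next" means the immediately following term, and that a pair is kept when it meets both thresholds (the text does not say whether the comparison is strict). The function names are hypothetical.

```python
from collections import Counter


def support_and_confidence(reports):
    """Return {(term, next_term): (support, confidence)}.

    reports: list of token lists (assumed pre-tokenized).
    support    = number of times next_term directly follows term
    confidence = support / number of occurrences of term
    """
    term_counts = Counter()
    pair_counts = Counter()
    for tokens in reports:
        term_counts.update(tokens)
        pair_counts.update(zip(tokens, tokens[1:]))
    return {
        (term, nxt): (count, count / term_counts[term])
        for (term, nxt), count in pair_counts.items()
    }


def candidate_pairs(stats, min_support=10, min_confidence=0.3):
    """Keep pairs meeting both thresholds (inclusive comparison is an assumption)."""
    return {
        pair
        for pair, (support, confidence) in stats.items()
        if support >= min_support and confidence >= min_confidence
    }
```

The default thresholds mirror the example values 10 and 0.3 given in the text.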

In some embodiments of the present invention, when a second semantic pattern is a subset of a first semantic pattern, the first semantic pattern is removed. That is, if a first semantic pattern contains a second semantic pattern, the first semantic pattern is removed.
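The redundancy rule above can be sketched as a simple filter. This sketch assumes patterns are given as tuples of terms and that "subset" means set containment that ignores term order; for ordered patterns a subsequence test might be more appropriate. A proper-subset test is used so that two identical patterns do not eliminate each other, which is also an assumption. The function name is hypothetical.

```python
def remove_redundant(patterns):
    """Drop every pattern that contains another pattern as a proper subset."""
    term_sets = [frozenset(p) for p in patterns]
    kept = []
    for i, pattern in enumerate(patterns):
        contains_smaller = any(
            j != i and term_sets[j] < term_sets[i]  # proper subset
            for j in range(len(patterns))
        )
        if not contains_smaller:
            kept.append(pattern)
    return kept
```

For instance, given ("tumor", "size") and ("tumor", "size", "measures"), only the shorter pattern survives.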

In some embodiments of the present invention, the confidence between a semantic term and the next semantic term is independent of a second confidence between a previous semantic term and the semantic term. In a pathology report, the previous semantic term appears before the semantic term, and the semantic term appears before the next semantic term.

In some embodiments of the present invention, 50 semantic patterns of important pathological features of lung cancer are generated by the pathology semantic pattern generation algorithm. Following verification by clinicians, the methods for obtaining six categories of pathological features are described below: basic description, findings of the tumor(s), histological description of the tumor(s), IHC information, genetic test results, and TNM stage.

A pathology report contains information classified as basic description. The basic description may be provided in a SOAP (Subjective, Objective, Assessment, and Plan) section. "Subjective" may indicate subjective data, including the chief complaint, symptoms, medical history, drug allergy history, adverse drug reaction history, and medication history as stated by the patient. "Objective" may indicate objective data, including vital signs, physical examination results, laboratory test results, and medical imaging results of the patient. "Assessment" may include the impression or diagnosis, the patient's condition, the disease status, and the analysis and evaluation of treatment. "Plan" may include diagnostic methods (laboratory tests), treatment methods (medication, procedures, surgery, etc.), and health care education methods.

The content of the SOAP section may be separated or divided into multiple parts by commas (i.e., ","; Chinese text uses the enumeration comma "、"). In some embodiments, when the number of comma-separated parts in the SOAP section is less than 4, the content of the SOAP section may be a test result. For lung cancer patients, when the number of comma-separated parts in the SOAP section is greater than or equal to 4, the content of the SOAP section may be a description of the lung.

Table 2 shows four exemplary SOAP sections. For each of cases 1, 2, and 4 in Table 2, the number of parts is determined to be less than 4, so the content of the SOAP section is determined to be related to test results. For case 3 in Table 2, the number of parts is determined to be greater than 4.

| Case | SOAP section |
|------|--------------|
| 1 | ALK expression is negative. |
| 2 | No ROS1 gene rearrangement detected. |
| 3 | Lung, lower lobe, right side, bronchoscopy biopsy, adenocarcinoma |
| 4 | EGFR exon 18 mutation: no mutation detected; EGFR exon 19 mutation: no mutation detected; EGFR exon 20 mutation: exon 20 insertion (EX20Ins); EGFR exon 21 mutation: no mutation detected |

Table 2

In some embodiments, when the number of comma-separated parts in the SOAP section is equal to 4, the four parts are related, in order, to "organ", "location", "sampling method", and "diagnosis". That is, the first part is related to one or more organs, the second part to one or more locations, the third part to one or more sampling methods, and the fourth part to one or more diagnoses.

In some embodiments, when the number of comma-separated parts in the SOAP section is greater than 4, some parts may be merged. For example, the semantic patterns generated for the basic description indicate some key semantic patterns for the part related to "organ" and some key semantic patterns for the part related to "sampling method". One or more parts related to "organ" may be located or identified through these key semantic patterns, and likewise one or more parts related to "sampling method". For lung cancer patients, exemplary key semantic patterns for locating the parts related to "organ" and "sampling method" are listed in Table 3. After the parts related to "organ" and "sampling method" are located, the one or more parts related to "location" are the parts lying between them. For example, if the first part and the fifth part are related to "organ" and "sampling method" respectively, the second through fourth parts are determined to be related to "location". The one or more parts following the part related to "sampling method" are determined to be related to "diagnosis". For example, if the SOAP section is divided into seven parts and the fifth part is related to "sampling method", the sixth and seventh parts are determined to be related to "diagnosis". In some preferred embodiments, only the one part immediately following the part related to "sampling method" is determined to be related to "diagnosis".

| Part | Key semantic patterns |
|------|-----------------------|
| Organ | lung, lymph node, brain, skin |
| Sampling method | biopsy, VATS, EBUS |

Table 3

For case 3 in Table 2, the content of the SOAP section is "Lung, lower lobe, right side, bronchoscopy biopsy, adenocarcinoma." The number of parts in case 3 is determined to be greater than 4. Based on the key semantic patterns in Table 3, the part "lung" is determined to be related to "organ", and the part "bronchoscopy biopsy" is determined to be related to "sampling method". The parts "lower lobe" and "right side" are determined to be related to "location" because they lie between the parts related to "organ" and "sampling method". The part "adenocarcinoma" is determined to be related to "diagnosis" because it follows the part related to "sampling method". For case 3 in Table 2, the parts related to "organ", "location", "sampling method", and "diagnosis" are marked in gray.
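The part-assignment logic walked through for case 3 can be sketched as below. This is an illustration under stated assumptions: the term lists are English renderings of Table 3, matching is a case-insensitive substring test, only the single part after the sampling method is taken as the diagnosis (the "preferred embodiment"), and the function names and returned dictionary layout are hypothetical.

```python
ORGAN_TERMS = ("lung", "lymph node", "brain", "skin")   # Table 3, "organ"
SAMPLING_TERMS = ("biopsy", "vats", "ebus")             # Table 3, "sampling method"


def split_parts(text):
    """Split on ASCII, full-width, and enumeration commas; drop empty parts."""
    normalized = text.replace("，", ",").replace("、", ",")
    parts = [p.strip(" .。") for p in normalized.split(",")]
    return [p for p in parts if p]


def find_index(parts, terms):
    """Index of the first part containing any of the terms, else None."""
    for i, part in enumerate(parts):
        if any(term in part.lower() for term in terms):
            return i
    return None


def parse_basic_description(text):
    parts = split_parts(text)
    if len(parts) < 4:
        return {"type": "test result", "text": text}
    if len(parts) == 4:
        organ_i, sampling_i = 0, 2          # fixed order for exactly four parts
    else:
        organ_i = find_index(parts, ORGAN_TERMS)
        sampling_i = find_index(parts, SAMPLING_TERMS)
        if (organ_i is None or sampling_i is None
                or sampling_i <= organ_i or sampling_i + 1 >= len(parts)):
            return None                      # pattern not recognized
    return {
        "type": "basic description",
        "organ": parts[organ_i],
        "location": ", ".join(parts[organ_i + 1:sampling_i]),
        "sampling method": parts[sampling_i],
        "diagnosis": parts[sampling_i + 1],
    }
```

Applied to case 3, the sketch yields organ "Lung", location "lower lobe, right side", sampling method "bronchoscopy biopsy", and diagnosis "adenocarcinoma".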

Figure 3 illustrates a section of a pathology report according to some embodiments of the present invention. Figure 3 shows the SOAP section of a pathology report. In the case of Figure 3, the SOAP section contains six sentences. The number of comma-separated parts in each sentence in Figure 3 is determined to be greater than 4. According to the present invention, these six sentences, each having more than 4 parts, are selected as candidates for further processing. For each sentence in Figure 3, which part or parts are related to "organ" is determined based on the one or more key semantic patterns listed in Table 3 for the part related to "organ". For each sentence in Figure 3, which part or parts are related to "sampling method" is determined based on the one or more key semantic patterns listed in Table 3 for the part related to "sampling method". After the parts related to "organ" and "sampling method" are determined, the parts between them are determined to be related to "location". The one part following the part related to "sampling method" is determined to be related to "diagnosis". For the case of Figure 3, the output basic description is "Organ: lung / Location: upper lobe, left side / Sampling method: VATS lobectomy / Diagnosis: adenocarcinoma." In the case of Figure 3, the parts related to "organ", "location", "sampling method", and "diagnosis" are marked in gray.

The one or more key semantic patterns listed in Table 3 for the part related to "organ" may be selected or generated for a lung cancer patient. These key semantic patterns may include terms other than "lung", because metastasis or invasion of lung cancer can occur in other organs. Therefore, the key semantic patterns listed in Table 3 for the part related to "organ" may include terms such as "lymph node", "brain", and "skin".

In some embodiments of the present invention, the content of the parts related to "location" and "diagnosis" may be further standardized for output. For example, the content of the part(s) related to "location" may include various types of descriptions and may be standardized to at least one of "left upper lobe", "left lower lobe", "right upper lobe", "right middle lobe", or "right lower lobe". If the tumor invades other organs, the content of the part(s) related to "location" may be standardized to at least one of "pleura", "bone", "brain", or "skin". The content of the part(s) related to "diagnosis" may also include various types of descriptions and may be standardized to at least one of "adenocarcinoma", "non-small cell carcinoma", "squamous cell carcinoma", or "large cell carcinoma".

A pathology report contains information classified as findings of the tumor(s). According to some embodiments of the present invention, the semantic pattern generated for the findings of the tumor(s) is associated with volume; it may be a semantic pattern of a volume. The semantic pattern may include multiple numbers, multiple multiplication signs, and a unit. For example, the semantic pattern of the findings may include three numbers, two multiplication signs, and one unit. The first multiplication sign may be between the first number and the second number. The second multiplication sign may be between the second number and the third number. The unit may follow the third number. The unit may be a unit of length (for example, "cm" or "mm") or a unit of volume.

According to some embodiments of the present invention, one or more candidate fragments may be selected based on the semantic pattern generated for the findings of the tumor(s). If the semantic context of a candidate fragment contains one or more terms of the key semantic patterns listed in Table 4, the associated volume value or values are determined to be tumor size information. In addition, the largest of the values indicating a volume is determined to be the value of the greatest dimension. For example, "50 mm × 35 mm × 20 mm" contains three values indicating a volume, and the largest value, "50", is determined to be the value of the greatest dimension.

| Key semantic patterns of tumor size |
|-------------------------------------|
| cutting, hard tumor, tumor measurement |

Table 4
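The volume pattern and the greatest-dimension rule can be sketched with a regular expression. This sketch assumes the "three numbers, two multiplication signs, one unit" form, accepts "x", "×", or "*" as the multiplication sign, allows a unit after each number, and treats the whole fragment as the semantic context. The context terms are English renderings of Table 4 and may not match the wording of actual reports; all names are hypothetical.

```python
import re

NUMBER = r"(\d+(?:\.\d+)?)"
VOLUME_RE = re.compile(
    NUMBER + r"\s*(?:mm|cm)?\s*[x×*]\s*"
    + NUMBER + r"\s*(?:mm|cm)?\s*[x×*]\s*"
    + NUMBER + r"\s*(mm|cm)",
    re.IGNORECASE,
)
SIZE_CONTEXT_TERMS = ("cutting", "hard tumor", "tumor measurement")  # Table 4


def largest_dimension(fragment):
    """Return (largest value, unit) if the fragment describes tumor size, else None."""
    if not any(term in fragment.lower() for term in SIZE_CONTEXT_TERMS):
        return None
    match = VOLUME_RE.search(fragment)
    if match is None:
        return None
    values = [float(match.group(i)) for i in (1, 2, 3)]
    return max(values), match.group(4).lower()
```

For the example in the text, a fragment containing a Table 4 term and "50 mm × 35 mm × 20 mm" gives 50.0 with unit "mm".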

In addition to the findings of the tumor(s), a pathology report also contains information classified as histological information of the tumor(s). The information on the findings of the tumor(s) and the histological information of the tumor(s) may be observed through a microscope in a laboratory. Information observed through a microscope may be described under different items of a pathology report. Items associated with microscopic information may be described in the "microscopic evaluation" section of a pathology report.

In some embodiments of the present invention, the microscopic evaluation section may be located or identified based on the term "microscopic evaluation". The information of an item in the microscopic evaluation section may be described after a colon (i.e., ":"). In some embodiments of the present invention, the information described in each item may be obtained by locating the corresponding colon. For example, after the microscopic evaluation section is located, a given item may be located by searching for the corresponding item name; the corresponding colon of the given item is then located, and the information following that colon is obtained as the information of the item. The microscopic evaluation section may include at least one of the items "tumor focality", "histologic type", "histologic grade", "lymphovascular invasion", "visceral pleural invasion", and "closest margin". The items "tumor focality", "lymphovascular invasion", "visceral pleural invasion", and "closest margin" may be classified as findings of the tumor(s). The items "histologic type" and "histologic grade" may be classified as histological information of the tumor(s).
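The colon-based lookup described above can be sketched as follows. The sketch assumes one item per line, that the section runs from its title to the end of the text (the text does not say where the section ends), and that the English section title and item names shown are how they appear in a report. All names are hypothetical.

```python
import re


def extract_items(report, section_title, item_names):
    """Return {item name: text after its colon} for items found in the section."""
    start = report.lower().find(section_title.lower())
    if start == -1:
        return {}
    section = report[start:]
    found = {}
    for name in item_names:
        # item name, optional text up to the colon, then the rest of the line
        pattern = re.escape(name) + r"[^:：\n]*[:：][ \t]*([^\n]*)"
        match = re.search(pattern, section, re.IGNORECASE)
        if match:
            found[name] = match.group(1).strip()
    return found
```

A call such as `extract_items(report, "Microscopic evaluation", ["Histologic type", "Lymphovascular invasion"])` returns only the items that are present.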

A pathology report contains information classified as IHC information. According to some embodiments of the present invention, the semantic patterns generated for IHC information indicate that the terms listed in Table 5 (which may come from one or more key semantic patterns) may be treated as an initial term of the IHC section.

| Key semantic patterns of an initial term of the IHC section |
|-------------------------------------------------------------|
| immunohistochemical, immunohistochemically, immunohistochemical, immunohistochemically, immune study, IHC, immunostaining, immunohistochemistry, immune study, immune reaction, showing adenocarcinoma component |

Table 5

After the IHC section is located or identified by the initial term, the information of the IHC section may be further obtained. The target items in the IHC section may be located or identified based on the one or more key semantic patterns listed in Table 6.

After the target items in the IHC section are located or identified, for each target item, a first modifier may be located or identified before the target item, and a second modifier may be located or identified after the target item. For each target item, a first distance between the first modifier and the target item and a second distance between the second modifier and the target item may be calculated. If the first distance is less than the second distance, the first modifier is determined to be the modifier of the target item; if the second distance is less than the first distance, the second modifier is determined to be the modifier of the target item. The first modifier and the second modifier may each be "positive" or "negative". When the first modifier and the second modifier are the same, there is no need to select which modifier applies to the target item, so calculating and comparing the first distance and the second distance may be unnecessary.

| Key semantic patterns of target items in the IHC section |
|----------------------------------------------------------|
| CK7, TTF-1, napsin A (aspartic proteinase A), CK20, P40, CDX2, P63, P16, cytokeratin (AE1/AE3), vimentin, PAX-8, CD56, chromogranin A, synaptophysin, GATA3, P53, S100, Ki67, EBER |

Table 6
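The nearest-modifier rule can be sketched as below. The text does not define how distance is measured, so this sketch assumes a character distance between the target item and the closest modifier on each side; it also assumes the modifiers appear as the English words "positive" and "negative", and it resolves an exact tie in favor of the following modifier, which is an arbitrary choice. The function name is hypothetical.

```python
import re

MODIFIER_RE = re.compile(r"\b(positive|negative)\b", re.IGNORECASE)


def nearest_modifier(text, target):
    """Return 'positive' or 'negative' for the modifier closest to target, else None."""
    idx = text.lower().find(target.lower())
    if idx == -1:
        return None
    t_start, t_end = idx, idx + len(target)
    before = after = None
    for m in MODIFIER_RE.finditer(text):
        if m.end() <= t_start:
            before = m                      # keep the last modifier before the target
        elif m.start() >= t_end and after is None:
            after = m                       # keep the first modifier after the target
    if before is None and after is None:
        return None
    if before is None:
        return after.group(1).lower()
    if after is None:
        return before.group(1).lower()
    first, second = before.group(1).lower(), after.group(1).lower()
    if first == second:
        return first                        # same modifier: no distance needed
    first_distance = t_start - before.end()
    second_distance = after.start() - t_end
    return first if first_distance < second_distance else second
```

Because the rule is purely positional, a sentence listing several markers after one modifier can attach a later marker to the wrong modifier, so results from such sentences deserve a manual check.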

A pathology report contains information classified as genetic testing. The genetic testing information may include testing related to immune checkpoint inhibitors (e.g., PDL1 inhibitors), genetic testing of the epidermal growth factor receptor (EGFR), and other genetic molecular tests.

In some embodiments of the present invention, the semantic patterns generated for testing related to immune checkpoint inhibitors indicate some terms related to PDL1 test kits. One or more exemplary key semantic patterns related to PDL1 test kits are listed in Table 7.

| Key semantic patterns of PDL1 testing |
|---------------------------------------|
| 22C3, 28-8, SP142, SP263 |

Table 7

The entire pathology report is searched for one or more terms of the one or more key semantic patterns listed in Table 7. Whether a PDL1 test has been performed may be determined based on this search. For example, if one or more terms of the key semantic patterns listed in Table 7 are found anywhere in the pathology report, it may be determined that a PDL1 test has been performed. If a PDL1 test has been performed, the information of the items associated with the PDL1 test is further obtained. The items related to the PDL1 test include: tumor proportion score (TPS), combined positive score (CPS), tumor cells (TC), and immune cells (IC).

The key semantic patterns listed in Table 7 may be used to locate or identify the PDL1 test part. The items associated with the PDL1 test may be provided in the PDL1 test part. The information of an item associated with the PDL1 test may be described after a colon (i.e., ":"). In some embodiments of the present invention, the information described in each item may be obtained by locating the corresponding colon. For example, after the PDL1 test part is located, a given item may be located by searching for the item name; the corresponding colon of the given item may then be located, and the information following that colon is obtained as the information of the item.
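The PDL1 search and item lookup can be sketched as follows. The sketch assumes a plain substring search for the Table 7 terms, that items are identified by their abbreviations (TPS, CPS, TC, IC) as whole words, and that each item value follows a colon on the same line. Names and the returned layout are hypothetical.

```python
import re

PDL1_TERMS = ("22C3", "28-8", "SP142", "SP263")   # Table 7
PDL1_ITEMS = ("TPS", "CPS", "TC", "IC")


def extract_pdl1(report):
    """Return None if no Table 7 term is found, else the terms and item values."""
    found_terms = [t for t in PDL1_TERMS if t.lower() in report.lower()]
    if not found_terms:
        return None
    result = {"terms": found_terms}
    for item in PDL1_ITEMS:
        pattern = r"\b" + re.escape(item) + r"\b[^:：\n]*[:：][ \t]*([^\n]*)"
        match = re.search(pattern, report)   # case-sensitive on purpose
        if match:
            result[item] = match.group(1).strip()
    return result
```

The item search is case-sensitive so that "IC" or "TC" does not match inside ordinary lowercase words.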

In some embodiments of the present invention, the semantic patterns generated for EGFR testing indicate one or more terms, for example, the term "EGFR". The entire pathology report may be searched for the term "EGFR". Whether an EGFR test has been performed may be determined based on this search. For example, if the term "EGFR" or another similar term is found anywhere in the pathology report, it may be determined that an EGFR test has been performed. If an EGFR test has been performed, information about mutations in exon 18, exon 19, exon 20, and exon 21 of EGFR is further obtained.

The term "EGFR" may be used to locate or identify the EGFR test part. Information about mutations in exon 18, exon 19, exon 20, and exon 21 of EGFR may be provided in the EGFR test part. In some embodiments, whether a mutation is in exon 18, exon 19, exon 20, or exon 21 may be determined based on a search for the terms "18", "19", "20", and "21". For example, if the term "18" is found in the EGFR test part, a mutation is determined to be in exon 18. In some other embodiments, whether a mutation is located at position 790 of exon 20 may be determined based on a search for the term "T790M". For example, if the term "T790M" is found in the EGFR test part, a mutation is determined to be located at position 790 of exon 20.
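The EGFR term search can be sketched as below. This sketch follows the text literally: it reports which of the exon numbers appear in the EGFR test part and whether "T790M" appears. It assumes the EGFR test part runs from the first occurrence of "EGFR" to the end of the text, and it matches exon numbers as whole words. Names are hypothetical.

```python
import re

EXON_TERMS = ("18", "19", "20", "21")


def egfr_findings(report):
    """Return None if 'EGFR' is absent, else the exon numbers and T790M flag found."""
    start = report.lower().find("egfr")
    if start == -1:
        return None
    part = report[start:]
    exons = [n for n in EXON_TERMS if re.search(r"\b" + n + r"\b", part)]
    return {"exons": exons, "T790M": "t790m" in part.lower()}
```

A report that lists every exon alongside "no mutation detected" would still return all four exon numbers here, so a practical version would likely also inspect the text after each exon's colon.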

In some embodiments of the present invention, the semantic patterns generated for other genetic molecular tests indicate the key semantic patterns listed in Table 8. The entire pathology report may be searched for the terms of the key semantic patterns listed in Table 8. Whether specific genetic molecular tests have been performed may be determined based on this search. For example, if one or more terms of the key semantic patterns listed in Table 8 are found anywhere in the pathology report, it may be determined that the corresponding genetic molecular test has been performed.

| Key semantic patterns of genetic molecular tests |
|--------------------------------------------------|
| ALK, ROS1, BRAF, MET, KRAS, ERBB2, PIK3CA, NRAS, MEK1, NTRK, RET |

Table 8

The key semantic patterns listed in Table 8 may be used to locate or identify a specific genetic molecular test part. After a genetic molecular test part is located or identified, information related to the result may be further obtained. For example, if the semantic context of a genetic molecular test part contains the term "positive", this indicates that a mutation may have occurred in the corresponding gene. If the semantic context of a genetic molecular test part contains the term "negative", this indicates that a mutation may not have occurred in the corresponding gene.

In some embodiments, after a genetic molecular test part is located or identified, a first modifier may be located or identified before the corresponding key semantic pattern, and a second modifier may be located or identified after the corresponding key semantic pattern. For the located genetic molecular test part, a first distance between the first modifier and the corresponding key semantic pattern and a second distance between the second modifier and the corresponding key semantic pattern may be calculated. If the first distance is less than the second distance, the first modifier is determined to be the modifier of the located genetic molecular test part; if the second distance is less than the first distance, the second modifier is determined to be the modifier of the located genetic molecular test part. The first modifier and the second modifier may each be "positive" or "negative".

A pathology report contains information classified as pathological TNM stage. In some embodiments of the present invention, the semantic patterns generated for the pathological TNM stage indicate some terms. One or more exemplary key semantic patterns of the pathological TNM stage are listed in Table 9.

| Key semantic patterns of pathological TNM stage |
|-------------------------------------------------|
| pathologic staging, pTNM, AJCC |

Table 9

The entire pathology report is searched for one or more terms of the one or more key semantic patterns listed in Table 9. The terms of the key semantic patterns listed in Table 9 may be used to locate or identify the pathological TNM stage section. For example, the pathological TNM stage section may be the "pathologic staging (pTNM)" section.

Figure 4 illustrates a pathological TNM stage section of a pathology report according to some embodiments of the present invention. The entire pathology report may be searched for the terms of the one or more key semantic patterns listed in Table 9. For example, if one or more of these terms are found anywhere in the pathology report, the pathological TNM stage section may be located or identified. Figure 4 shows an example in which the terms "pathologic staging", "pTNM", and "AJCC" appear at the beginning of the pathological TNM stage section. The staging information of the items in the pathological TNM stage section is further obtained. The items in the pathological TNM stage section include: pT (primary tumor), pN (regional lymph nodes), and pM (distant metastasis).

In some embodiments, after the pathological TNM stage section is located, the edition number provided after the term "AJCC" (i.e., the American Joint Committee on Cancer) is obtained. The staging information may be provided after or below the AJCC edition number. For example, Figure 4 shows the items "primary tumor (pT)", "regional lymph nodes (pN)", and "distant metastasis (pM)" and the corresponding information provided below the AJCC edition number.

The associated semantic pattern indicates that the staging result may follow the name of a given item. For example, the staging result of the item "primary tumor (pT)" may be found after the name of the item (e.g., "primary tumor" or "pT"). Figure 4 shows the staging result "pT1B" of the item "primary tumor (pT)" provided after the item name.

The staging information of an item in the pathological TNM stage section may be described after a colon (i.e., ":"). In some embodiments of the present invention, the information described in each item may be obtained by locating the corresponding colon. For example, after the pathological TNM stage section is located, a given item may be located by searching for the item name; the corresponding colon of the given item may then be located, and the information following that colon is obtained as the information of the item. Figure 4 shows the staging result "pN0" of the item "regional lymph nodes (pN)" provided after the corresponding colon.

In some embodiments, the pathological TNM stage section may further include the item "TNM stage grouping". Figure 4 shows a pathological TNM stage section that includes the item "TNM stage grouping". The information of the item "TNM stage grouping" indicates a TNM stage based on the information of the items "primary tumor (pT)", "regional lymph nodes (pN)", and "distant metastasis (pM)".

The associated semantic pattern indicates that the TNM staging result may follow the item name "TNM stage grouping". Figure 4 shows the TNM staging result "Stage IA2" of the item "TNM stage grouping" provided after the item name.

The TNM staging information of the item "TNM stage grouping" may be described after a colon (i.e., ":"). In some embodiments of the present invention, the TNM staging information described in the item may be obtained by locating the corresponding colon. For example, after the item is located by searching for the item name "TNM stage grouping", the corresponding colon of the item may be located, and the information following that colon is obtained as the TNM staging information. Figure 4 shows the TNM staging result "Stage IA2" of the item "TNM stage grouping" provided after the corresponding colon.
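The steps for reading the pathological TNM stage section can be sketched together as follows. The sample layout is hypothetical and only loosely modeled on the description of Figure 4. The sketch assumes the section starts at the earliest Table 9 term, that the AJCC edition appears as digits right after "AJCC", that items are identified by "pT", "pN", "pM", and "TNM stage grouping", and that each value follows a colon on the same line.

```python
import re

TNM_SECTION_TERMS = ("pathologic staging", "ptnm", "ajcc")   # Table 9, lowercased
TNM_ITEMS = {"pT": "pT", "pN": "pN", "pM": "pM", "stage_group": "TNM stage grouping"}


def extract_tnm(report):
    """Return None if no Table 9 term is found, else the staging fields found."""
    low = report.lower()
    starts = [low.find(t) for t in TNM_SECTION_TERMS if t in low]
    if not starts:
        return None
    section = report[min(starts):]
    result = {}
    edition = re.search(r"AJCC\s*(\d+)", section)
    if edition:
        result["ajcc_edition"] = int(edition.group(1))
    for key, name in TNM_ITEMS.items():
        # \b after the name keeps "pT" from matching inside "pTNM"
        pattern = re.escape(name) + r"\b[^:：\n]*[:：][ \t]*([^\n]*)"
        match = re.search(pattern, section)
        if match:
            result[key] = match.group(1).strip()
    return result
```

Fields that are missing from the section are simply left out of the returned dictionary, which lets a caller detect a section without a "TNM stage grouping" item.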

In some embodiments, the pathological TNM stage section may not include the item "TNM stage grouping". In that case, the present invention can obtain the TNM staging information from the staging information of the pT, pN, and pM items. In some embodiments, the TNM staging information can be obtained by looking it up in Table 10. Table 10 is based on the 8th edition of the AJCC/UICC (Union for International Cancer Control) TNM staging system. In Table 10, T1, T2, T3, T4, and their subcategories are stages of "primary tumor (pT)"; N0, N1, N2, and N3 are stages of "regional lymph nodes (pN)"; and M1 and its subcategories are stages of "distant metastasis (pM)". IA1, IA2, IA3, IB, IIA, IIB, IIIA, IIIB, IIIC, IVA, and IVB are stages of "TNM stage grouping".

| T/M | Subcategory | N0 | N1 | N2 | N3 |
|---|---|---|---|---|---|
| T1 | T1a | IA1 | IIB | IIIA | IIIB |
| T1 | T1b | IA2 | IIB | IIIA | IIIB |
| T1 | T1c | IA3 | IIB | IIIA | IIIB |
| T2 | T2a | IB | IIB | IIIA | IIIB |
| T2 | T2b | IIA | IIB | IIIA | IIIB |
| T3 | T3 | IIB | IIIA | IIIB | IIIC |
| T4 | T4 | IIIA | IIIA | IIIB | IIIC |
| M1 | M1a | IVA | IVA | IVA | IVA |
| M1 | M1b | IVA | IVA | IVA | IVA |
| M1 | M1c | IVB | IVB | IVB | IVB |

Table 10
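The Table 10 lookup can be sketched as a nested dictionary. The names `STAGE_TABLE` and `stage_group` are hypothetical, and two behaviors are assumptions of this sketch rather than statements from the text: a leading "p" prefix (as in "pT1b") is stripped before lookup, and an M1 subcategory selects the M1 row regardless of the pT value.

```python
N_COLUMNS = ("N0", "N1", "N2", "N3")

# Rows of Table 10: T or M subcategory -> stage group for N0..N3.
_ROWS = {
    "T1a": ("IA1", "IIB", "IIIA", "IIIB"),
    "T1b": ("IA2", "IIB", "IIIA", "IIIB"),
    "T1c": ("IA3", "IIB", "IIIA", "IIIB"),
    "T2a": ("IB", "IIB", "IIIA", "IIIB"),
    "T2b": ("IIA", "IIB", "IIIA", "IIIB"),
    "T3": ("IIB", "IIIA", "IIIB", "IIIC"),
    "T4": ("IIIA", "IIIA", "IIIB", "IIIC"),
    "M1a": ("IVA",) * 4,
    "M1b": ("IVA",) * 4,
    "M1c": ("IVB",) * 4,
}
STAGE_TABLE = {row: dict(zip(N_COLUMNS, stages)) for row, stages in _ROWS.items()}


def _strip_prefix(code):
    """Drop a leading 'p' so that 'pT1b' and 'T1b' are treated alike (assumption)."""
    return code[1:] if code.startswith("p") else code


def stage_group(pt, pn, pm=None):
    """Look up the TNM stage group; an M1 subcategory selects the M1 row (assumption)."""
    pt, pn = _strip_prefix(pt), _strip_prefix(pn)
    pm = _strip_prefix(pm) if pm else None
    row = pm if pm in STAGE_TABLE else pt
    return STAGE_TABLE[row][pn]  # raises KeyError for codes outside Table 10


print(stage_group("pT1b", "pN0"))
```

Codes that Table 10 does not list raise a `KeyError` here, which makes unexpected input visible instead of silently producing a stage.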

In some embodiments of the present invention, an "N information" field can be generated to verify or supplement the information of the item "regional lymph nodes (pN)". In some embodiments, a pathology report may list the lymph nodes examined together with associated information. The present invention can further verify the staging information of "regional lymph nodes (pN)" based on the examined lymph nodes and the associated information.

Figure 5 illustrates a portion of a pathology report according to some embodiments of the present invention. In Figure 5, the lymph nodes examined and associated information are provided. The third line of Figure 5 indicates that 7 lymph nodes were examined at the "hilar" position, 2 of which were involved; 5 lymph nodes were examined at the "No. 5" position, 1 of which was involved; 2 lymph nodes were examined at the "No. 6" position, none of which was involved; 1 lymph node was examined at the "No. 9" position, which was not involved; 2 lymph nodes were examined at the "No. 10" position, 1 of which was involved; and 8 lymph nodes were examined at the "No. 11" position, 5 of which were involved.

The present invention determines the pN stage based on the involved or invaded lymph nodes. In the pathology report shown in Figure 5, 9 lymph nodes are involved or invaded. Using a clinician's staging method, a staging result can be obtained for each position in the pathology report of Figure 5. The stage of the "hilar" position is N2 because 2 of its 7 lymph nodes are involved. The stage of the "No. 5" position is N2 because 1 of the 2 lymph nodes is involved. The stage of the "No. 10" position is N1 because 1 of its 2 lymph nodes is involved. The stage of the "No. 11" position is N1 because 5 of its 8 lymph nodes are involved. The pN stage is determined as the highest stage among the staging results of the different positions. For the pathology report shown in Figure 5, the pN stage is therefore determined to be "N2", because "N2" is the highest stage among the stages of the "hilar", "No. 5", "No. 10", and "No. 11" positions. In the pathology report of Figure 5, the item "regional lymph nodes (pN)" also gives the stage "N2".
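The "highest stage among involved positions" rule can be sketched as below. The per-position stages in `POSITION_STAGE` are copied from the Figure 5 example in the text and are not a general staging rule; the function name `determine_pn` and the fallback to "N0" when no node is involved are assumptions of this sketch.

```python
N_ORDER = ("N0", "N1", "N2", "N3")

# Stage assigned to each position when it has involved nodes,
# as stated for the Figure 5 example (not a general mapping).
POSITION_STAGE = {"hilar": "N2", "No. 5": "N2", "No. 10": "N1", "No. 11": "N1"}


def determine_pn(findings):
    """Return the highest N stage among positions with involved lymph nodes."""
    stages = [
        POSITION_STAGE[position]
        for position, _examined, involved in findings
        if involved > 0
    ]
    # Assumption: with no involved node anywhere, report N0.
    return max(stages, key=N_ORDER.index, default="N0")


# (position, nodes examined, nodes involved) as read from Figure 5.
figure5 = [
    ("hilar", 7, 2),
    ("No. 5", 5, 1),
    ("No. 6", 2, 0),
    ("No. 9", 1, 0),
    ("No. 10", 2, 1),
    ("No. 11", 8, 5),
]

print(determine_pn(figure5))
print(sum(involved for _, _, involved in figure5))
```

The value computed this way can then be compared with the "regional lymph nodes (pN)" item of the same report to populate or check the "N information" field.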

To verify the performance of the method provided in the present invention, 849 pathology reports from 203 lung cancer patients were provided to a program according to some embodiments of the present invention. Table 11 lists the accuracy for 50 exemplary pathological features of lung cancer. For an entire pathology report, the method of the present invention achieves an overall accuracy of 86.69%.

| Category | Feature | Accuracy |
|---|---|---|
| Basic description | Organ | 98.8% |
| Basic description | Bx site | 89.5% |
| Basic description | Sampling method | 97.9% |
| Basic description | Diagnosis | 93.6% |
| Findings | Greatest dimension | 98.5% |
| Findings | Tumor size | 98.2% |
| Findings | Closest margin | 99.5% |
| Findings | Lymphovascular invasion | 99.4% |
| Findings | VPI | 99.8% |
| Findings | Tumor focality | 99.8% |
| Histology | Histologic type | 97.9% |
| Histology | Histologic grade | 100% |
| IHC | CK7 | 98.3% |
| IHC | TTF-1 | 95.4% |
| IHC | Aspartic proteinase A | 97.1% |
| IHC | CK20 | 99.8% |
| IHC | P40 | 98.6% |
| IHC | CDX2 | 100% |
| IHC | P63 | 100% |
| IHC | P16 | 99.8% |
| IHC | Cytokeratin (AE1/AE3) | 99.8% |
| IHC | Vimentin | 100% |
| IHC | PAX-8 | 100% |
| IHC | CD56 | 100% |
| IHC | Chromogranin A | 99.8% |
| IHC | Synaptophysin | 100% |
| IHC | GATA3 | 99.7% |
| IHC | P53 | 100% |
| IHC | S100 | 99.8% |
| IHC | Ki67 | 99.2% |
| IHC | EBER | 100% |
| Genetic testing | EGFR | 100% |
| Genetic testing | ALK | 100% |
| Genetic testing | ROS1 | 99.2% |
| Genetic testing | BRAF | 100% |
| Genetic testing | MET | 100% |
| Genetic testing | KRAS | 100% |
| Genetic testing | ERBB2 | 100% |
| Genetic testing | PIK3CA | 100% |
| Genetic testing | NRAS | 100% |
| Genetic testing | MEK1 | 100% |
| Genetic testing | NTRK | 100% |
| Genetic testing | RET | 100% |
| Genetic testing | PDL1 | 98.8% |
| TNM stage | Version | 100% |
| TNM stage | pT | 98.3% |
| TNM stage | pN | 99.7% |
| TNM stage | pM | 99.8% |
| TNM stage | pStage | 94.8% |
| TNM stage | N information | 100% |

Table 11

In some embodiments of the present invention, if a patient has multiple pathology reports, these reports are summarized, combined, and divided into "an initial diagnosis report" and "a latest diagnosis report". The time stamps of the initial diagnosis and of the latest diagnosis should be determined. For example, after the pathology reports are arranged in chronological order, the time stamps of the initial diagnosis and the latest diagnosis may be the dates of the first non-puncture pathology report and the latest non-puncture pathology report, respectively.

According to clinicians' experience, information on genetic testing and IHC staining may not appear in the first non-puncture pathology report or in the latest non-puncture pathology report. Genetic testing and IHC staining may be performed within the month following the first non-puncture pathology report or the latest non-puncture pathology report. Therefore, the contents of pathology reports dated within one month of the time stamp of the first or the latest non-puncture pathology report should be monitored or retained. The information on the basic description, tumor findings, and histology in the first or the latest non-puncture pathology report can be retained. The information on genetic testing and IHC staining can be summarized and combined with the information on the basic description, tumor findings, and histology. In some embodiments, the information in the latest non-puncture pathology report, together with the information on genetic testing and IHC staining in one or more subsequent pathology reports, may pertain to a patient who has already been treated by surgery or therapy.
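The selection of the initial or latest non-puncture report, together with the reports that follow it within one month, might be sketched as follows. The `Report` class, its field names, the function `select_window`, and the interpretation of "one month" as 30 days are all assumptions made for illustration; the dates are invented sample data.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Report:
    report_date: date
    is_puncture: bool  # True for a puncture (needle) specimen report


def select_window(reports, anchor="initial", window_days=30):
    """Return (anchor_report, reports dated within window_days after it)."""
    ordered = sorted(reports, key=lambda r: r.report_date)
    non_puncture = [r for r in ordered if not r.is_puncture]
    if not non_puncture:
        return None, []
    # First non-puncture report for the initial diagnosis, last one for the latest.
    anchor_report = non_puncture[0] if anchor == "initial" else non_puncture[-1]
    start = anchor_report.report_date
    end = start + timedelta(days=window_days)
    window = [r for r in ordered if start <= r.report_date <= end]
    return anchor_report, window


# Invented sample data.
reports = [
    Report(date(2021, 1, 5), True),
    Report(date(2021, 1, 12), False),
    Report(date(2021, 1, 30), True),
    Report(date(2021, 6, 1), False),
    Report(date(2021, 6, 20), True),
]

initial, initial_window = select_window(reports, "initial")
latest, latest_window = select_window(reports, "latest")
print(initial.report_date, len(initial_window))
print(latest.report_date, len(latest_window))
```

The window deliberately keeps every report type after the anchor, since the follow-up genetic and IHC results are what the window is meant to capture.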

According to some embodiments of the present invention, each of a patient's multiple pathology reports is summarized based on the 50 exemplary pathological features. After summarization, each report is expressed in terms of those 50 features, and the reports so expressed can be combined in the order of their time stamps.

The data of a patient's pathology report can be expressed in terms of the 50 exemplary pathological features. In some embodiments, if the data for a given pathological feature is descriptive, it may be expressed using a sentence embedding method. In some embodiments of the present invention, the sentence embedding method may be trained on pathology reports and may be 300-dimensional. Sentence embedding is a collective term for a set of natural language processing (NLP) techniques in which sentences are mapped to vectors of real numbers. It is used here to represent a descriptive feature: the words or paragraphs are mapped or projected into a high-dimensional space, and the meaning of the descriptive feature is expressed as a high-dimensional vector.

Because the data of each pathology report can be expressed in terms of the 50 exemplary pathological features, pathology reports can be compared with one another. However, the pathological features may differ in importance. Therefore, in some embodiments of the present invention, different pathological features may be weighted with different weight values. Table 12 provides exemplary weight values for the 50 pathological features. In addition, the data representation of multiple pathology reports can be normalized according to the weighted 50 pathological features, where the features are weighted with the weight values provided in Table 12.

| Category | Feature | Weight |
|---|---|---|
| Basic description | Organ | 100 |
| Basic description | Bx site | 5 |
| Basic description | Sampling method | 100 |
| Basic description | Diagnosis | 100 |
| Findings | Greatest dimension | 1 |
| Findings | Tumor size | 1 |
| Findings | Closest margin | 20 |
| Findings | Lymphovascular invasion | 20 |
| Findings | VPI | 20 |
| Findings | Tumor focality | 5 |
| Histology | Histologic type | 100 |
| Histology | Histologic grade | 5 |
| IHC | CK7 | 5 |
| IHC | TTF-1 | 20 |
| IHC | Aspartic proteinase A | 5 |
| IHC | CK20 | 5 |
| IHC | P40 | 5 |
| IHC | CDX2 | 5 |
| IHC | P63 | 20 |
| IHC | P16 | 20 |
| IHC | Cytokeratin (AE1/AE3) | 5 |
| IHC | Vimentin | 5 |
| IHC | PAX-8 | 1 |
| IHC | CD56 | 5 |
| IHC | Chromogranin A | 5 |
| IHC | Synaptophysin | 5 |
| IHC | GATA3 | 5 |
| IHC | P53 | 40 |
| IHC | S100 | 5 |
| IHC | Ki67 | 5 |
| IHC | EBER | 5 |
| Genetic testing | EGFR | 100 |
| Genetic testing | ALK | 100 |
| Genetic testing | ROS1 | 100 |
| Genetic testing | BRAF | 100 |
| Genetic testing | MET | 100 |
| Genetic testing | KRAS | 100 |
| Genetic testing | ERBB2 | 100 |
| Genetic testing | PIK3CA | 100 |
| Genetic testing | NRAS | 100 |
| Genetic testing | MEK1 | 100 |
| Genetic testing | NTRK | 100 |
| Genetic testing | RET | 100 |
| Genetic testing | PDL1 | 50 |
| TNM stage | Version | 5 |
| TNM stage | pT | 5 |
| TNM stage | pN | 5 |
| TNM stage | pM | 5 |
| TNM stage | pStage | 100 |
| TNM stage | N information | 5 |

Table 12

Figure 6 is a diagram of the data representation of pathology reports 610 and 620 according to some embodiments of the present invention. Pathology reports 610 and 620 may belong to the same patient or to different patients. Pathology report 610 can be expressed as a pathology report feature vector $V_j$, which may include several features summarized or extracted from pathology report 610 by one or more methods or algorithms of the present invention. In Figure 6, the feature vector $V_j$ contains features $f_{j1}$ to $f_{j50}$. Features $f_{j1}$ to $f_{j50}$ may be numerical data or descriptive data corresponding to the 50 features listed in Table 1, Table 11, or Table 12. In the embodiment of Figure 6, features $f_{j1}$ and $f_{j50}$ are numerical data, and features $f_{j2}$ and $f_{j3}$ are descriptive data.

Pathology report 620 can be expressed as a pathology report feature vector $V_k$, which may include several features summarized or extracted from pathology report 620 by one or more methods or algorithms of the present invention. In Figure 6, the feature vector $V_k$ contains features $f_{k1}$ to $f_{k50}$. Features $f_{k1}$ to $f_{k50}$ may be numerical data or descriptive data corresponding to the 50 features listed in Table 1, Table 11, or Table 12. In the embodiment of Figure 6, features $f_{k1}$ and $f_{k50}$ are numerical data, and features $f_{k2}$ and $f_{k3}$ are descriptive data.

For each feature with numerical data (e.g., each of features $f_{j1}$, $f_{j50}$, $f_{k1}$, and $f_{k50}$ in Figure 6), the numerical data may be further transformed into categorical data. For example, features $f_{j1}$, $f_{j50}$, $f_{k1}$, and $f_{k50}$ in Figure 6 are transformed into categories $C_{j1}$, $C_{j50}$, $C_{k1}$, and $C_{k50}$, respectively. In some embodiments, when the numerical data of feature $f_{j1}$ is below a given threshold, the value of category $C_{j1}$ may be 0; when the numerical data of feature $f_{j1}$ is above the given threshold, the value of category $C_{j1}$ may be 1.
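The threshold rule can be written as a one-line function. The name `to_category`, the sample values, and the threshold of 3.0 are hypothetical; the text does not say how a value exactly equal to the threshold is handled, so mapping it to 0 is an assumption of this sketch.

```python
def to_category(value, threshold):
    """Map a numerical feature to a binary category: 1 above the threshold, else 0."""
    # Assumption: a value exactly equal to the threshold is assigned category 0.
    return 1 if value > threshold else 0


# Hypothetical numerical feature values with a hypothetical threshold.
print(to_category(2.4, 3.0))
print(to_category(3.5, 3.0))
```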

For each feature with descriptive data (e.g., each of features $f_{j2}$, $f_{j3}$, $f_{k2}$, and $f_{k3}$ in Figure 6), the descriptive data can be further transformed by sentence embedding into a multi-dimensional space (e.g., a 300-dimensional space). For example, features $f_{j2}$, $f_{j3}$, $f_{k2}$, and $f_{k3}$ in Figure 6 are transformed into sentence vectors $Em_{j2}$, $Em_{j3}$, $Em_{k2}$, and $Em_{k3}$, respectively. Accordingly, the pathology report feature vectors $V_j$ and $V_k$ are transformed into vectors $V'_j$ and $V'_k$.

Each element of vector $V'_j$ is comparable to the corresponding element of vector $V'_k$. Therefore, vectors $V'_j$ and $V'_k$ can be compared based on comparisons between their corresponding elements. That is, how similar pathology reports 610 and 620 are can be determined from the comparisons between the corresponding elements of $V'_j$ and $V'_k$.

For the categorical data in vectors $V'_j$ and $V'_k$ (e.g., $C_{j1}$, $C_{j50}$, $C_{k1}$, and $C_{k50}$), if a category of $V'_j$ matches (is identical to) the corresponding category of $V'_k$, the comparison result for the two corresponding categories is 1 (indicating that the match is true). If a category of $V'_j$ does not match the corresponding category of $V'_k$, the comparison result is 0 (indicating that the match is false). For example, if category $C_{j1}$ of $V'_j$ matches category $C_{k1}$ of $V'_k$, the comparison between $C_{j1}$ and $C_{k1}$ is 1. If category $C_{j50}$ of $V'_j$ does not match category $C_{k50}$ of $V'_k$, the comparison between $C_{j50}$ and $C_{k50}$ is 0. The comparison between categories $C_{j1}$ and $C_{k1}$, or between categories $C_{j50}$ and $C_{k50}$, can be expressed as equation (6), where $n$ is the index of the feature:

$$
\mathrm{match}_n =
\begin{cases}
1, & C_{jn} = C_{kn} \\
0, & C_{jn} \neq C_{kn}
\end{cases}
\tag{6}
$$

For the sentence vectors obtained by sentence embedding in vectors $V'_j$ and $V'_k$ (e.g., $Em_{j2}$, $Em_{j3}$, $Em_{k2}$, and $Em_{k3}$), the similarity between two corresponding sentence vectors is determined from their cosine similarity. For example, the similarity between sentence vectors $Em_{j2}$ and $Em_{k2}$ is determined from the cosine similarity between $Em_{j2}$ and $Em_{k2}$, and the similarity between sentence vectors $Em_{j3}$ and $Em_{k3}$ is determined from the cosine similarity between $Em_{j3}$ and $Em_{k3}$. The cosine similarity between sentence vectors $Em_{j2}$ and $Em_{k2}$, or between $Em_{j3}$ and $Em_{k3}$, can be expressed as equation (7), in which the sentence vectors lie in a 300-dimensional space and $Em_{jn,i}$ denotes the $i$-th component of $Em_{jn}$:

$$
\mathrm{similarity}_n = \frac{Em_{jn} \cdot Em_{kn}}{\lVert Em_{jn} \rVert \, \lVert Em_{kn} \rVert}
= \frac{\sum_{i=1}^{300} Em_{jn,i}\, Em_{kn,i}}{\sqrt{\sum_{i=1}^{300} Em_{jn,i}^{2}}\; \sqrt{\sum_{i=1}^{300} Em_{kn,i}^{2}}}
\tag{7}
$$
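Cosine similarity between two sentence vectors can be computed without any external library, as in the sketch below. The function name `cosine_similarity` is hypothetical, the short vectors stand in for 300-dimensional embeddings, and returning 0.0 for an all-zero vector is an assumption, since the text does not address that case.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length real vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors in place of 300-dimensional sentence embeddings.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```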

The 50 weight values listed in Table 12 can further be applied to the similarity between two pathology reports. The normalization of the 50 weight values of Table 12 can be expressed as equation (8), where the variables $w_1$ to $w_{50}$ denote the 50 weight values listed in Table 12 and the variables $w'_1$ to $w'_{50}$ denote the corresponding 50 normalized weight values. Equation (8)
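The text above does not spell out the normalization itself. One common choice, assumed here purely for illustration, divides each weight by the sum of all weights so that the normalized weights add up to 1. The function name `normalize_weights` is hypothetical, and the example uses only four of the Table 12 weights for brevity, so its outputs differ from what the full 50-feature table would give.

```python
def normalize_weights(weights):
    """Divide each weight by the total so the results sum to 1 (assumed scheme)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {name: w / total for name, w in weights.items()}


# A four-feature subset of Table 12, for brevity only.
subset = {"Organ": 100, "Bx site": 5, "P53": 40, "pStage": 100}
normalized = normalize_weights(subset)
print(normalized)
```

With this scheme a report compared against itself scores exactly 1, which gives the final similarity score a fixed upper bound.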

The similarity score between the $n$-th feature of pathology report 610 and the $n$-th feature of pathology report 620 is the similarity score between the $n$-th feature of vector $V'_j$ and the $n$-th feature of vector $V'_k$. The similarity score of the $n$-th feature can be expressed as equation (9). When the $n$-th feature is numerical or categorical data, $\mathrm{match}_n$ is multiplied by the corresponding normalized weight value. When the $n$-th feature is descriptive data or a sentence vector, $\mathrm{similarity}_n$ is multiplied by the corresponding normalized weight value.

$$
\mathrm{score}_n =
\begin{cases}
w'_n \cdot \mathrm{match}_n, & \text{if the } n\text{-th feature is numerical or categorical data} \\
w'_n \cdot \mathrm{similarity}_n, & \text{if the } n\text{-th feature is descriptive data or a sentence vector}
\end{cases}
\tag{9}
$$

The similarity score between pathology reports 610 and 620, that is, between vectors $V'_j$ and $V'_k$, is the sum of the similarity scores of the first through the 50th features. This sum can be expressed as equation (10):

$$
\mathrm{score}(V'_j, V'_k) = \sum_{n=1}^{50} \mathrm{score}_n
\tag{10}
$$
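Putting the pieces together, the weighted report-to-report score could be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, weights are normalized by dividing by their sum (the text does not give the formula), categorical features are stored as integers and descriptive features as lists of floats, and only three features with 2-dimensional toy vectors are used instead of 50 features with 300-dimensional embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity; 0.0 for an all-zero vector (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def report_similarity(report_j, report_k, weights):
    """Return (total score, per-feature scores) for two encoded reports."""
    total_weight = sum(weights.values())
    per_feature = {}
    for name, weight in weights.items():
        a, b = report_j[name], report_k[name]
        if isinstance(a, (list, tuple)):   # descriptive feature: sentence vector
            raw = cosine_similarity(a, b)
        else:                              # categorical feature: exact match
            raw = 1.0 if a == b else 0.0
        per_feature[name] = (weight / total_weight) * raw
    return sum(per_feature.values()), per_feature


# Three Table 12 weights; the encodings below are toy values.
weights = {"Organ": 100, "Bx site": 5, "Diagnosis": 100}
report_j = {"Organ": 1, "Bx site": 0, "Diagnosis": [1.0, 0.0]}
report_k = {"Organ": 1, "Bx site": 1, "Diagnosis": [1.0, 0.0]}

score, breakdown = report_similarity(report_j, report_k, weights)
print(round(score, 4))
print(breakdown)
```

Returning the per-feature breakdown alongside the total is what lets a reader see which features drove the score, in line with the interpretability the description emphasizes.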

Therefore, the present invention can provide clinicians with an efficient method of scoring the similarity between two cases. Because the similarity between the pathology reports of two patients can be scored, clinicians can search for similar past cases more easily. In addition, based on the scores of the 50 important pathological features shown in Table 1, clinicians can see which parts of two cases are similar and why the two cases are similar.

The present invention therefore provides an interpretable method of scoring the similarity between two cases. In contrast, many of the parameters and probabilities produced by a machine learning algorithm (or a deep learning algorithm) are not interpretable. The interpretability of the present invention enables clinicians to explain in what respects two cases are similar and why. The interpretability and similarity scores provided by the present invention help clinicians evaluate a patient's subsequent diagnosis and treatment.

Referring to Figure 7, an example of a computer system 700 capable of performing one or more operations of the methods of the present invention is shown. In at least some embodiments of the present invention, computer system 700 includes a computing device 710 and a database 720. Computing device 710 may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular phone, or a smartphone. Computing device 710 includes a processor 711, an input/output interface 712, a communication interface 713, and a memory 714. Database 720 may store the pathology reports from which key linguistic patterns are to be extracted, as well as pathology reports to be analyzed or summarized. Input/output interface 712 is coupled to processor 711 and allows a user to operate computing device 710 in order to perform the operations or methods of the present invention (e.g., the method disclosed in Figure 2). Communication interface 713 is coupled to processor 711 and allows computing device 710 to communicate with database 720. Communication interface 713 may support one or more of the following protocols: Universal Serial Bus (USB), Ethernet, Bluetooth, IEEE 802.11, 3GPP Long Term Evolution (LTE) (4G), and 3GPP New Radio (5G). Memory 714 may be a non-transitory computer-readable storage medium and is coupled to processor 711. Memory 714 stores program instructions executable by one or more processors (e.g., processor 711). When executed, the program instructions stored in memory 714 cause one or more operations of the methods disclosed in the present invention to be performed. For example, the program instructions may cause computing device 710 to perform a set of actions including at least the following steps: (i) determining a confidence degree and a support degree between a linguistic term and a next linguistic term based on co-occurrences of the linguistic term and the next linguistic term; (ii) generating a set of candidate linguistic terms; (iii) generating a first set of linguistic patterns by performing random walks on the set of candidate linguistic terms; (iv) generating a second set of linguistic patterns by removing redundant linguistic patterns from the first set of linguistic patterns; and (v) outputting the key linguistic patterns based on the second set of linguistic patterns.
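Steps (i) and (ii) can be illustrated with a small sketch. The exact formulas for the confidence degree and the support degree are not given in this passage, so the sketch borrows the usual association-rule definitions as an assumption: support is the share of all adjacent term pairs that are (term, next term), and confidence is the share of occurrences of the term (with a successor) that are followed by that next term. The function name `candidate_pairs`, the thresholds, and the tokenized sample reports are hypothetical.

```python
from collections import Counter


def candidate_pairs(reports, min_confidence, min_support):
    """Keep (term, next_term) pairs whose confidence and support meet the thresholds."""
    pair_counts = Counter()
    term_counts = Counter()
    for terms in reports:
        term_counts.update(terms[:-1])             # terms that have a successor
        pair_counts.update(zip(terms, terms[1:]))  # adjacent (term, next_term) pairs
    total_pairs = sum(pair_counts.values())
    kept = {}
    for (term, nxt), count in pair_counts.items():
        support = count / total_pairs          # assumed definition
        confidence = count / term_counts[term]  # assumed definition
        if confidence >= min_confidence and support >= min_support:
            kept[(term, nxt)] = (confidence, support)
    return kept


# Hypothetical tokenized report fragments.
reports = [
    ["regional", "lymph", "nodes", ":", "pN0"],
    ["regional", "lymph", "nodes", ":", "pN2"],
    ["lymph", "vascular", "invasion"],
]

pairs = candidate_pairs(reports, min_confidence=0.6, min_support=0.2)
for pair, (confidence, support) in sorted(pairs.items()):
    print(pair, round(confidence, 3), round(support, 3))
```

The surviving pairs form the edges on which the random walks of step (iii) would then be performed.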

As another illustrative example, the program instructions may cause computing device 710 to perform a method of summarizing a pathology report. The method may include obtaining a plurality of pathological features from the pathology report based on key linguistic patterns, where the key linguistic patterns are generated according to any of the methods of the present invention.

The scope of the present invention is not intended to be limited to the particular embodiments of the processes, machines, manufacture, compositions of matter, means, methods, steps, and operations described in the specification. Those skilled in the art will readily appreciate from the disclosure of the present invention that processes, machines, manufacture, compositions of matter, means, methods, steps, or operations, presently existing or later developed, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, steps, or operations. In addition, each claim constitutes a separate embodiment, and combinations of the various claims and embodiments are within the scope of the present invention.

The methods, processes, or operations according to embodiments of the present invention may also be implemented on a programmed processor. However, the controllers, flowcharts, and modules may also be implemented on a general-purpose or special-purpose computer, a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an integrated circuit, a hardware electronic or logic circuit (such as a discrete element circuit), a programmable logic device, or the like. In general, any device on which resides a finite state machine capable of implementing the flowcharts shown in the figures may be used to implement the processor functions of the present invention.

An alternative embodiment preferably implements the methods, processes, or operations according to embodiments of the present invention in a non-transitory computer-readable storage medium storing computer-programmable instructions. The instructions are preferably executed by computer-executable components that are preferably integrated with a network security system. The non-transitory computer-readable storage medium may be any suitable computer-readable medium, such as RAM, ROM, flash memory, EEPROM, an optical storage device (CD or DVD), a hard disk drive, a floppy disk drive, or any other suitable device. The computer-executable component is preferably a processor, but the instructions may alternatively or additionally be executed by any suitable dedicated hardware device. For example, an embodiment of the present invention provides a non-transitory computer-readable storage medium having computer-programmable instructions stored therein.

Although the present invention has been described with specific embodiments thereof, it is evident that many alternatives, modifications, and variations will be apparent to those skilled in the art. For example, various components of the embodiments may be interchanged, added, or substituted in other embodiments. Also, not all elements of each figure are necessary for the operation of the disclosed embodiments. For example, one of ordinary skill in the art would be able to make and use the teachings of the present invention by simply employing the elements of the independent claims. Accordingly, the embodiments of the present invention as set forth herein are intended to be illustrative, not limiting. Various changes may be made without departing from the spirit and scope of the present invention.

Even though numerous characteristics and advantages of the present invention have been set forth in the foregoing description, together with details of the structure and function of the invention, the disclosure is illustrative only. Changes may be made in detail, especially in matters of shape, size, and arrangement of parts, within the principles of the invention to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed.

100: Pathological linguistic association graph

200: Method

201: Operation

203: Operation

205: Operation

207: Operation

209: Operation

610: Pathology report

620: Pathology report

700: Computer system

710: Computing device

711: Processor

712: Input/output interface

713: Communication interface

714: Memory

720: Database

To describe the manner in which the advantages and features of the present invention can be obtained, a description of the invention is presented with reference to specific embodiments illustrated in the accompanying drawings. These drawings depict only example embodiments of the invention and are therefore not to be considered limiting of its scope.

Figure 1 illustrates a pathological linguistic association graph according to some embodiments of the present invention.

Figure 2 illustrates a flowchart of a method of extracting key linguistic patterns according to some embodiments of the present invention.

Figure 3 is a section of a pathology report according to some embodiments of the present invention.

Figure 4 is a section of a pathology report according to some embodiments of the present invention.

Figure 5 is a portion of a pathology report according to some embodiments of the present invention.

Figure 6 is a schematic diagram of the data representation of pathology reports according to some embodiments of the present invention.

Figure 7 is a schematic diagram of a computer system according to some embodiments of the present invention.

Claims (20)

1. A method of extracting key linguistic patterns from a pathology report by a computing device, comprising: determining, by a processor of the computing device, a confidence degree and a support degree between a linguistic term and a next linguistic term based on co-occurrences of the linguistic term and the next linguistic term, wherein the linguistic term appears before the next linguistic term in the pathology report; generating, by the processor of the computing device, a set of candidate linguistic terms, wherein the confidence degree between a candidate linguistic term and a corresponding next candidate linguistic term is equal to or greater than a confidence threshold, and the support degree between the candidate linguistic term and the corresponding next candidate linguistic term is equal to or greater than a support threshold; generating, by the processor of the computing device, a first set of linguistic patterns by performing random walks on the set of candidate linguistic terms; and determining, by the processor of the computing device, the key linguistic patterns by removing redundant linguistic patterns from the first set of linguistic patterns.
2. The method of claim 1, wherein: the support degree between the linguistic term and the next linguistic term is generated by the processor of the computing device based on a number of co-occurrences of the linguistic term and the next linguistic term; and the confidence degree between the linguistic term and the next linguistic term is generated by the processor of the computing device based on a probability of occurrence of the next linguistic term given that the linguistic term occurs.
3. The method of claim 1, wherein removing the redundant linguistic patterns from the first set of linguistic patterns comprises removing, by the processor of the computing device, a first linguistic pattern when the first linguistic pattern indicates a subset of a second linguistic pattern.
4. The method of claim 1, wherein the confidence degree between the linguistic term and the next linguistic term is independent of a second confidence degree between a previous linguistic term and the linguistic term, wherein the previous linguistic term appears before the linguistic term in the pathology report.
5. A method of summarizing a pathology report by a computing device, comprising: obtaining, by a processor of the computing device, a plurality of pathological features from the pathology report based on key linguistic patterns, wherein the key linguistic patterns are generated according to the method of claim 1.
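By way of illustration only, and not as a limitation of the claims, the extraction method of claim 1 can be sketched in Python as follows. The sketch assumes that each report has already been tokenized into terms, treats the "next" term as the immediately following token, uses the raw co-occurrence count as the support degree and the ratio count(term, next) / count(term) as the confidence degree, and treats a pattern as redundant when it is a contiguous sub-sequence of a longer pattern. The function name, parameter names, and default thresholds are hypothetical and do not come from the disclosure.

```python
import random
from collections import Counter


def mine_key_patterns(reports, min_support=2, min_confidence=0.5,
                      walks_per_term=20, max_len=4, seed=0):
    """reports: list of token lists, one list per (pre-tokenized) report."""
    pair_counts = Counter()  # co-occurrences of (term, next term)
    term_counts = Counter()  # times a term is followed by any term
    for tokens in reports:
        for term, nxt in zip(tokens, tokens[1:]):
            pair_counts[(term, nxt)] += 1
            term_counts[term] += 1

    # Candidate edges: both thresholds must be met (assumed definitions).
    graph = {}
    for (term, nxt), support in pair_counts.items():
        confidence = support / term_counts[term]
        if support >= min_support and confidence >= min_confidence:
            graph.setdefault(term, []).append((nxt, confidence))

    # Random walks over the candidate graph; each step depends only on the
    # current term, and walks are capped at max_len terms.
    rng = random.Random(seed)
    patterns = set()
    for start in sorted(graph):
        for _ in range(walks_per_term):
            path = [start]
            while len(path) < max_len and path[-1] in graph:
                nexts, weights = zip(*graph[path[-1]])
                path.append(rng.choices(nexts, weights=weights)[0])
            patterns.add(tuple(path))

    def contained(short, long):
        # True when `short` is a contiguous sub-sequence of a longer pattern.
        return len(short) < len(long) and any(
            long[i:i + len(short)] == short
            for i in range(len(long) - len(short) + 1))

    # Drop redundant patterns that are covered by a longer pattern.
    return {p for p in patterns if not any(contained(p, q) for q in patterns)}
```

Weighting each step by the confidence degree is one possible design choice; a uniform choice among candidate next terms would also be consistent with the claim language.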
6. The method of claim 5, further comprising: locating, by the processor of the computing device, a description in a SOAP (subjective, objective, assessment, and plan) section, wherein the description has four or more parts and a "," separates any two adjacent parts, wherein when the description has four parts, a first part of the description relates to an organ, a second part of the description relates to a location, a third part of the description relates to a sampling method, and a fourth part of the description relates to a diagnosis.
7. The method of claim 6, wherein when the description has four parts, the method further comprises: locating, based on the key linguistic patterns, a first part relating to an organ and a second part relating to a sampling method; locating a third part relating to a location between the first part and the second part; and locating a fourth part relating to a diagnosis after the second part.
8. The method of claim 7, wherein: the first part relating to the organ is located based on a first subset of the key linguistic patterns, the first subset of the key linguistic patterns including "lung", "lymph node", "brain", and "skin"; and the second part relating to the sampling method is located based on a second subset of the key linguistic patterns, the second subset of the key linguistic patterns including "biopsy", "VATS", and "EBUS".
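As a non-limiting illustration, the parsing of a delimited description into organ, location, sampling method, and diagnosis can be sketched as below. The sketch assumes the parts are separated by "," or "、", that matching is a case-insensitive substring test, and that the organ and sampling keywords are the English terms listed in the claim above; the function and constant names are hypothetical.

```python
import re

# Keyword subsets as recited in the claims (English spellings assumed).
ORGAN_TERMS = ("lung", "lymph node", "brain", "skin")
SAMPLING_TERMS = ("biopsy", "vats", "ebus")


def parse_description(description):
    """Split a delimited description and label its parts.

    Returns None when the description has fewer than four parts or when
    no organ or sampling-method keyword can be located.
    """
    parts = [p.strip() for p in re.split(r"[,、]", description) if p.strip()]
    if len(parts) < 4:
        return None

    def find(terms, start=0):
        # Index of the first part (from `start`) containing any keyword.
        for i in range(start, len(parts)):
            if any(term in parts[i].lower() for term in terms):
                return i
        return None

    organ_idx = find(ORGAN_TERMS)
    if organ_idx is None:
        return None
    sampling_idx = find(SAMPLING_TERMS, organ_idx + 1)
    if sampling_idx is None:
        return None

    return {
        "organ": parts[organ_idx],
        # Whatever lies between the organ and the sampling method.
        "location": ", ".join(parts[organ_idx + 1:sampling_idx]),
        "sampling_method": parts[sampling_idx],
        # Whatever follows the sampling method.
        "diagnosis": ", ".join(parts[sampling_idx + 1:]),
    }
```

Anchoring on the organ and sampling-method keywords first, and then reading the location and diagnosis from their positions relative to those anchors, lets the same routine handle descriptions with more than four parts.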
9. The method of claim 5, further comprising: locating an immunohistochemistry (IHC) portion based on a third subset of the key linguistic patterns, wherein the third subset of the key linguistic patterns includes "immunohistochemical", "immunohistochemically", "immunohistochemical", "immunohistochemically", "immune study", "IHC", "immunostaining", "immunohistochemistry", "immune study", "immune reaction", and "showing adenocarcinoma composition".
10. The method of claim 9, further comprising: locating, by the processor of the computing device, a feature term of a fourth subset of the key linguistic patterns in the IHC portion; locating, by the processor of the computing device, a first modifier term appearing before the feature term; locating, by the processor of the computing device, a second modifier term appearing after the feature term; and selecting, by the processor of the computing device, one of the first modifier term and the second modifier term based on a first distance between the first modifier term and the feature term and a second distance between the second modifier term and the feature term.
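The selection between a preceding and a following modifier term can be illustrated with the following minimal sketch. It assumes that distance is measured in characters between the feature term and the modifier, that the nearer modifier is selected, and that a tie goes to the preceding modifier; none of these details is fixed by the claim language. The function name and the marker names used in the usage line are illustrative.

```python
import re


def marker_status(ihc_text, markers, modifiers=("positive", "negative")):
    """Map each marker found in the text to its nearest modifier term."""
    text = ihc_text.lower()
    modifier_re = "|".join(re.escape(m) for m in modifiers)
    mods = [(m.start(), m.end(), m.group())
            for m in re.finditer(modifier_re, text)]

    result = {}
    for marker in markers:
        hit = re.search(re.escape(marker.lower()), text)  # first mention only
        if hit is None:
            continue
        # (gap in characters, modifier) for modifiers on each side.
        before = [(hit.start() - end, word)
                  for start, end, word in mods if end <= hit.start()]
        after = [(start - hit.end(), word)
                 for start, end, word in mods if start >= hit.end()]
        nearest = [min(side, key=lambda c: c[0])
                   for side in (before, after) if side]
        if nearest:
            # Smaller gap wins; on a tie the preceding modifier is kept.
            result[marker] = min(nearest, key=lambda c: c[0])[1]
    return result


status = marker_status("CK7 positive, TTF-1 positive, P40 negative.",
                       ["CK7", "TTF-1", "P40"])
```

A pure nearest-modifier rule can misattribute a result in sentences such as "positive for A and B, and negative for C", so a practical system would likely need additional handling of coordinated lists.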
11. The method of claim 10, wherein: the fourth subset of the key linguistic patterns includes "CK7", "TTF-1", "Napsin A", "CK20", "P40", "CDX2", "P63", "P16", "cytokeratin (AE1/AE3)", "vimentin", "PAX-8", "CD56", "chromogranin A", "synaptophysin", "GATA3", "P53", "S100", "Ki67", and "EBER"; and the first modifier term and the second modifier term include "positive" and "negative".
12. The method of claim 6, further comprising: locating, by the processor of the computing device, at least one candidate segment based on a linguistic pattern of a volume; when a linguistic context of one of the at least one candidate segment includes one key linguistic pattern of a fifth subset of the key linguistic patterns, obtaining, by the processor of the computing device, a volume in the one of the at least one candidate segment as a tumor size; and determining, by the processor of the computing device, a maximum value of the tumor size as a maximum dimension, wherein the fifth subset of the key linguistic patterns includes "cutting", "hard tumor", and "tumor measurement".
13. The method of claim 12, wherein the linguistic pattern of the volume includes: a first number, a second number, and a third number; a multiplication symbol between the first number and the second number; another multiplication symbol between the second number and the third number; and a unit of length after the third number.
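The volume pattern described above (three numbers separated by multiplication symbols and followed by a unit of length) lends itself to a regular expression. The sketch below is illustrative only: it assumes "x", "×", or "*" as the multiplication symbol, "mm" or "cm" as the unit, a fixed window of preceding characters as the linguistic context, and the largest of the three numbers as the maximum dimension. The context keywords are passed in by the caller because the exact English keywords are not given here; all names are hypothetical.

```python
import re

NUMBER = r"(\d+(?:\.\d+)?)"
# number x number x number unit
VOLUME = re.compile(
    NUMBER + r"\s*[x×*]\s*" + NUMBER + r"\s*[x×*]\s*" + NUMBER + r"\s*(mm|cm)",
    re.IGNORECASE)


def max_tumor_dimension_cm(text, context_terms, window=40):
    """Largest dimension (in cm) among volumes whose preceding context
    contains one of the given keywords; None when nothing qualifies."""
    best = None
    for match in VOLUME.finditer(text):
        context = text[max(0, match.start() - window):match.start()].lower()
        if not any(term.lower() in context for term in context_terms):
            continue  # a volume, but not described as a tumor measurement
        scale = 0.1 if match.group(4).lower() == "mm" else 1.0
        largest = max(float(match.group(i)) for i in (1, 2, 3)) * scale
        best = largest if best is None else max(best, largest)
    return best


report = ("The specimen measures 12 x 8 x 3 cm. "
          "On cutting, a tumor measuring 3.5 x 2.0 x 1.8 cm is found.")
size = max_tumor_dimension_cm(report, ["cutting", "tumor measuring"])
```

Restricting the context to text before the match keeps the specimen dimensions in the first sentence from being mistaken for a tumor size.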
14. The method of claim 5, further comprising: locating, by the processor of the computing device, a microscopic evaluation section based on a term "microscopic evaluation"; locating, by the processor of the computing device, a colon of an item of a first set of items in the microscopic evaluation section; and obtaining, by the processor of the computing device, information after the colon as the item of the first set of items, wherein the first set of items includes: tumor focus, histologic type, histologic grade, lymphovascular invasion, visceral pleural invasion, and closest margin.
15. The method of claim 5, further comprising: locating, by the processor of the computing device, a PD-L1 testing portion based on a sixth subset of the key linguistic patterns; locating, by the processor of the computing device, a colon of an item of a second set of items in the PD-L1 testing portion; and obtaining, by the processor of the computing device, information after the colon as the item of the second set of items, wherein the sixth subset of the key linguistic patterns includes "22C3", "28-8", "SP142", and "SP263", and the second set of items includes: tumor proportion score (TPS), combined positive score (CPS), tumor cell (TC), and immune cell (IC).
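The colon-based extraction recited above can be illustrated with a short sketch. It assumes one item per line in the located section, accepts either an ASCII or a full-width colon, and matches item labels by case-insensitive substring; the function name and the sample labels are hypothetical.

```python
def extract_items(section_text, item_names):
    """Return {item name: text after the colon} for labelled lines."""
    found = {}
    for line in section_text.splitlines():
        # Normalize a full-width colon, then split at the first colon.
        label, sep, value = line.replace(":", ":").partition(":")
        if not sep:
            continue  # no colon on this line
        label = label.strip().lower()
        for item in item_names:
            if item.lower() in label:
                found[item] = value.strip()
                break
    return found


section = (
    "Microscopic evaluation\n"
    "Histologic type: Adenocarcinoma\n"
    "Histologic grade: G2\n"
    "Lymphovascular invasion: Not identified\n"
)
items = extract_items(
    section,
    ["histologic type", "histologic grade",
     "lymphovascular invasion", "visceral pleural invasion"],
)
```

The same routine could be reused for the PD-L1 testing portion by passing labels such as "TPS" or "CPS", assuming those labels appear before a colon in the report.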
16. The method of claim 5, further comprising: locating, by the processor of the computing device, an EGFR portion based on a term "epidermal growth factor receptor (EGFR)"; determining, by the processor of the computing device, whether a mutation is in exon 18, exon 19, exon 20, or exon 21 based on terms "18", "19", "20", and "21"; and determining, by the processor of the computing device, that a mutation is located at position 790 of exon 20 based on a term "T790M".
17. The method of claim 5, further comprising: locating, by the processor of the computing device, a molecular testing portion based on a seventh subset of the key linguistic patterns; identifying, by the processor of the computing device, a modifier term in a linguistic context of one key linguistic pattern of the seventh subset of the key linguistic patterns; and when the modifier term is identified as "positive", determining, by the processor of the computing device, that a mutation is in a gene associated with the one key linguistic pattern, wherein the seventh subset of the key linguistic patterns includes: "ALK", "ROS1", "BRAF", "MET", "KRAS", "ERBB2", "PIK3CA", "NRAS", "MEK1", "NTRK", and "RET".
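As a simplified illustration of the EGFR determination, the sketch below looks for the stand-alone numbers 18 to 21 and the term "T790M" in an already located EGFR portion. It is an assumption-laden sketch: it does not handle negation (for example "no mutation in exon 19"), and the function name and return structure are hypothetical.

```python
import re


def egfr_findings(egfr_text):
    """Report which of exons 18-21 are mentioned and whether T790M appears."""
    # Whole-number matches only, so "790" or "858" do not count as exons.
    exons = {int(n) for n in re.findall(r"\b(18|19|20|21)\b", egfr_text)}
    has_t790m = re.search(r"T790M", egfr_text, re.IGNORECASE) is not None
    if has_t790m:
        exons.add(20)  # T790M denotes position 790, which lies in exon 20
    return {"exons": sorted(exons), "T790M": has_t790m}


findings = egfr_findings("EGFR mutation: exon 19 deletion detected; T790M detected.")
```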
18. The method of claim 5, further comprising: locating, by the processor of the computing device, a pathologic staging (pTNM) portion based on a term "pathologic staging" and a term "pTNM"; retrieving, by the processor of the computing device, a first stage indicator based on a term "pT"; retrieving, by the processor of the computing device, a second stage indicator based on a term "pN"; and retrieving, by the processor of the computing device, a third stage indicator based on a term "pM".
19. The method of claim 5, further comprising: transforming, by the processor of the computing device, the pathology report into a first vector according to the plurality of pathological features, wherein the first vector includes a plurality of category elements and a plurality of sentence-vector elements; transforming, by the processor of the computing device, a second pathology report into a second vector according to the plurality of pathological features, wherein the second vector includes a plurality of category elements and a plurality of sentence-vector elements; and calculating, by the processor of the computing device, a similarity score between the pathology report and the second pathology report by summing the scores of the corresponding elements of the first vector and the second vector, wherein when an n-th element is a category element, the score of the n-th element is [formula image: Figure 111115383-A0305-02-0045-1], where C_{1n} denotes the n-th element of the first vector, C_{2n} denotes the n-th element of the second vector, and w_n denotes a weight value of the n-th element, and wherein when the n-th element is a sentence-vector element, the score of the n-th element is [formula image: Figure 111115383-A0305-02-0045-2], where Em_{1n} denotes the n-th element of the first vector and Em_{2n} denotes the n-th element of the second vector.
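The retrieval of stage indicators from the terms "pT", "pN", and "pM" in a pathologic staging portion can be illustrated as follows. The value formats accepted after each prefix (for example "2a", "is", or "x") are assumptions made for this sketch, as is the function name; the patterns are written so that the literal label "pTNM" is not mistaken for a "pT" value.

```python
import re

# Assumed value formats following each prefix.
STAGE_PATTERNS = {
    "pT": re.compile(r"pT(is|[0-4xX][a-c]?)"),
    "pN": re.compile(r"pN([0-3xX][a-c]?)"),
    "pM": re.compile(r"pM([01xX][a-c]?)"),
}


def extract_ptnm(staging_text):
    """Return the first pT, pN, and pM values found in the staging text."""
    stage = {}
    for prefix, pattern in STAGE_PATTERNS.items():
        match = pattern.search(staging_text)
        if match:
            stage[prefix] = match.group(1)
    return stage


stage = extract_ptnm("Pathologic staging (pTNM): pT2a, pN1, pMx")
```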
20. A non-transitory computer storage medium having program instructions stored thereon that, when executed by a processor, cause performance of a set of operations of the method of claim 1.
TW111115383A 2022-04-22 2022-04-22 Methods and non-transitory computer storage media of extracting linguistic patterns and summarizing pathology report TWI815411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW111115383A TWI815411B (en) 2022-04-22 2022-04-22 Methods and non-transitory computer storage media of extracting linguistic patterns and summarizing pathology report

Publications (2)

Publication Number Publication Date
TWI815411B true TWI815411B (en) 2023-09-11
TW202343470A TW202343470A (en) 2023-11-01

Family

ID=88966002

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111115383A TWI815411B (en) 2022-04-22 2022-04-22 Methods and non-transitory computer storage media of extracting linguistic patterns and summarizing pathology report

Country Status (1)

Country Link
TW (1) TWI815411B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201606690A (en) * 2014-08-05 2016-02-16 Pei-Hong Liao Nursing decision support system
CN109977406A (en) * 2019-03-26 2019-07-05 浙江大学 A kind of Chinese medicine state of an illness text key word extracting method based on sick position
US20200302117A1 (en) * 2016-03-25 2020-09-24 Canon Kabushiki Kaisha Method and apparatus for extracting diagnosis object from medical document
CN113657112A (en) * 2021-08-18 2021-11-16 支付宝(杭州)信息技术有限公司 Method and device for reading article

Also Published As

Publication number Publication date
TW202343470A (en) 2023-11-01
