TWI894564B

TWI894564B - Methods, devices, and non-transitory computer storage medium of matching clinical trials

Info

Publication number: TWI894564B
Application number: TW112118923A
Authority: TW
Inventors: 陳震宇; 蕭世欣; 張詠淳
Original assignee: 臺北醫學大學
Priority date: 2023-05-22
Filing date: 2023-05-22
Publication date: 2025-08-21
Also published as: TW202447639A

Abstract

Disclosed are methods, devices, and non-transitory computer storage media for matching clinical trials. The present disclosure provides a method of matching clinical trials. The method comprises: obtaining a first data set from a pathology report; obtaining a second data set of a clinical trial; determining whether the first data set and the second data set match with respect to a first set of fields; determining a relevance value between the first data set and the second data set with respect to a second set of fields when the first data set and the second data set match with respect to the first set of fields; and determining the clinical trial as recommended when the relevance value exceeds a threshold.

Description

Methods, devices, and non-transitory computer storage media for matching clinical trials

This disclosure relates to a method for matching clinical trials, a device for matching clinical trials, and a related non-transitory computer storage medium. Specifically, this disclosure relates to a method for matching a patient to clinical trials based on the patient's pathology report, and to a related device and non-transitory computer storage medium.

A patient's pathology report contains a large amount of information. This is especially true of pathology reports for cancer patients, which contain a large amount of confusing and cumbersome information. Surgeons and attending physicians may spend considerable time understanding a patient's condition and finding clinical trials applicable to the patient. A computer can help reduce this wasted time and thus increase overall efficiency.

The present disclosure can analyze a patient's pathology report and find suitable clinical trials for the patient. A pathology report may contain a diagnosis determined by examining cells and tissues under a microscope. The pathology report may be that of a lung cancer patient. Key information can be summarized from the confusing and cumbersome pathology report. Such information may include the following feature categories: basic description of the pathology, tumor characteristics, histology description, immunohistochemistry (IHC) information, genetic test results, and pathological TNM (tumor, node, and metastasis) staging. The present disclosure can further summarize multiple pathology reports of a single patient. The present disclosure can further provide a function of collecting data on a large number of clinical trials and comparing the features obtained from the pathology report with the clinical trials to determine suitable clinical trials for the patient. These clinical trials can serve as a reference for surgeons and physicians.

Embodiments of the present disclosure provide a method for matching clinical trials. The method includes: obtaining a first data set from a pathology report; obtaining a second data set of a clinical trial; determining whether the first data set and the second data set match with respect to a first set of fields; when the first data set and the second data set match with respect to the first set of fields, determining a relevance value between the first data set and the second data set with respect to a second set of fields; and when the relevance value exceeds a threshold, determining that the clinical trial is recommended.

Another embodiment of the present disclosure provides a device for matching clinical trials. The device includes a processor and a memory coupled to the processor. The processor executes computer-readable instructions stored in the memory to perform operations, and the operations include: obtaining a first data set from a pathology report; obtaining a second data set of a clinical trial; determining a relevance value between the first data set and the second data set with respect to a first set of fields; and when the relevance value exceeds a threshold, determining that the clinical trial is recommended.

Another embodiment of the present disclosure provides a non-transitory computer storage medium. The non-transitory computer storage medium has program instructions stored thereon. When executed by a processor, the program instructions cause a set of operations to be performed. The operations include: obtaining a first data set from a pathology report; obtaining a second data set of a clinical trial; determining whether the first data set and the second data set match with respect to a first set of fields; when the first data set and the second data set match with respect to the first set of fields, determining a relevance value between the first data set and the second data set with respect to a second set of fields; and when the relevance value exceeds a threshold, determining that the clinical trial is recommended.

11: Pathology report

12: Pre-trained model

13: Pathological features

14: Demographic data

15: Clinical trial matching system

30: Method

40: Method

41: Method

50: Method

151: Operation

152: Operation

153: Operation

154: Operation

301: Operation

302: Operation

303: Operation

304: Operation

305: Operation

401: Operation

402: Operation

403: Operation

404: Operation

405: Operation

406: Operation

411: Operation

412: Operation

501: Operation

502: Operation

503: Operation

504: Operation

505: Operation

601: Data

602: Data

700: Computer system

710: Computing device

711: Processor

712: Input/output interface

713: Communication interface

714: Memory

720: Database

1521: Step

1522: Step

To describe the manner in which the advantages and features of the present disclosure can be obtained, the description of the present disclosure is presented with reference to specific embodiments thereof, which are illustrated in the accompanying drawings. These drawings depict only exemplary embodiments of the present disclosure and are therefore not to be considered limiting of its scope.

FIG. 1 is a schematic diagram of a system for matching clinical trials according to some embodiments of the present disclosure.

FIG. 2 is a flowchart of the secondary condition matching included in the clinical trial matching system of FIG. 1 according to some embodiments of the present disclosure.

FIG. 3 is a flowchart of a method for matching clinical trials according to some embodiments of the present disclosure.

FIG. 4A is a flowchart of a method for pre-training a model for extracting features from pathology reports according to some embodiments of the present disclosure.

FIG. 4B is a flowchart of a method for extracting features from a pathology report according to some embodiments of the present disclosure.

FIG. 5 is a flowchart of a method for collecting clinical trials according to some embodiments of the present disclosure.

FIG. 6 is a schematic diagram of a representation of a clinical trial matching system according to some embodiments of the present disclosure.

FIG. 7 is a schematic diagram of a computer system according to some embodiments of the present disclosure.

The following disclosure provides many different embodiments or examples for implementing different features of the provided subject matter. Specific examples of operations, components, and configurations are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. For example, a first operation described as being performed before or after a second operation may include embodiments in which the first and second operations are performed together, and may also include embodiments in which additional operations are performed between the first and second operations. Likewise, in the following description, the formation of a first feature above, on, or in a second feature may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which an additional feature is formed between the first and second features such that the first and second features are not in direct contact. Furthermore, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself indicate a relationship between the various embodiments and/or configurations discussed.

For ease of description, temporally relative terms, such as "before," "prior to," "after," "subsequent to," and the like, may be used herein to describe the relationship of one operation or feature to another operation or feature as illustrated in the figures. Temporally relative terms are intended to encompass different sequences of the operations depicted in the figures. In addition, for ease of description, spatially relative terms, such as "beneath," "below," "lower," "above," "upper," and the like, may be used herein to describe the relationship of one element or feature to another element or feature as illustrated in the figures. Spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. The device may be otherwise oriented (rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein should be interpreted accordingly. For ease of description, relative terms for connection, such as "connect," "connected," "connection," "couple," "coupled," "in communication," and the like, may be used herein to describe an operative connection, coupling, or link between two elements or features. Relative terms for connection are intended to encompass different connections, couplings, or links between devices or components. Devices or components may be connected, coupled, or linked to each other directly or indirectly, for example through another set of components. Devices or components may be connected, coupled, or linked to each other in a wired and/or wireless manner.

As used herein, the singular terms "a," "an," and "the" may include plural referents unless the context clearly indicates otherwise. For example, a reference to a device may include a plurality of devices unless the context clearly indicates otherwise. The terms "comprise" and "include" may indicate the presence of the described features, integers, steps, operations, elements, and/or components, but may not exclude the presence of a combination of one or more of those features, integers, steps, operations, elements, and/or components. The term "and/or" may include any and all combinations of one or more of the listed items.

Additionally, quantities, ratios, and other numerical values are sometimes presented herein in a range format. It should be understood that such a range format is used for convenience and brevity and should be construed flexibly to include not only the values explicitly specified as limits of the range, but also all individual values or sub-ranges encompassed within that range, as if each value and sub-range were explicitly specified.

The nature and uses of the embodiments are discussed in detail below. It should be understood, however, that the present disclosure provides many applicable inventive concepts that can be embodied in a wide variety of specific contexts. The specific embodiments discussed merely illustrate specific ways to embody and use the present disclosure and do not limit its scope.

To match a patient's pathology report (e.g., a pathology report for lung cancer) with clinical trials, the present disclosure provides a method for extracting pathological features from the pathology report. In some embodiments of the present disclosure, the pathology report may include pathological features in several categories. Exemplary pathological features are listed in Table 1. The categories may include: basic description, detection (of the tumor), histology (tumor information), IHC information, genetic testing (results), and TNM staging. In other embodiments, the data representation of the patient's report may be expressed by the pathological features shown in Table 1.

The present disclosure provides a clinical trial matching system that can analyze a pathology report including pathological features and demographic data (i.e., the patient's personal information). The clinical trial matching system can determine the similarity and relevance between the pathology report and clinical trials, and then identify recommended clinical trials for the patient. The recommended clinical trials can serve as a reference for surgeons and physicians, who can therefore offer the patient more treatment options based on them. Using the clinical trial matching system, surgeons and physicians can quickly and accurately find suitable clinical trials, saving the considerable time otherwise spent searching for clinical trials manually.

Referring to FIG. 1, a patient's pathology report 11 may be provided. The pathology report 11 may be input into a pre-trained model 12 to extract one or more pathological features 13. The pathological features 13 may be provided to a clinical trial matching system 15.

The pre-trained model 12 can perform a classification task and/or a sequence labeling task to extract or obtain the pathological features 13. The pathological features 13 may include information related to EGFR, ALK, ROS1, KRAS, BRAF, RET, NTRK, MET, P53, and Her2, as well as information related to the operation (e.g., surgery), histology, tumor size, stage (e.g., pathological stage), and PD-L1. Through the classification task, information related to EGFR, ALK, ROS1, KRAS, BRAF, RET, NTRK, MET, P53, Her2, and the like can be extracted or obtained. Through the sequence labeling task, information related to the operation, histology, tumor size, stage, PD-L1, and the like can be extracted or obtained.

In some embodiments, the patient's demographic data 14 may be provided to the clinical trial matching system 15. The demographic data 14 may include data or information related to: age, sex, smoking, lymph node metastasis, distant metastasis, CNS metastasis, bone metastasis, wild-type, anti-angiogenesis, platinum, EGFR TKI, ALK inhibitor, PD-1/PD-L1 inhibitor, CTLA-4 inhibitor, radiation therapy, cisplatin/carboplatin, chemotherapy, systemic therapy, disease status, ECOG PS, and the like. The demographic data 14 may be extracted or obtained through the classification task or the sequence labeling task of the pre-trained model 12. When the demographic data 14 is stored in a format that the clinical trial matching system 15 can use directly (e.g., computer-processable data), the demographic data 14 can be obtained by accessing the relevant database.

The clinical trial matching system 15 can analyze the patient's pathological features 13 and demographic data 14 and identify one or more suitable clinical trials for the patient. In some embodiments, the clinical trial matching system 15 can be coupled to a clinical trial database (not shown), so that the clinical trial matching system 15 can compare the pathological features 13 and demographic data 14 with the clinical trials. Recommended clinical trials can thus be found for the patient.

Referring to FIG. 1, the clinical trial matching system 15 includes operations 151, 152, 153, and 154. The patient's pathological features 13 and demographic data 14 may be input into the clinical trial matching system 15 and compared with each clinical trial. In operation 151, it is determined whether one or more primary conditions (or fields) in the pathological features 13 and demographic data 14 match the inclusion criteria of the clinical trial. In some embodiments, the one or more primary conditions in the pathological features 13 and demographic data 14 fully match the inclusion criteria of the clinical trial. The primary conditions (or fields) may include at least one of the following: estimated glomerular filtration rate (EFGR), surgery, histology, pathological stage, age, sex, or smoking. In some embodiments, at operation 151, when the values or information of the patient's condition set fully match those of the condition set stated in the inclusion criteria, the process proceeds to operation 152. The condition set may include an EGFR condition, a surgery condition, a histology condition, a pathological stage condition, an age condition, a sex condition, and a smoking condition.

When the one or more primary conditions fully match, the process proceeds to operation 152. When the one or more primary conditions do not fully match, the process proceeds to operation 154.
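The exact-match gate of operation 151 can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the field names, the representation of inclusion criteria (a set of allowed values, or a `(low, high)` tuple for numeric ranges such as age), and the rule that a missing patient value fails the match are all assumptions introduced here.

```python
from typing import Any, Mapping


def primary_conditions_match(patient: Mapping[str, Any],
                             inclusion: Mapping[str, Any]) -> bool:
    """Return True only if every primary field the trial constrains is satisfied.

    `inclusion` maps a hypothetical field name to either a set of allowed
    values or a (low, high) tuple describing an inclusive numeric range.
    """
    for field, allowed in inclusion.items():
        value = patient.get(field)
        if value is None:
            # Assumption: a field the trial requires but the report lacks fails.
            return False
        if isinstance(allowed, tuple):
            low, high = allowed
            if not low <= value <= high:
                return False
        elif value not in allowed:
            return False
    return True


def next_operation(patient: Mapping[str, Any],
                   inclusion: Mapping[str, Any]) -> int:
    """Route to operation 152 on a full match, otherwise to operation 154."""
    return 152 if primary_conditions_match(patient, inclusion) else 154
```

Because the gate is all-or-nothing, a single unmet primary field sends the trial straight to operation 154 without any relevance scoring.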

In operation 152, it is determined whether one or more secondary conditions (or fields) in the pathological features 13 and demographic data 14 match the clinical trial. In some embodiments, the one or more secondary conditions in the pathological features 13 and demographic data 14 partially match. Unlike the primary conditions, the secondary conditions may not need to fully match the clinical trial. In some embodiments, the secondary condition match is determined by the relevance between the secondary conditions in the pathology report and the secondary conditions in the clinical trial. In some embodiments, when the relevance value is greater than a threshold, the one or more secondary conditions may be determined to match the clinical trial. Details of the relevance determination are discussed with reference to FIG. 2.

When the one or more secondary conditions match, the process proceeds to operation 153. When the one or more secondary conditions do not match, the process proceeds to operation 154.

In operation 153, when the patient's pathological features 13 and demographic data 14 match the clinical trial (i.e., pass both operations 151 and 152), the clinical trial is recommended for the patient. The clinical trial matching system 15 may then execute another round of the process to determine whether another clinical trial matches the same patient's pathological features 13 and demographic data 14.

In operation 154, when the patient's pathological features 13 and demographic data 14 do not match the clinical trial (i.e., fail operation 151 or 152), the clinical trial is not recommended for the patient. The clinical trial matching system 15 may then execute another round of the process to determine whether another clinical trial matches the same patient's pathological features 13 and demographic data 14.

Using the clinical trial matching system 15, doctors can easily find relevant ongoing clinical trials for their patients. The clinical trial matching system 15 can screen clinical trials and thus help surgeons and physicians recommend suitable clinical trials, giving patients more treatment options.

FIG. 2 is a flowchart of the secondary condition matching included in the clinical trial matching system 15 of FIG. 1 according to some embodiments of the present disclosure. Referring to FIG. 2, the secondary condition matching of operation 152 may include two steps, 1521 and 1522.

In some embodiments, the secondary condition match in operation 152 is determined by the relevance between the pathology report and the clinical trial with respect to the secondary conditions. In step 1521, a relevance value S_d of the pathology report and the clinical trial with respect to the secondary conditions (fields) is determined. In some embodiments, the pathology report may include the patient's pathological features 13 and demographic data 14. In some embodiments, the relevance value S_d may be determined based on the BM25 algorithm, which is a ranking function used to estimate the relevance of a document to a given search query.

In some embodiments, the relevance value S_d between the pathology report and a clinical trial d can be calculated by Equation 1, where: Q represents all queries (e.g., the several secondary conditions); q represents an individual query (e.g., one secondary condition); W_q represents the individual weight assigned to the individual query q; the IDF term represents the respective inverse document frequency (IDF) of the respective keyword t; df_t represents the number of clinical trials that include the respective keyword t; N represents the number of clinical trials in the clinical trial database; tf_td represents the number of occurrences of keyword t in clinical trial d; L_d represents the length of clinical trial d; L_avg represents the average length of all clinical trials in the clinical trial database; k₁ is a constant used to normalize the frequency range of keywords in the document (e.g., k₁ may be in the range of 1.2 to 2.0, preferably 1.2 or 1.5); k₃ is a constant used to correct the frequency range of keywords in the query (e.g., k₃ may be in the range of 1.2 to 2.0, preferably 1.2 or 1.5); and b is a constant (e.g., b may be 0.75 or 0.5).

In some embodiments, an individual relevance value may be obtained from one query (e.g., one field in the second set of fields, or one condition among the secondary conditions), and the relevance value S_d is the sum of the individual relevance values. Equation 1 may include the individual weight W_q, the respective inverse document frequency (IDF), the similarity between the respective keyword t and the individual query q, and the weight of the respective keyword t. The respective IDF, the similarity, and the keyword weight each correspond to one factor of Equation 1. In some embodiments, owing to the use of Laplace smoothing, the similarity expression includes (k₁+1). In some embodiments, owing to the use of Laplace smoothing, the weight of the respective keyword t includes (k₃+1).
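The symbols defined for Equation 1 are consistent with the long-query form of Okapi BM25, extended with a per-query weight. Under that assumption, the equation would take roughly the following shape. The choice of log(N/df_t) for the IDF factor and the symbol tf_tq (the number of occurrences of keyword t in query q) are assumptions, since neither is stated in the text; other BM25 variants use a smoothed IDF instead.

```latex
S_d = \sum_{q \in Q} W_q \sum_{t \in q}
      \underbrace{\log\frac{N}{df_t}}_{\text{IDF}}
      \cdot
      \underbrace{\frac{(k_1 + 1)\, tf_{td}}
                       {k_1\left((1 - b) + b\,\frac{L_d}{L_{avg}}\right) + tf_{td}}}_{\text{similarity factor}}
      \cdot
      \underbrace{\frac{(k_3 + 1)\, tf_{tq}}{k_3 + tf_{tq}}}_{\text{keyword weight}}
```

In this form the outer sum yields one individual relevance value per query q, each scaled by W_q, and S_d is their total.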

In some embodiments, the individual relevance value may be associated with the individual weight W_q, which is assigned to the individual query by a clinician. Specifically, the individual relevance value may be proportional to the individual weight W_q. The clinician may determine the importance or relevance of each query (or condition) and then assign an appropriate weight to that query (or condition).

In some embodiments, the IDF is a numerical statistic intended to reflect how important a word/term is to the documents in a collection or corpus. The IDF is a weight that indicates how commonly a word/term is used. The more frequently a word/term is used in the documents of a collection or corpus, the lower its IDF score. The lower the IDF score, the less important the word/term becomes. For example, the term "the" appears in almost all English text and therefore has a very low IDF score, because the term carries very little "topic" information.

In some embodiments, the individual relevance value may be associated with the respective IDF of the respective keyword. For example, the individual relevance value may be proportional to the respective IDF. Thus, the less often a respective keyword appears across the clinical trials in the clinical trial database, the greater its respective IDF, and therefore the greater the individual relevance value.

Using Equation 1, the relevance value S_d of the pathology report and the clinical trial with respect to the secondary conditions (fields) can be determined.
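A weighted BM25-style relevance computation of this kind might be sketched as below. It is a sketch under stated assumptions rather than the disclosed implementation: the IDF is taken as log(N/df_t), each trial is assumed to be pre-tokenized into a list of keywords, the corpus is assumed non-empty, and the function name and argument layout are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def bm25_relevance(queries: Sequence[Tuple[float, List[str]]],
                   trial_tokens: List[str],
                   corpus: Sequence[List[str]],
                   k1: float = 1.2, k3: float = 1.2, b: float = 0.75) -> float:
    """Score one trial d against weighted queries (W_q, keywords of q)."""
    n_trials = len(corpus)                                  # N
    avg_len = sum(len(doc) for doc in corpus) / n_trials    # L_avg
    doc_sets = [set(doc) for doc in corpus]
    tf_d = Counter(trial_tokens)
    # Length normalization: k1 * ((1 - b) + b * L_d / L_avg)
    length_norm = k1 * ((1 - b) + b * len(trial_tokens) / avg_len)

    score = 0.0
    for weight, keywords in queries:
        for term, tf_tq in Counter(keywords).items():
            df_t = sum(1 for doc in doc_sets if term in doc)
            tf_td = tf_d[term]
            if df_t == 0 or tf_td == 0:
                continue  # keyword absent from the corpus or from this trial
            idf = math.log(n_trials / df_t)                 # assumed IDF form
            doc_part = (k1 + 1) * tf_td / (length_norm + tf_td)
            query_part = (k3 + 1) * tf_tq / (k3 + tf_tq)
            score += weight * idf * doc_part * query_part
    return score
```

With this IDF form, a keyword that appears in every trial contributes nothing to the score, which mirrors the point that very common terms carry little topical information.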

In step 1522, the relevance value S_d is compared with a threshold K to determine whether the clinical trial matches the pathology report. When the relevance value S_d exceeds the threshold K, the pathology report (or the corresponding pathological features 13 and demographic data 14) is determined to match the clinical trial. Returning to FIG. 1, when the pathological features 13 and demographic data 14 match the clinical trial at operation 152, the clinical trial is recommended to the corresponding patient at operation 153. On the other hand, when the relevance value S_d between the pathology report and the clinical trial is less than the threshold K, the pathology report (or the corresponding pathological features 13 and demographic data 14) is determined not to match the clinical trial. Returning to FIG. 1, when the pathological features 13 and demographic data 14 do not match the clinical trial at operation 152, the clinical trial is not recommended to the corresponding patient at operation 154.

FIG. 3 is a flowchart of a method 30 for matching clinical trials according to some embodiments of the present disclosure.

In operation 301, a first data set may be obtained from a pathology report. In some embodiments, the first data set may include the patient's pathological features 13 and demographic data 14 as discussed with reference to FIG. 1. For example, the first data set (e.g., the demographic data 14) may be obtained from the pathology report. In another embodiment, the first data set (e.g., the pathological features 13) may be obtained from the pathology report via the pre-trained model 12.

在操作302中，可獲得臨床試驗之第二資料集。在一些實施例中，臨床試驗之第二資料集可自臨床資料庫獲得。 In operation 302, a second dataset of a clinical trial may be obtained. In some embodiments, the second dataset of the clinical trial may be obtained from a clinical database.

在操作303中，可判定第一資料集與第二資料集關於第一組欄位是否匹配。在一些實施例中，操作303可對應於圖1中之操作151。第一組欄位可包括以下各者中之一或多者：所估計腎小球濾過率(EFGR)、外科手術、組織學、病理分期、年齡、性別或吸菸。 In operation 303, a determination may be made as to whether the first dataset and the second dataset match with respect to a first set of fields. In some embodiments, operation 303 may correspond to operation 151 in FIG. 1. The first set of fields may include one or more of the following: estimated glomerular filtration rate (EFGR), surgical procedure, histology, pathological stage, age, sex, or smoking.

在操作304中，當第一資料集與第二資料集關於第一組欄位匹配時，可判定第一資料集與第二資料集之間關於第二組欄位的相關性值。在一些實施例中，操作304可對應於圖1中之操作152。第二組欄位可包括以下各者中之一或多者：ALK、ROS1、KRAS、BRAF、RET、NTRK、MET、P53、Her2、腫瘤大小、腫瘤最大直徑、程序性死亡配體1(PD-L1)、淋巴結轉移、遠處轉移、CNS轉移、骨轉移、野生型、抗血管生成、鉑、EGFR TKI、ALK抑制劑、PD-1/PD-L1抑制劑、CTLA-4抑制劑、放射療法、順鉑/卡鉑、化學療法、全身性療法、疾病狀態或美國東岸癌症臨床研究合作組織日常體能狀態(Eastern Cooperative Oncology Group Performance Status；ECOG PS)。 In operation 304, when the first dataset and the second dataset match with respect to the first set of fields, a correlation value between the first dataset and the second dataset with respect to a second set of fields may be determined. In some embodiments, operation 304 may correspond to operation 152 in FIG. 1. The second set of fields may include one or more of the following: ALK, ROS1, KRAS, BRAF, RET, NTRK, MET, P53, Her2, tumor size, tumor maximum diameter, programmed death ligand 1 (PD-L1), lymph node metastasis, distant metastasis, CNS metastasis, bone metastasis, wild type, anti-angiogenesis, platinum, EGFR TKI, ALK inhibitor, PD-1/PD-L1 inhibitor, CTLA-4 inhibitor, radiation therapy, cisplatin/carboplatin, chemotherapy, systemic therapy, disease status, or Eastern Cooperative Oncology Group Performance Status (ECOG PS).

在操作305中，當相關性值超過臨限值時，可判定臨床試驗為推薦的。當臨床試驗之相關性值超過臨限值時，其指示臨床試驗與患者相關，且因此可為患者推薦臨床試驗。 In operation 305, when the correlation value exceeds a threshold value, the clinical trial may be determined to be recommended. A correlation value above the threshold value indicates that the clinical trial is relevant to the patient, and the clinical trial may therefore be recommended to the patient.
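The two-stage flow of operations 303 to 305 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the dictionary layout, the use of plain equality for field matching, and the per-field `weights` mapping are all assumptions made for the example (real criteria such as age ranges would need richer comparisons).

```python
from typing import Dict, Iterable

Record = Dict[str, object]


def matches_first_fields(patient: Record, trial: Record,
                         first_fields: Iterable[str]) -> bool:
    """Hard filter: every first-set field that the trial constrains must agree."""
    for field in first_fields:
        required = trial.get(field)
        if required is not None and patient.get(field) != required:
            return False
    return True


def correlation_value(patient: Record, trial: Record,
                      weights: Dict[str, float]) -> float:
    """Sum of per-field contributions over the second set of fields.

    `weights` is a hypothetical mapping from field name to its contribution.
    """
    score = 0.0
    for field, weight in weights.items():
        required = trial.get(field)
        if required is not None and patient.get(field) == required:
            score += weight
    return score


def is_recommended(patient: Record, trial: Record,
                   first_fields: Iterable[str],
                   weights: Dict[str, float], threshold: float) -> bool:
    """Recommend only if the hard filter passes and the score exceeds the threshold."""
    if not matches_first_fields(patient, trial, first_fields):
        return False
    return correlation_value(patient, trial, weights) > threshold
```

Note that the comparison is strict, mirroring the wording "exceeds a threshold value": a score equal to the threshold does not yield a recommendation in this sketch.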

圖4A繪示根據本公開之一些實施例的預先訓練用於自病理學報告提取特徵之模型的方法40之流程圖。在一些實施例中，可藉由輸入未標記之字詞內容，諸如若干患者之病理學報告來預先訓練模型。 FIG. 4A illustrates a flow chart of a method 40 for pre-training a model for extracting features from pathology reports according to some embodiments of the present disclosure. In some embodiments, the model can be pre-trained by inputting unlabeled text content, such as the pathology reports of several patients.

在操作401中，可根據預定長度將病理學報告的內容劃分成複數個序列。在一些實施例中，序列中之各者可包括複數個句子。 In operation 401, the content of the pathology report may be divided into a plurality of sequences according to a predetermined length. In some embodiments, each of the sequences may include a plurality of sentences.

在操作402中，可在複數個序列中之各者的開頭添加分類符記。在一些實施例中，病理學報告之序列可為可由臨床醫師識別的一或多個段落。分類符記可表示整個序列之向量。 In operation 402, a classification token may be added to the beginning of each of a plurality of sequences. In some embodiments, the sequence of a pathology report may be one or more paragraphs that can be identified by a clinician. The classification token may represent a vector of the entire sequence.

在操作403中，可在兩個連續句子之間添加句子分隔符記。在一些實施例中，句子分隔符記可用以識別不同句子。在一些實施例中，兩個連續句子中之各者可包括由臨床醫師識別之一或多個句子。 In operation 403, a sentence separator may be added between two consecutive sentences. In some embodiments, the sentence separator may be used to identify different sentences. In some embodiments, each of the two consecutive sentences may include one or more sentences identified by a clinician.
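Operations 401 to 403 can be illustrated with a small sketch. It assumes whitespace tokenization and the conventional BERT marker strings `[CLS]` and `[SEP]`; the function name `build_sequences` and the packing strategy are hypothetical, and a production tokenizer (for example WordPiece) would behave differently.

```python
from typing import List

CLS, SEP = "[CLS]", "[SEP]"


def build_sequences(sentences: List[str], max_tokens: int) -> List[List[str]]:
    """Pack sentences into sequences of at most max_tokens tokens.

    Each sequence starts with a classification token, and a separator token
    is placed between consecutive sentences inside a sequence. A single
    sentence longer than max_tokens is kept whole in this simplified sketch.
    """
    sequences: List[List[str]] = []
    current = [CLS]
    for sentence in sentences:
        words = sentence.split()
        has_content = len(current) > 1
        needed = len(words) + (1 if has_content else 0)  # +1 for the separator
        if has_content and len(current) + needed > max_tokens:
            sequences.append(current)
            current = [CLS]
        if len(current) > 1:
            current.append(SEP)
        current.extend(words)
    if len(current) > 1:
        sequences.append(current)
    return sequences
```

For example, with `max_tokens=8` the sentences "Tumor size 0.6 cm", "ALK negative", and "Stage IVA" are packed into two sequences, the first holding two sentences joined by a separator token.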

在操作404中，可對病理學報告之內容執行預處理，以獲得符記嵌入、句子嵌入及位置嵌入。在一些實施例中，符記嵌入可為內容之值表示。句子嵌入可為句子之值表示。位置嵌入可為內容的位置表示。 In operation 404, pre-processing may be performed on the content of the pathology report to obtain token embeddings, sentence embeddings, and position embeddings. In some embodiments, the token embeddings may be a value representation of the content. The sentence embeddings may be a value representation of the sentence. The position embeddings may be a position representation of the content.

在操作405中，可將符記嵌入、句子嵌入及位置嵌入彙總為經預處理內容(pre-processed content)。在一些實施例中，可對符記嵌入、句子嵌入及位置嵌入進行彙總，將所彙總之內容複製成三個複本且接著可對三個複本執行多頭部自關注演算法(multi-head self-attention algorithm)，且可獲得含有各字詞之表示向量的經預處理內容。 In operation 405, the token embeddings, sentence embeddings, and position embeddings may be aggregated into pre-processed content. In some embodiments, the token embeddings, sentence embeddings, and position embeddings may be aggregated, the aggregated content may be copied into three copies, and a multi-head self-attention algorithm may then be applied to the three copies to obtain pre-processed content containing a representation vector for each word.
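In the standard Transformer formulation used by BERT, the aggregation and the "three copies" described above correspond to summing the three embeddings and projecting the result into query, key, and value matrices. The following is a sketch under that assumption; the symbols $X$, $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, $W^{O}$, $d_k$, and $h$ do not appear in the original text.

```latex
\begin{aligned}
X &= E_{\text{token}} + E_{\text{sentence}} + E_{\text{position}} \\
Q_i &= X W_i^{Q}, \qquad K_i = X W_i^{K}, \qquad V_i = X W_i^{V} \\
\text{head}_i &= \operatorname{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i \\
H &= \operatorname{Concat}(\text{head}_1, \dots, \text{head}_h)\, W^{O}
\end{aligned}
```

Each row of $H$ is then the contextual representation vector of one token, which matches the "representation vector for each word" mentioned above.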

在操作406中，可藉由對經預處理內容執行遮罩語言模型及/或下一句預測來訓練模型，以獲得經預先訓練模型。 In operation 406, the model may be trained by performing masked language modeling and/or next sentence prediction on the pre-processed content to obtain a pre-trained model.

在一些實施例中，遮罩語言模型可在多層情境中容易地預測目標術語。遮罩語言模型為輸入術語之一些部分(諸如，特定術語之一些措辭)可被簡單地隨機遮罩，且接著可預測彼等經遮罩術語。在一些實施例中，輸入術語可變換為符記以供分析。 In some embodiments, a masked language model can readily predict target terms from multi-layer context. In masked language modeling, some portions of the input terms (e.g., some words of a particular term) are randomly masked, and the masked terms are then predicted. In some embodiments, the input terms can be converted into tokens for analysis.

在一些實施例中，為訓練理解句子關係之模型，下一句預測可用以訓練模型。在一些實施例中，下一句預測為可自語料庫/資料庫容易地產生的任務。具體而言，當針對各相關實例選擇句子A及B時，50%的時間B為A之後的實際下一句，且50%的時間其為來自語料庫之隨機句子。在用大量輸入資料訓練之後，經預先訓練模型之下一句預測的準確度可增加。 In some embodiments, to train a model that understands sentence relationships, next sentence prediction can be used to train the model. In some embodiments, next sentence prediction is a task that can be easily generated from a corpus/database. Specifically, when sentences A and B are selected for each relevant example, 50% of the time, B is the actual next sentence after A, and 50% of the time, it is a random sentence from the corpus. After training with a large amount of input data, the accuracy of the pre-trained model's next sentence prediction can be increased.
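The two pre-training tasks can be sketched as data-preparation helpers. This is an illustrative simplification: the names `mask_tokens` and `make_nsp_pair`, the `[MASK]` string, and the masking probability parameter are assumptions, and the sketch omits refinements used in practice (such as occasionally replacing a selected token with a random token instead of the mask).

```python
import random
from typing import Dict, List, Tuple

MASK = "[MASK]"
SPECIAL = {"[CLS]", "[SEP]"}


def mask_tokens(tokens: List[str], mask_prob: float,
                rng: random.Random) -> Tuple[List[str], Dict[int, str]]:
    """Randomly mask tokens; return the masked list and the targets to predict."""
    masked = list(tokens)
    targets: Dict[int, str] = {}
    for i, token in enumerate(tokens):
        if token in SPECIAL:
            continue  # never mask the classification or separator tokens
        if rng.random() < mask_prob:
            targets[i] = token
            masked[i] = MASK
    return masked, targets


def make_nsp_pair(sentences: List[str], index: int,
                  rng: random.Random) -> Tuple[str, str, bool]:
    """Return (A, B, is_next): B is the true next sentence about half the time."""
    if len(sentences) < 3:
        raise ValueError("need at least three sentences to draw a random one")
    if not 0 <= index < len(sentences) - 1:
        raise IndexError("index must leave room for a following sentence")
    sentence_a = sentences[index]
    if rng.random() < 0.5:
        return sentence_a, sentences[index + 1], True
    # Draw a random sentence that is neither A nor its true successor.
    candidates = [s for j, s in enumerate(sentences) if j not in (index, index + 1)]
    return sentence_a, rng.choice(candidates), False
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which is convenient when comparing training runs.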

在一些實施例中，經預先訓練模型可以是一種來自變換器的雙向編碼器表示(Bidirectional Encoder Representations from Transformers；BERT)模型。經預先訓練模型可藉由輸入一或多個醫院的臨床病理學報告來預先訓練。 In some embodiments, the pre-trained model can be a Bidirectional Encoder Representations from Transformers (BERT) model. The pre-trained model can be pre-trained by inputting clinical pathology reports from one or more hospitals.

圖4B繪示根據本公開之一些實施例之自病理學報告提取特徵的方法41之流程圖。在一些實施例中，方法41可藉由根據圖4A中之方法40訓練的經預先訓練模型來執行。 FIG. 4B illustrates a flow chart of a method 41 for extracting features from a pathology report according to some embodiments of the present disclosure. In some embodiments, method 41 may be performed using a pre-trained model trained according to method 40 in FIG. 4A.

在操作411中，可藉由經預先訓練模型對病理學報告執行分類任務，使得獲得至少一個狀態值。在一些實施例中，分類任務係判定病理學報告是否包括特定欄位。因此，分類任務之答案/結果將為是或否(1或0)。亦即，分類任務之結果為狀態值。 In operation 411, a classification task may be performed on the pathology report using a pre-trained model, resulting in at least one status value. In some embodiments, the classification task involves determining whether the pathology report includes a specific field. Therefore, the answer/result of the classification task will be yes or no (1 or 0). In other words, the result of the classification task is a status value.

在一些實施例中，病理學報告中之至少一個狀態值可包括欄位(或條件)之狀態值，包括：EGFR、ALK、ROS1、KRAS、BRAF、RET、NTRK、MET、P53或Her2。在一些實施例中，至少狀態值可包括於圖3中所論述之第一資料集中。 In some embodiments, the at least one status value in the pathology report may include a status value for one or more of the following fields (or conditions): EGFR, ALK, ROS1, KRAS, BRAF, RET, NTRK, MET, P53, or Her2. In some embodiments, the at least one status value may be included in the first dataset discussed with reference to FIG. 3.

舉例而言，EGFR欄位之狀態值可包括2⁴+1個可能值，例如，外顯子18、19、20及21之突變狀態及未知狀態。對於ALK、ROS1、KRAS、BRAF、RET、NTRK、MET、P53及Her2之欄位，狀態值可為正、負或未知。 For example, the status value for the EGFR field can take 2⁴ + 1 possible values, namely the mutation statuses of exons 18, 19, 20, and 21, plus an unknown status. For the ALK, ROS1, KRAS, BRAF, RET, NTRK, MET, P53, and Her2 fields, the status value can be positive, negative, or unknown.
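One way to read the count of 2⁴ + 1 is that each of the four exons is either mutated or not (2⁴ combinations), with one extra value for "unknown". The sketch below enumerates the values under that interpretation, which is an inference from the text; the function name and the tuple encoding are hypothetical.

```python
from itertools import product
from typing import List, Tuple, Union

EXONS = (18, 19, 20, 21)

EgfrStatus = Union[str, Tuple[int, ...]]


def egfr_status_values() -> List[EgfrStatus]:
    """Enumerate 'unknown' plus every subset of mutated exons."""
    values: List[EgfrStatus] = ["unknown"]
    for flags in product((False, True), repeat=len(EXONS)):
        mutated = tuple(exon for exon, flag in zip(EXONS, flags) if flag)
        values.append(mutated)
    return values
```

Under this encoding the empty tuple stands for "no mutation in exons 18 to 21", and `(18, 19)` stands for mutations in exons 18 and 19.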

在操作412中，可藉由經預先訓練模型對病理學報告執行序列標記任務，使得獲得至少一個描述。在一些實施例中，序列標記任務可判定不同類別的特定術語。因此，序列標記任務之答案/結果將為一描述。 In operation 412, a sequence labeling task may be performed on the pathology report using a pre-trained model to obtain at least one description. In some embodiments, the sequence labeling task may determine specific terms of different categories. Therefore, the answer/result of the sequence labeling task will be a description.

在一些實施例中，病理學報告中之至少一個描述包括針對欄位(或條件)之描述，包括：手術(或外科手術)、組織學、腫瘤大小、分期(或病理分期)或PDL1。在一些實施例中，至少描述可包括於圖3中所論述之第一資料集中。 In some embodiments, the at least one description in the pathology report includes a description of one or more of the following fields (or conditions): procedure (or surgical procedure), histology, tumor size, stage (or pathological stage), or PDL1. In some embodiments, the at least one description may be included in the first dataset discussed with reference to FIG. 3.

舉例而言，操作欄位之描述可為「視頻輔助胸腔鏡手術(VATS)葉切除術」。組織學欄位之描述可為「低分化之非小細胞癌瘤」。腫瘤大小欄位之描述可為「0.6×0.4×0.3cm」，且最大腫瘤直徑之描述可為「0.6cm」。分期欄位之描述可為「pStageIVA」。 For example, the description for the procedure field might be "Video-assisted thoracic surgery (VATS) lobectomy." The description for the histology field might be "Poorly differentiated non-small cell carcinoma." The description for the tumor size field might be "0.6 × 0.4 × 0.3 cm," and the description for the maximum tumor diameter might be "0.6 cm." The description for the stage field might be "pStageIVA."
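The text does not say how the maximum tumor diameter is obtained from the tumor size description. One plausible approach, shown here purely as an assumption, is to take the largest dimension in the extracted size string; the function name is hypothetical and the sketch assumes all dimensions share one unit.

```python
import re


def max_tumor_diameter(description: str) -> float:
    """Return the largest numeric dimension found in a tumor size description."""
    numbers = re.findall(r"\d+(?:\.\d+)?", description)
    if not numbers:
        raise ValueError(f"no dimensions found in {description!r}")
    return max(float(n) for n in numbers)
```

Applied to the example above, `max_tumor_diameter("0.6×0.4×0.3cm")` yields 0.6, in agreement with the "0.6 cm" maximum diameter.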

圖5繪示根據本公開之一些實施例之收集臨床試驗的方法50之流程圖。執行方法50之裝置可用於更新臨床試驗資料庫，其可與圖1中之臨床試驗匹配系統耦合。 FIG. 5 illustrates a flow chart of a method 50 for collecting clinical trials according to some embodiments of the present disclosure. A device performing method 50 can be used to update a clinical trial database, which can be coupled with the clinical trial matching system in FIG. 1.

在操作501中，可在一或多個臨床試驗線上資料庫上查詢一或多個關鍵字以獲得一或多個查詢結果。一或多個臨床試驗線上資料庫可為政府公共臨床試驗資料庫(諸如，clinicaltrials.gov及www1.cde.org.tw/ct_taiwan)。在一些實施例中，關鍵字可為疾病/診斷及/或分期。在一些實施例中，疾病/診斷可包括NSCLC(非小細胞肺癌)、非小細胞、肺腺癌瘤、非鱗狀、鱗狀細胞癌瘤、非鱗狀非小細胞肺癌、鱗狀細胞肺癌、大細胞肺癌...等。在一些實施例中，分期可包括發展、分期IIIB、分期IIIC、分期IV、轉移性...等。舉例而言，關鍵字可為NSCLC晚期、NSCLC分期IIIB...等。 In operation 501, one or more keywords may be queried on one or more online clinical trial databases to obtain one or more query results. The one or more online clinical trial databases may be public government clinical trial databases (e.g., clinicaltrials.gov and www1.cde.org.tw/ct_taiwan). In some embodiments, the keywords may be diseases/diagnoses and/or stages. In some embodiments, the diseases/diagnoses may include NSCLC (non-small cell lung cancer), non-small cell, lung adenocarcinoma, non-squamous, squamous cell carcinoma, non-squamous non-small cell lung cancer, squamous cell lung cancer, large cell lung cancer, etc. In some embodiments, the stages may include advanced, stage IIIB, stage IIIC, stage IV, metastatic, etc. For example, the keywords may be "NSCLC advanced", "NSCLC stage IIIB", etc.

在操作502中，可記錄查詢結果中之各者的一或多個參數。 In operation 502, one or more parameters of each of the query results may be recorded.

在操作503中，可基於一或多個參數建構查詢結果中之各者的網站鏈接。 In operation 503, a website link for each of the query results may be constructed based on one or more parameters.

在操作504中，可收集查詢結果之一或多個欄位的資料。在一些實施例中，欄位可為查詢結果中之感興趣的欄。舉例而言，欄位可包括關鍵字、臨床試驗/程式ID、臨床試驗/項目標題、申請人、發起人、臨床試驗之預計開始日期、臨床試驗之實際開始日期、臨床試驗之預計結束日期、臨床試驗之實際結束日期、臨床試驗之納入準則及臨床試驗之排除準則、試驗醫院、試驗位置(諸如，州或國家)、臺灣之預計試驗數目、世界之預計試驗數目、臨床試驗之最後更新日期及臨床試驗之網站鏈接(亦即，URL)。 In operation 504, data for one or more fields of the query results may be collected. In some embodiments, the fields may be columns of interest in the query results. For example, fields may include keywords, clinical trial/program ID, clinical trial/project title, applicant, sponsor, expected clinical trial start date, actual clinical trial start date, expected clinical trial end date, actual clinical trial end date, clinical trial inclusion criteria, and clinical trial exclusion criteria, trial hospital, trial location (e.g., state or country), expected number of trials in Taiwan, expected number of trials worldwide, last updated date of the clinical trial, and website link (i.e., URL) of the clinical trial.

在操作505中，查詢結果之一或多個欄位的資料可儲存至臨床試驗資料庫中。在一些實施例中，臨床試驗資料庫可與臨床試驗匹配系統耦合，使得該系統可將病理特徵及人口統計資料與臨床試驗進行比較。因此，可為患者找到所推薦的臨床試驗。 At operation 505, data from one or more fields of the query result may be stored in a clinical trial database. In some embodiments, the clinical trial database may be coupled with a clinical trial matching system, enabling the system to compare pathological characteristics and demographic data with clinical trials. Thus, recommended clinical trials may be found for the patient.
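Operations 503 to 505 can be sketched with the Python standard library. Everything specific here is hypothetical: the base URL `https://example.org/trial`, the parameter names, the table schema, and the reduced set of columns are placeholders for illustration and do not reflect the query interface of any real registry.

```python
import sqlite3
from typing import Dict, Iterable
from urllib.parse import urlencode

BASE_URL = "https://example.org/trial"  # hypothetical endpoint, not a real registry


def build_trial_link(params: Dict[str, str]) -> str:
    """Construct a website link for one query result from its recorded parameters."""
    return f"{BASE_URL}?{urlencode(sorted(params.items()))}"


def store_trials(conn: sqlite3.Connection,
                 records: Iterable[Dict[str, str]]) -> None:
    """Insert or update collected trial fields in the clinical trial database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trials ("
        "trial_id TEXT PRIMARY KEY, title TEXT, "
        "inclusion TEXT, exclusion TEXT, url TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO trials "
        "VALUES (:trial_id, :title, :inclusion, :exclusion, :url)",
        list(records),
    )
    conn.commit()
```

Keying the table on the trial ID means that re-running the collection updates existing rows instead of duplicating them, which suits a database that is refreshed periodically.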

圖6繪示根據本公開之一些實施例之臨床試驗匹配系統的表示之示意圖。參看圖6，臨床試驗匹配系統可包括人口統計區塊、基因/轉移區塊、治療/藥物區塊及病理學資訊區塊。 FIG. 6 is a schematic diagram illustrating a representation of a clinical trial matching system according to some embodiments of the present disclosure. Referring to FIG. 6, the clinical trial matching system may include a demographic block, a gene/metastasis block, a treatment/drug block, and a pathology information block.

人口統計區塊中之資料，諸如年齡、性別、吸菸及ECOG PS可自圖1中之人口統計資料14獲得。舉例而言，年齡可為50。性別可為男性。患者可具有吸菸習慣。ECOG PS之得分可為3，其可在0至5之範圍內。 Data in the demographic block, such as age, sex, smoking, and ECOG PS, can be obtained from demographic data 14 in FIG. 1. For example, the age may be 50. The sex may be male. The patient may have a smoking habit. The ECOG PS score, which ranges from 0 to 5, may be 3.

在一些實施例中，基因/轉移區塊中之資料601可經由經預先訓練模型自病理學報告獲得。亦即，資料601可自圖1中之病理特徵13獲得。另一方面，在基因/轉移區塊中除資料601以外的資料可自圖1中之人口統計資料14而獲得。在一些實施例中，基因/轉移區塊中的資料之各者可為狀態值。對於EGFR欄位，其展示「未知」或具有突變(mutation)之外顯子(exon)的數目；狀態值「18、19」指示外顯子18及19具有突變。對於資料601中之其他欄位，狀態值可為P(陽性)、N(陰性)或U(未知)。舉例而言，淋巴結轉移可為是(亦即，已發生淋巴結轉移)。 In some embodiments, data 601 in the gene/metastasis block can be obtained from a pathology report via a pre-trained model. That is, data 601 can be obtained from pathological features 13 in FIG. 1. On the other hand, data in the gene/metastasis block other than data 601 can be obtained from demographic data 14 in FIG. 1. In some embodiments, each item of data in the gene/metastasis block can be a status value. The EGFR field displays either "unknown" or the numbers of the exons having mutations; for example, the status value "18, 19" indicates that exons 18 and 19 have mutations. For the other fields in data 601, the status value can be P (positive), N (negative), or U (unknown). For example, lymph node metastasis can be "yes" (i.e., lymph node metastasis has occurred).

在基因/轉移區塊中，狀態值「野生型」、「淋巴結轉移」、「遠處轉移」、「CNS轉移」及「骨轉移」可為是或否。 In the gene/metastasis block, the status values for "wild type," "lymph node metastasis," "distant metastasis," "CNS metastasis," and "bone metastasis" can be either yes or no.

治療/藥物區塊中之資料可自圖1中之人口統計資料14獲得。在一些實施例中，治療/藥物區塊中的資料之各者可為狀態值，其可為是或否。舉例而言，放射療法可為是(亦即，已進行放射療法)。 The data in the treatment/drug block can be obtained from demographic data 14 in FIG. 1. In some embodiments, each item of data in the treatment/drug block can be a status value, which can be either yes or no. For example, radiation therapy can be "yes" (i.e., radiation therapy has been administered).

在一些實施例中，病理學資訊區塊中之資料602可經由經預先訓練模型自病理學報告獲得。亦即，資料602可自圖1中之病理特徵13獲得。另一方面，除病理學資訊區塊中之資料602以外的資料可自圖1中之人口統計資料14獲得。 In some embodiments, data 602 in the pathology information block can be obtained from pathology reports via a pre-trained model. That is, data 602 can be obtained from pathological features 13 in FIG. 1. On the other hand, data in the pathology information block other than data 602 can be obtained from demographic data 14 in FIG. 1.

參考圖7，其展示能夠執行本公開方法之一或多個操作的電腦系統700之實例。在本公開之至少一些實施例中，電腦系統700包括計算裝置710及資料庫720。計算裝置710可為伺服器電腦、客戶端電腦、個人電腦(PC)、平板PC、機頂盒(STB)、個人數位助理(PDA)、蜂巢式電話或智慧型手機。計算裝置710包含處理器711、輸入/輸出介面712、通信介面713及記憶體714。資料庫720可儲存病理學報告，病理特徵13及人口統計資料14將自該病理學報告中提取。資料庫720可儲存待分析或概述之病理學報告。輸入/輸出介面712與處理器711耦接。輸入/輸出介面712允許使用者操縱計算裝置710以便執行本公開之操作或方法(例如，圖3中所揭示之方法)。通信介面713與處理器711耦接。通信介面713允許計算裝置710與資料庫720通信。通信介面713可支援以下協定中之一或多者：通用串列匯流排(USB)、乙太網、藍芽、IEEE 802.11、3GPP長期演進(LTE)(4G)及3GPP新無線電(5G)。記憶體714可為非暫態性電腦可讀儲存媒體。記憶體714與處理器711耦接。記憶體714已儲存可由一或多個處理器(例如，處理器711)執行的程式指令。在執行儲存於記憶體714上之程式指令後，程式指令即引起執行本公開中所揭示之方法的一或多個操作。 Referring to FIG. 7 , an example of a computer system 700 capable of performing one or more operations of the disclosed methods is shown. In at least some embodiments of the present disclosure, the computer system 700 includes a computing device 710 and a database 720. The computing device 710 can be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular phone, or a smartphone. The computing device 710 includes a processor 711, an input/output interface 712, a communication interface 713, and a memory 714. The database 720 can store pathology reports from which pathology features 13 and demographic data 14 are to be extracted. The database 720 can store pathology reports to be analyzed or summarized. The input/output interface 712 is coupled to the processor 711. The input/output interface 712 allows a user to operate the computing device 710 to perform the operations or methods disclosed herein (e.g., the method disclosed in FIG. 3 ). The communication interface 713 is coupled to the processor 711. The communication interface 713 allows the computing device 710 to communicate with the database 720. The communication interface 713 may support one or more of the following protocols: Universal Serial Bus (USB), Ethernet, Bluetooth, IEEE 802.11, 3GPP Long Term Evolution (LTE) (4G), and 3GPP New Radio (5G). 
The memory 714 may be a non-transitory computer-readable storage medium. The memory 714 is coupled to the processor 711. The memory 714 stores program instructions that can be executed by one or more processors (e.g., processor 711). When the program instructions stored in the memory 714 are executed, the program instructions cause one or more operations of the method disclosed in this disclosure to be performed.

舉例而言，程式指令可引起計算裝置710執行動作之集合，其至少包括：自病理學報告獲得第一資料集；獲得臨床試驗之第二資料集；判定第一資料集與第二資料集是否關於第一組欄位匹配；當第一資料集與第二資料集關於第一組欄位匹配時，判定第一資料集與第二資料集之間關於第二組欄位的相關性值；以及當相關性值超過臨限值時，判定臨床試驗為推薦的。 For example, the program instructions may cause the computing device 710 to perform a set of actions including at least: obtaining a first data set from a pathology report; obtaining a second data set from a clinical trial; determining whether the first data set and the second data set match with respect to a first set of fields; when the first data set and the second data set match with respect to the first set of fields, determining a correlation value between the first data set and the second data set with respect to the second set of fields; and when the correlation value exceeds a threshold value, determining that the clinical trial is recommended.

本公開之範疇並不意欲限於說明書中描述的程序、機器、製品及物質組成、手段、方法、步驟及操作的特定實施例。如熟習此項技術者將易於自本公開之揭示內容而瞭解，可根據本公開利用執行與本文中所描述之對應實施例實質上相同的功能或實現與該等對應實施例實質上相同的結果的當前現有或稍後待開發的程序、機器、製品、物質組成、手段、方法、步驟或操作。因此，所附申請專利範圍意欲在其範疇內包括程序、機器、製品及物質組成、手段、方法、步驟或操作。此外，各申請專利範圍構成一單獨實施例，且各種申請專利範圍與實施例的組合在本公開之範疇內。 The scope of this disclosure is not intended to be limited to the specific embodiments of the processes, machines, articles, compositions of matter, means, methods, steps, and operations described in this specification. Those skilled in the art will readily understand from the disclosure of this disclosure that currently existing or later developed processes, machines, articles, compositions of matter, means, methods, steps, or operations that perform substantially the same functions or achieve substantially the same results as the corresponding embodiments described herein can be utilized in accordance with this disclosure. Therefore, the appended claims are intended to include within their scope such processes, machines, articles, compositions of matter, means, methods, steps, or operations. Furthermore, each claim constitutes a separate embodiment, and the combination of each claim and embodiment is within the scope of this disclosure.

根據本公開之實施例的方法、程序或操作亦可實施於程式化處理器上。然而，控制器、流程圖及模組亦可實施於通用或專用電腦、程式化微處理器或微控制器及周邊積體電路元件、積體電路、諸如離散元件電路之硬體電子或邏輯電路、可程式化邏輯裝置或其類似者上。一般而言，上面駐留有能夠實施圖式中所展示之流程圖之有限狀態機的任何裝置可用於以實施本公開之處理器功能。 The methods, procedures, or operations according to the embodiments of the present disclosure may also be implemented on a programmed processor. However, the controller, flowchart, and module may also be implemented on a general-purpose or special-purpose computer, a programmed microprocessor or microcontroller and peripheral integrated circuit components, an integrated circuit, hardware electronics such as discrete component circuits or logic circuits, a programmable logic device, or the like. In general, any device having a finite state machine capable of implementing the flowcharts shown in the figures may be used to implement the processor functions of the present disclosure.

替代實施例較佳地以儲存電腦可程式化指令之非暫態性電腦可讀儲存媒體形式實施根據本公開之實施例的方法、程序或操作。該等指令較佳地由較佳地與網路安全系統整合之電腦可執行組件執行。非暫態性電腦可讀儲存媒體可儲存於任何合適的電腦可讀媒體上，諸如RAM、 ROM、快閃記憶體、EEPROM、光學儲存裝置(CD或DVD)、硬碟機、軟碟機或任何合適的裝置。電腦可執行組件較佳地為處理器，但指令可替代地或另外由任何合適的專用硬體裝置執行。舉例而言，本公開之一實施例提供其中儲存有電腦可程式化指令之非暫態性電腦可讀儲存媒體。 Alternative embodiments preferably implement the methods, procedures, or operations according to embodiments of the present disclosure in the form of a non-transitory computer-readable storage medium storing computer-programmable instructions. These instructions are preferably executed by a computer-executable component that is preferably integrated with the network security system. The non-transitory computer-readable storage medium can be stored on any suitable computer-readable medium, such as RAM, ROM, flash memory, EEPROM, optical storage (CD or DVD), hard drive, floppy drive, or any other suitable device. The computer-executable component is preferably a processor, but the instructions may alternatively or additionally be executed by any suitable dedicated hardware device. For example, one embodiment of the present disclosure provides a non-transitory computer-readable storage medium having computer-programmable instructions stored therein.

雖然已用本公開之特定實施例描述本公開，但顯而易見，許多替代、修改及變化對於熟習此項技術者可為顯而易見的。舉例而言，在其他實施例中，實施例之各種組件可互換、添加或取代。另外，各圖之所有元件對於所揭示之實施例的操作並非必需的。舉例而言，將使得所揭示實施例之一般熟習此項技術者能夠藉由僅採用獨立請求項之元件進行並使用本公開之教示。因此，如本文中所闡述之本公開之實施例意欲為說明性的，而非限制性的。可在不脫離本公開之精神及範疇的情況下進行各種變化。 Although the present disclosure has been described with specific embodiments thereof, many alternatives, modifications, and variations may be apparent to those skilled in the art. For example, in other embodiments, various components of the embodiments may be interchanged, added, or substituted. In addition, not all elements shown in the figures are required for the operation of the disclosed embodiments. For example, one of ordinary skill in the art of the disclosed embodiments would be able to make and use the teachings of the present disclosure by simply employing the elements of the independent claims. Therefore, the embodiments of the present disclosure as set forth herein are intended to be illustrative and not restrictive. Various changes may be made without departing from the spirit and scope of the present disclosure.

即使已在前述描述中闡述本公開之眾多特性及優點，連同本公開之結構及功能的細節，但本公開僅係說明性的。可在由表示所附申請專利範圍之術語的廣泛一般含義指示的本發明之原理內充分地改變細節，尤其在零件之形狀、大小及配置方面。 Although the numerous features and advantages of the present disclosure have been described in the foregoing description, along with details of its structure and function, this disclosure is illustrative only. The details, particularly in the shape, size, and arrangement of parts, may be varied substantially within the principles of the invention as indicated by the broad general meaning of the terms used in the appended claims.

11:病理學報告 12:經預先訓練模型 13:病理特徵 14:人口統計資料 15:臨床試驗匹配系統 151:操作 152:操作 153:操作 154:操作 11: Pathology Report 12: Pretrained Model 13: Pathology Features 14: Demographic Data 15: Clinical Trial Matching System 151: Operation 152: Operation 153: Operation 154: Operation

Claims

A method for matching clinical trials, comprising: obtaining a first data set comprising pathological feature data and demographic data from a pathology report, wherein at least one description in the first data set is obtained by performing a sequence labeling task on the pathology report using a pre-trained model, and the at least one description in the first data set comprises a description of at least one of the following fields: surgery, histology, tumor size, stage, or PDL1; obtaining a second data set of a clinical trial comprising pathological feature data and demographic data; determining whether the first data set and the second data set match with respect to a first condition set; determining a correlation value between the first data set and the second data set with respect to a second condition set when the first data set and the second data set match with respect to the first condition set; and determining that the clinical trial is recommended when the correlation value exceeds a threshold value.

The method of claim 1, wherein the correlation value is a sum of individual correlation values, one for each condition in the second condition set.

The method of claim 2, wherein each individual correlation value is associated with an individually assigned weight (Wq).

The method of claim 2, wherein each individual correlation value is associated with an inverse document frequency (IDF) of a respective keyword.

The method of claim 1, wherein the first condition set includes one or more of: estimated glomerular filtration rate (EFGR), surgery, histology, pathological stage, age, sex, or smoking.

The method of claim 1, wherein the second condition set comprises one or more of the following: ALK, ROS1, KRAS, BRAF, RET, NTRK, MET, P53, Her2, tumor size, tumor maximum diameter, programmed death-ligand 1 (PD-L1), lymph node metastasis, distant metastasis, CNS metastasis, bone metastases, wild type, anti-angiogenesis, platinum, EGFR TKIs, ALK inhibitors, PD-1/PD-L1 inhibitors, CTLA-4 inhibitors, radiation therapy, cisplatin/carboplatin, chemotherapy, systemic therapy, disease status, or Eastern Cooperative Oncology Group Performance Status (ECOG PS).

The method of claim 1, wherein obtaining the first data set comprises: performing a classification task on the pathology report using the pre-trained model to obtain at least one status value in the first data set.

The method of claim 7, wherein the classification task is performed to obtain a status value of one of the following fields: EGFR, ALK, ROS1, KRAS, BRAF, RET, NTRK, MET, P53, or Her2.

The method of claim 1, wherein the pre-trained model is trained by a masked language model and/or next sentence prediction.

A device for matching clinical trials, comprising: a processor; and a memory coupled to the processor, wherein the processor executes computer-readable instructions stored in the memory to perform operations, and the operations include: obtaining a first data set including pathological feature data and demographic data from a pathology report, wherein at least one description in the first data set is obtained by performing a sequence labeling task on the pathology report using a pre-trained model, and the at least one description in the first data set includes a description of at least one of the following fields: surgery, histology, tumor size, stage, or PDL1; obtaining a second data set of a clinical trial including pathological feature data and demographic data; determining a correlation value between the first data set and the second data set with respect to a third condition set; and determining that the clinical trial is recommended when the correlation value exceeds a threshold value.

The device of claim 10, wherein the operations further include: determining whether the first data set and the second data set match with respect to a fourth condition set, wherein the correlation value is determined when the first data set and the second data set match with respect to the fourth condition set.

The device of claim 10, wherein the correlation value is a sum of individual correlation values, one for each condition in the third condition set.

The device of claim 12, wherein each individual correlation value is associated with an inverse document frequency (IDF) of a respective keyword.

The device of claim 11, wherein the fourth condition set includes one or more of: estimated glomerular filtration rate (EFGR), surgery, histology, pathological stage, age, sex, or smoking.

The device of claim 10, wherein the third condition set includes one or more of the following: ALK, ROS1, KRAS, BRAF, RET, NTRK, MET, P53, Her2, tumor size, tumor maximum diameter, programmed death ligand 1 (PD-L1), lymph node metastasis, distant metastasis, CNS metastasis, bone metastasis, wild type, anti-angiogenesis, platinum, EGFR TKI, ALK inhibitor, PD-1/PD-L1 inhibitor, CTLA-4 inhibitor, radiation therapy, cisplatin/carboplatin, chemotherapy, systemic therapy, disease status, or Eastern Cooperative Oncology Group Performance Status (ECOG PS).

The device of claim 10, wherein obtaining the first data set comprises performing a classification task on the pathology report using the pre-trained model to obtain at least one status value in the first data set.

A non-transitory computer storage medium having program instructions stored thereon, wherein the program instructions, when executed by a processor, cause the processor to perform operations, the operations comprising: obtaining a first data set comprising pathological feature data and demographic data from a pathology report, wherein at least one description in the first data set is obtained by performing a sequence labeling task on the pathology report using a pre-trained model, and the at least one description in the first data set comprises a description of at least one of the following fields: surgery, histology, tumor size, stage, or PDL1; obtaining a second data set of a clinical trial including pathological feature data and demographic data; determining whether the first data set and the second data set match with respect to a fifth condition set; determining a correlation value between the first data set and the second data set with respect to a sixth condition set when the first data set and the second data set match with respect to the fifth condition set; and determining that the clinical trial is recommended when the correlation value exceeds a threshold value.