TWI873056B - Method and computer system for analyzing gastric endoscopic image using image segmentation technology - Google Patents

- Publication number: TWI873056B
- Application number: TW113124479A
- Authority: TW (Taiwan)
Abstract
Description
The present disclosure relates to a method for analyzing gastric endoscopic images using deep-learning techniques, and in particular to segmenting the gastric intestinal metaplasia regions in such images.
Gastric cancer ranks sixth in cancer incidence and fourth among causes of cancer death worldwide. It develops after Helicobacter pylori infection, progressing through gastritis of the gastric body, gastric atrophy, and gastric intestinal metaplasia. Because only early diagnosis improves survival, regular follow-up of patients who already have precancerous lesions such as gastric intestinal metaplasia is essential. Correctly diagnosing and grading gastritis and gastric intestinal metaplasia during gastroscopy is therefore an important clinical issue. At present, this diagnosis requires five to six biopsies taken from the stomach during gastroscopy, which are then interpreted by a pathologist. The advantage is an accurate pathological diagnosis; the disadvantages are that the procedure is time-consuming, carries risks such as bleeding, and is unsuitable for large-scale screening. Developing a technique that analyzes gastric intestinal metaplasia quickly and without invasive biopsy is thus an important open problem in precision health that still awaits new technology.
An embodiment of the present disclosure provides a method for analyzing a gastric endoscopic image using image segmentation technology, suitable for a computer system. The method includes: obtaining a gastric endoscopic image and cutting it into multiple blocks; inputting one of the blocks into an encoder to obtain an encoded feature map; inputting the encoded feature map into a decoder to obtain a decoded feature map; and inputting the decoded feature map into a masked focal modulation decoder. The masked focal modulation decoder includes multiple masked focal modulation stages. One of these stages receives the decoded feature map and a first template feature, inputs the first template feature into a focal modulation function to obtain a focal modulation feature, inputs the decoded feature map into a scaling function and then a projection function to obtain a scaled feature, and generates a second template feature from a mask, the scaled feature, and the focal modulation feature. The method further includes generating a prediction mask from the second template feature; the prediction mask contains multiple pixels and is used to segment a gastric intestinal metaplasia region.
In some embodiments, generating the prediction mask from the second template feature includes: inputting the second template feature corresponding to the last masked focal modulation stage into multiple neural networks to obtain a mask feature and a class feature; element-wise multiplying the mask feature with the decoded feature map to obtain a binary mask; and element-wise multiplying the binary mask with the class feature to obtain the prediction mask.
In some embodiments, generating the second template feature from the mask, the scaled feature, and the focal modulation feature includes element-wise multiplying the mask, the scaled feature, and the focal modulation feature.
In some embodiments, the mask contains multiple values, and the analysis method further includes determining whether each value is less than a threshold to generate a next-level mask.
In some embodiments, the encoder contains multiple focal modulation blocks, and the decoder is a pixel decoder.
From another perspective, an embodiment of the present disclosure provides a computer system including a memory and a processor. The memory stores multiple instructions, and the processor, communicatively connected to the memory, executes the instructions to perform steps including: obtaining a gastric endoscopic image and cutting it into multiple blocks; inputting one of the blocks into an encoder to obtain an encoded feature map; inputting the encoded feature map into a decoder to obtain a decoded feature map; and inputting the decoded feature map into a masked focal modulation decoder. The masked focal modulation decoder includes multiple masked focal modulation stages. One of these stages receives the decoded feature map and a first template feature, inputs the first template feature into a focal modulation function to obtain a focal modulation feature, inputs the decoded feature map into a scaling function and then a projection function to obtain a scaled feature, and generates a second template feature from a mask, the scaled feature, and the focal modulation feature. The steps further include generating a prediction mask from the second template feature; the prediction mask contains multiple pixels and is used to segment a gastric intestinal metaplasia region.
In some embodiments, generating the prediction mask from the second template feature includes: inputting the second template feature corresponding to the last masked focal modulation stage into multiple neural networks to obtain a mask feature and a class feature; element-wise multiplying the mask feature with the decoded feature map to obtain a binary mask; and element-wise multiplying the binary mask with the class feature to obtain the prediction mask.
In some embodiments, generating the second template feature from the mask, the scaled feature, and the focal modulation feature includes element-wise multiplying the mask, the scaled feature, and the focal modulation feature.
In some embodiments, the mask contains multiple values, and the processor further determines whether each value is less than a threshold to generate a next-level mask.
In some embodiments, the encoder contains multiple focal modulation blocks, and the decoder is a pixel decoder.
The terms "first," "second," and the like used herein do not denote any particular order or sequence; they merely distinguish elements or operations described with the same technical term.
FIG. 1 is a schematic diagram of a computer system according to an embodiment. Referring to FIG. 1, a computer system 120 includes a processor 121 and a memory 122 and processes a gastric endoscopic image 110 to produce a prediction mask 130, which segments the locations of gastric intestinal metaplasia. Put differently, the prediction mask 130 can also be called the segmentation result of the gastric endoscopic image 110, the target being the gastric intestinal metaplasia region. The gastric endoscopic image 110 may show the antrum, the body, or the cardia of the stomach; the present disclosure does not limit the site, resolution, or angle of the image 110.
The computer system 120 may be a personal computer, a server, medical equipment, or any of various electronic devices with computing capability; the invention is not limited in this respect. The processor 121 may be a central processing unit, a graphics processing unit, a microprocessor, a microcontroller, a tensor processing unit, an application-specific integrated circuit, and so on. The memory 122 may be random access memory, read-only memory, flash memory, a floppy disk, a hard disk, an optical disc, a flash drive, a magnetic tape, or a database accessible over a network; it stores multiple instructions that the processor executes to carry out the method for analyzing gastric endoscopic images described below.
A deep-learning network is used here to generate the prediction mask. FIG. 2 is a schematic diagram of the deep-learning network according to an embodiment. Referring to FIG. 2, first, in step 201, the gastric endoscopic image 110 is divided into multiple blocks. For example, the gastric endoscopic image 110 has a resolution of H×W×3, and blocks of 4×4 pixels are cut along both the horizontal and vertical axes, so each block has a size of 4×4×3 after cutting. A trainable linear layer (for example, a fully connected layer) can then convert the blocks into an (H/4)×(W/4)×C feature map used as the input. H, W, and C above are positive integers, for example H=224, W=224, and C=96, but the present disclosure is not limited to these values.
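The block-cutting and linear-embedding step above can be sketched in a few lines of numpy. This is a minimal sketch, not the patent's implementation: the 4×4 patch size matches the example resolutions in the text, while the function name, the weight initialization, and the zero test image are illustrative assumptions.

```python
import numpy as np

def patch_embed(image, patch=4, embed_dim=96, rng=None):
    """Cut an H x W x 3 image into patch x patch blocks and project each
    block to an embed_dim-dimensional token with one linear layer."""
    rng = rng or np.random.default_rng(0)
    H, W, C = image.shape
    h, w = H // patch, W // patch
    # (h, w, patch*patch*C): flatten every block into one vector
    blocks = image.reshape(h, patch, w, patch, C).transpose(0, 2, 1, 3, 4)
    blocks = blocks.reshape(h, w, patch * patch * C)
    weight = rng.standard_normal((patch * patch * C, embed_dim)) * 0.02
    return blocks @ weight  # (h, w, embed_dim) feature map

tokens = patch_embed(np.zeros((224, 224, 3)))
print(tokens.shape)  # (56, 56, 96)
```

With H=W=224 and C=96 this yields the (H/4)×(W/4)×C feature map described above.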
The feature map converted from the blocks is input to an encoder 210, which is, for example, a focal modulation network containing multiple focal modulation stages 211-214; for details see the first paper: Yang, Jianwei, et al., "Focal modulation networks," Advances in Neural Information Processing Systems, 2022, 35: 4203-4217. In this embodiment, focal modulation stage 211 contains 2 focal modulation blocks and outputs a feature map of size (H/4)×(W/4)×C. Focal modulation stage 212 contains 2 focal modulation blocks and outputs a feature map of size (H/8)×(W/8)×2C. Focal modulation stage 213 contains 6 focal modulation blocks and outputs a feature map of size (H/16)×(W/16)×4C. Focal modulation stage 214 outputs a feature map of size (H/32)×(W/32)×8C. The feature map output by the encoder 210 is also called the encoded feature map. Through focal modulation, the network can learn visual features ranging from a small field of view to a large one.
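The stage output shapes can be tabulated with a short helper. Note the assumption: the paragraph above fixes only the block counts, and this sketch applies the standard FocalNet scheme of keeping the (H/4, W/4, C) embedding resolution in the first stage and then halving resolution and doubling channels at each later stage.

```python
def stage_shapes(H, W, C, num_stages=4):
    """Feature-map shapes after each focal modulation stage, assuming the
    first stage keeps the (H/4, W/4, C) patch-embedding resolution and
    every later stage halves the resolution and doubles the channels."""
    shapes = []
    h, w, c = H // 4, W // 4, C
    for _ in range(num_stages):
        shapes.append((h, w, c))
        h, w, c = h // 2, w // 2, c * 2
    return shapes

print(stage_shapes(224, 224, 96))
# [(56, 56, 96), (28, 28, 192), (14, 14, 384), (7, 7, 768)]
```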
Next, the encoded feature map is input to a decoder 220 containing multiple stages, each producing a feature map at a different resolution. The decoder 220 is, for example, a pixel decoder; for details see the second paper: Cheng, Bowen, et al., "Masked-attention mask transformer for universal image segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1290-1299, which is not elaborated here. The feature map produced by the decoder 220 is called the decoded feature map 221, denoted D^s below, where s is the stage index of the pixel decoder; the output resolution of the last stage equals one quarter of the original resolution, i.e., (H/4)×(W/4).
Next, the decoded feature map D^s is input to a masked focal modulation decoder 230, which contains multiple masked focal modulation stages 232-234. Each of the stages 232-234 receives the decoded feature map D^s and contains 3 masked focal modulation blocks. In addition, stage 232 receives a trainable template feature 231 of size N×C, where N is a positive integer (for example, 256), denoted T^b below, where b indexes the masked focal modulation block. In each of the stages 232-234, the template feature T^b and the decoded feature map D^s in the b-th masked focal modulation block are used to produce the template feature T^{b+1} of the (b+1)-th masked focal modulation block.
FIG. 3 is a schematic diagram of the architecture of a masked focal modulation block according to an embodiment. Referring to FIG. 3, a masked focal modulation block 300 contains masked focal modulation layers 311-312, residual normalization layers 321-323, and a feed-forward layer 331. A mask 301 is input to the masked focal modulation layers 311-312. A template feature 302 is input to the masked focal modulation layer 311 and the residual normalization layer 321. The decoded feature map 221 is input to the masked focal modulation layers 311-312. Each of the residual normalization layers 321-323 contains a residual connection and layer normalization. The masked focal modulation block 300 produces a template feature 340 for the next masked focal modulation block. The mask 301 provides a segmentation result that guides the model to attend to specific regions during training, while the masked focal modulation provides semantic texture features at different scales.
The operation of the masked focal modulation layers 311-312 is described here. FIG. 4 is a schematic diagram of the masked focal modulation layer according to an embodiment. Referring to FIG. 4, the operation of the masked focal modulation layer can be expressed as Mathematical Formula 1:

T^{b+1} = MaskFocal(T^b, D^s, M^b) = M^b ⊙ F(R(D^s)) ⊙ Z(T^b)    [Mathematical Formula 1]

where MaskFocal() denotes the overall operation of the masked focal modulation layer. Z() is the focal modulation function, shown as focal modulation function 410 in FIG. 4, and Z(T^b) is also called the focal modulation feature. R() is the scaling function. F() is the projection function, which may be implemented as a linear transformation, for example. M^b is the mask 301 in the b-th masked focal modulation block. The template feature 302 in FIG. 4 is T^b in Mathematical Formula 1, and the decoded feature map 221 is D^s. The operator ⊙ denotes element-wise multiplication.
The template feature 302 undergoes a linear transformation (implemented, for example, by a fully connected layer) and is then fed to multiple convolutional layers 411-414 and multiple gates 421-424. For the focal modulation function 410, see the first paper cited above; it is not elaborated here. Similarly, the decoded feature map 221 passes through the scaling function and then a linear transformation (the projection function) to obtain the scaled feature F(R(D^s)).
Here, the mask M^b is a binary mask; the mask M^b of the b-th masked focal modulation block is produced from the (b-1)-th block. Specifically, the previous-level mask contains multiple values M^{b-1}(x, y), where x and y are coordinates. Each of these values is compared against a threshold (for example, 0.5) to produce the next-level mask M^b, as shown in Mathematical Formula 2:

M^b(x, y) = 1 if M^{b-1}(x, y) < 0.5, and M^b(x, y) = 0 otherwise.    [Mathematical Formula 2]
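The threshold rule of Mathematical Formula 2 can be sketched directly; the 0.5 threshold follows the example in the text, and the function name is illustrative.

```python
import numpy as np

def next_level_mask(prev_mask, threshold=0.5):
    """Mathematical Formula 2: a pixel of the next-level binary mask is 1
    where the previous-level value falls below the threshold, else 0."""
    return (prev_mask < threshold).astype(np.float32)

m = next_level_mask(np.array([[0.2, 0.7], [0.5, 0.1]]))
print(m.tolist())  # [[1.0, 0.0], [0.0, 1.0]]
```

Values exactly at the threshold map to 0, matching the strict "less than" comparison in the text.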
The mask forces the model to concentrate on pixels that are hard to classify. Put differently, Mathematical Formula 1 inputs the template feature T^b into the focal modulation function Z() to obtain the focal modulation feature Z(T^b), inputs the decoded feature map D^s into the scaling function R() and then the projection function F() to obtain the scaled feature F(R(D^s)), and element-wise multiplies the mask M^b, the scaled feature, and the focal modulation feature to produce the template feature T^{b+1} used by the subsequent network.
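A shape-level sketch of this masked modulation step follows. The trained sub-modules are stood in by simple placeholders (Z by tanh, R by a fixed rescale, F by one linear projection), and the tensor shapes are assumptions chosen only so the element-wise products broadcast; none of these placeholders is the patent's actual implementation.

```python
import numpy as np

def masked_modulation_step(template, decoded, mask, proj_w):
    """Sketch of Mathematical Formula 1:
    T^{b+1} = M^b * F(R(D^s)) * Z(T^b), all products element-wise.
    Z, R, and F are placeholder stand-ins for trained sub-modules."""
    z = np.tanh(template)   # placeholder for the focal modulation Z()
    r = 0.5 * decoded       # placeholder for the scaling function R()
    f = r @ proj_w          # projection function F() as a linear map
    return mask * f * z     # element-wise products per Formula 1

rng = np.random.default_rng(0)
t = rng.standard_normal((256, 96))   # template feature, N x C
d = rng.standard_normal((256, 96))   # decoded feature (shape assumed)
m = (rng.random((256, 1)) < 0.5).astype(float)  # binary mask, broadcasts
out = masked_modulation_step(t, d, m, np.eye(96))
print(out.shape)  # (256, 96)
```

Rows where the mask is 0 are zeroed out, which is how the mask steers the next template feature toward the selected pixels.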
Referring to FIG. 2, after the masked focal modulation decoder 230, the template feature produced by the last masked focal modulation stage 234 (also called the second template feature) is used to generate the prediction mask 130. Specifically, the second template feature is input to multiple neural networks 241-242, for example multilayer perceptrons (MLP). The neural network 241 outputs a mask feature 251 of size N×C, and the neural network 242 outputs a class feature 252 of size N×K, where K is a positive integer. Next, the mask feature 251 is element-wise multiplied with the highest-resolution decoded feature map 221 produced by the decoder 220 to obtain a binary mask 261 of size N×(H/4)×(W/4). Then, the binary mask 261 is element-wise multiplied with the class feature 252 to obtain the prediction mask 130.
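The prediction-head step can be sketched as below. One deliberate substitution: where the text says element-wise multiplication between the mask feature and the decoded feature map, this sketch uses a per-query dot product over channels (a Mask2Former-style head) so that the tensor shapes compose; all shapes, the single-linear-layer heads, and the K=2 class count are assumptions, not the patent's implementation.

```python
import numpy as np

def prediction_head(template, w_mask, w_cls, decoded, threshold=0.5):
    """Map the final template feature to a mask feature and a class
    feature with two linear heads (stand-ins for the MLPs 241-242),
    combine the mask feature with the decoded feature map, binarize,
    then weight each query's mask by its class scores."""
    mask_feat = template @ w_mask    # (N, C) mask feature
    cls_feat = template @ w_cls      # (N, K) class feature
    # per-query similarity with the decoded map (dot over channels)
    logits = np.einsum('nc,hwc->nhw', mask_feat, decoded)
    binary = (logits > threshold).astype(np.float32)  # binarized masks
    # weight each query's binary mask by its class scores
    return np.einsum('nhw,nk->khw', binary, cls_feat)  # (K, H/4, W/4)

rng = np.random.default_rng(1)
t = rng.standard_normal((256, 96))     # final template feature, N x C
w_m = rng.standard_normal((96, 96))
w_c = rng.standard_normal((96, 2))     # K = 2 (metaplasia, background)
d = rng.standard_normal((56, 56, 96))  # highest-resolution decoded map
pred = prediction_head(t, w_m, w_c, d)
print(pred.shape)  # (2, 56, 56)
```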
The loss function used in the present disclosure is shown in Mathematical Formula 3:

L = λ_ce·L_ce + λ_dice·L_dice + λ_cls·L_cls    [Mathematical Formula 3]

where L_ce, L_dice, and L_cls denote the binary cross-entropy loss, the dice loss, and the classification loss, respectively, and λ_ce, λ_dice, and λ_cls are the corresponding weights; for details see the second paper cited above.
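A minimal sketch of the combined loss of Mathematical Formula 3, assuming unit weights and omitting the classification term; the epsilon smoothing in the dice term is a common convention, not specified by the text.

```python
import numpy as np

def combined_loss(pred, target, w_ce=1.0, w_dice=1.0, eps=1e-6):
    """Weighted sum of binary cross-entropy and dice losses over a
    predicted soft mask (classification term omitted in this sketch)."""
    p = np.clip(pred, eps, 1 - eps)
    ce = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
    dice = 1 - (2 * np.sum(p * target) + eps) / (np.sum(p) + np.sum(target) + eps)
    return w_ce * ce + w_dice * dice

loss = combined_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0]))
print(round(loss, 4))  # 0.2054
```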
The prediction mask 130 contains multiple pixels used to segment the gastric intestinal metaplasia region; for example, the intestinal metaplasia class is set to the value "1" and the background to the value "0", though the present disclosure is not limited to this example. FIG. 5 is a schematic diagram of experimental results according to an embodiment. FIG. 5 shows test data 501-505, produced by overlaying the prediction masks on the original gastric endoscopic images. For example, test data 501 segments the region 510 where gastric intestinal metaplasia occurs, and so on for the others.
In the present disclosure, a masked focal modulation network is proposed for analyzing gastric endoscopic images, from which the regions where gastric intestinal metaplasia occurs can be segmented. In the proposed network, through the cooperation of the encoder and the decoder, the masked focal modulation decoder concentrates on the overall features produced at different fields of view, which focuses the network on a learnable mask and provides better segmentation results.
FIG. 6 is a flowchart of the method for analyzing a gastric endoscopic image using image segmentation technology according to an embodiment. Referring to FIG. 6, in step 601, a gastric endoscopic image is obtained and cut into multiple blocks. In step 602, a block is input to the encoder to obtain an encoded feature map. In step 603, the encoded feature map is input to the decoder to obtain a decoded feature map. In step 604, the decoded feature map is input to the masked focal modulation decoder. The masked focal modulation decoder contains multiple masked focal modulation stages, one of which receives the decoded feature map and a first template feature, inputs the first template feature into a focal modulation function to obtain a focal modulation feature, inputs the decoded feature map into a scaling function and then a projection function to obtain a scaled feature, and generates a second template feature from a mask, the scaled feature, and the focal modulation feature. In step 605, a prediction mask is generated from the second template feature; the prediction mask contains multiple pixels and is used to segment a gastric intestinal metaplasia region. The steps of FIG. 6 have been described in detail above and are not repeated here. Notably, each step in FIG. 6 may be implemented as program code or as circuits; the invention is not limited in this respect. Moreover, the method of FIG. 6 can be used together with the embodiments above or on its own; in other words, other steps may be added between the steps of FIG. 6.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit it. Anyone with ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection is therefore defined by the appended claims.
110: gastric endoscopic image; 120: computer system; 121: processor; 122: memory; 130: prediction mask; 201: step; 210: encoder; 211-214: focal modulation stages; 220: decoder; 221: decoded feature map; 230: masked focal modulation decoder; 231: template feature; 232-234: masked focal modulation stages; 241-242: neural networks; 251: mask feature; 252: class feature; 261: binary mask; 300: masked focal modulation block; 301: mask; 302: template feature; 311-312: masked focal modulation layers; 321-323: residual normalization layers; 331: feed-forward layer; 340: template feature; 410: focal modulation function; 411-414: convolutional layers; 421-424: gates; 501-505: test data; 510: region; 601-605: steps
To make the above features and advantages of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings. FIG. 1 is a schematic diagram of a computer system according to an embodiment. FIG. 2 is a schematic diagram of a deep-learning network according to an embodiment. FIG. 3 is a schematic diagram of the architecture of a masked focal modulation block according to an embodiment. FIG. 4 is a schematic diagram of a masked focal modulation layer according to an embodiment. FIG. 5 is a schematic diagram of experimental results according to an embodiment. FIG. 6 is a flowchart of a method for analyzing a gastric endoscopic image using image segmentation technology according to an embodiment.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW113124479A TWI873056B (en) | 2024-07-01 | 2024-07-01 | Method and computer system for analyzing gastric endoscopic image using image segmentation technology |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TWI873056B (en) | 2025-02-11 |
| TW202603741A (en) | 2026-01-16 |
Family
ID=95557289
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW113124479A TWI873056B (en) | 2024-07-01 | 2024-07-01 | Method and computer system for analyzing gastric endoscopic image using image segmentation technology |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI873056B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9943230B2 (en) * | 2012-01-25 | 2018-04-17 | Fujifilm Corporation | Endoscope system, processor device of endoscope system, and image processing method |
| TW202037148 (en) * | 2018-11-07 | 2020-10-01 | Sony Semiconductor Solutions Corporation | Camera and electronic equipment |
| CN113164589A (en) * | 2018-06-29 | 2021-07-23 | Verseau Therapeutics, Inc. | Compositions and methods for modulating monocyte and macrophage inflammatory phenotype and immunotherapy uses thereof |
| TW202217280A (en) * | 2020-06-25 | 2022-05-01 | Blaze Bioscience, Inc. | Systems and methods for simultaneous near-infrared light and visible light imaging |
- 2024-07-01: TW application TW113124479A granted as patent TWI873056B (active)
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Rifai et al. | Analysis for diagnosis of pneumonia symptoms using chest X-ray based on MobileNetV2 models with image enhancement using white balance and contrast limited adaptive histogram equalization (CLAHE) | |
| Khan et al. | Classification and region analysis of COVID-19 infection using lung CT images and deep convolutional neural networks | |
| CN114663426B (en) | Bone age assessment method based on key bone region positioning | |
| Alam et al. | Rat-capsnet: A deep learning network utilizing attention and regional information for abnormality detection in wireless capsule endoscopy | |
| CN116935044B (en) | Endoscopic polyp segmentation method with multi-scale guidance and multi-level supervision | |
| CN115249302B (en) | Intestinal wall blood vessel segmentation method based on multi-scale contextual information and attention mechanism | |
| CN119205819A (en) | Brain tumor image acceleration diffusion network segmentation method based on fuzzy learning | |
| CN118196113A (en) | A liver and tumor segmentation method based on SNAU-Net | |
| CN114708236A (en) | TSN and SSN based thyroid nodule benign and malignant classification method in ultrasonic image | |
| Amer et al. | Residual dilated U-net for the segmentation Of COVID-19 infection from CT images | |
| CN116884623A (en) | Medical rehabilitation prediction system based on laser scanning imaging | |
| CN119339093B (en) | CT image segmentation method based on double-branch CNN multi-scale fusion network | |
| TWI873056B (en) | Method and computer system for analyzing gastric endoscopic image using image segmentation technology | |
| Wang et al. | MambaFormer: State transformer attention network for accurate polyp segmentation from colonoscopy images | |
| Zhao et al. | VCMix-Net: A hybrid network for medical image segmentation | |
| CN118072027B (en) | Gland segmentation method and device and electronic equipment | |
| Naas et al. | An explainable AI for breast cancer classification using vision Transformer (ViT) | |
| CN118470500A (en) | Gastrointestinal endoscope image polyp identification method based on neural network | |
| Mahanty et al. | SRGAN Assisted Encoder-Decoder Deep Neural Network for Colorectal Polyp Semantic Segmentation. | |
| CN116524187A (en) | Swin Unet variant medical image segmentation method based on distraction | |
| WO2026006939A1 (en) | Gastroscopic image analysis method using image segmentation technology, and computer system | |
| Chen et al. | MFCNet: Multi-Feature Fusion Neural Network for Thoracic Disease Classification | |
| CN114022470A (en) | Segmentation method of nematode experimental image | |
| Kanagalakshmi et al. | Lung cancer prediction with improved graph convolutional neural networks | |
| Yang et al. | Detection of anatomical landmarks during laparoscopic cholecystectomy surgery based on improved YOLOv7 algorithm |