
TWI873056B - Method and computer system for analyzing gastric endoscopic image using image segmentation technology - Google Patents


Info

Publication number
TWI873056B
TWI873056B
Authority
TW
Taiwan
Prior art keywords
feature
mask
focus modulation
template
decoder
Prior art date
Application number
TW113124479A
Other languages
Chinese (zh)
Other versions
TW202603741A (en)
Inventor
鄭修琦
黃春融
楊曉白
張維倫
楊貳翔
許博翔
Original Assignee
國立成功大學
東元醫療社團法人東元綜合醫院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 國立成功大學, 東元醫療社團法人東元綜合醫院 filed Critical 國立成功大學
Priority to TW113124479A priority Critical patent/TWI873056B/en
Application granted granted Critical
Publication of TWI873056B publication Critical patent/TWI873056B/en
Publication of TW202603741A publication Critical patent/TW202603741A/en

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

This disclosure relates to a method for analyzing a gastric endoscopic image using image segmentation technology. The method includes: cutting the gastric endoscopic image into multiple patches; inputting one of the patches into an encoder to obtain an encoded feature map; inputting the encoded feature map into a decoder to obtain a decoded feature map; and inputting the decoded feature map into a mask focal modulation decoder. The mask focal modulation decoder inputs a first prototype feature into a focal modulation function to obtain a focal modulation feature, inputs the decoded feature map into a scaling function and then into a projection function to obtain a scaled feature, and generates a second prototype feature based on a mask, the scaled feature, and the focal modulation feature. A predicted mask is generated based on the second prototype feature for segmenting a gastric intestinal metaplasia area.

Description

Gastric endoscopic image analysis method and computer system using image segmentation technology

The present disclosure relates to a method for analyzing gastric endoscopic images using deep learning, and in particular to segmenting gastric intestinal metaplasia areas in such images.

Gastric cancer ranks sixth in cancer incidence and fourth among causes of cancer death worldwide. It develops after Helicobacter pylori infection, progressing through inflammation of the gastric body, gastric atrophy, and gastric intestinal metaplasia. Since only early diagnosis improves survival, regular follow-up of patients who already have precancerous lesions such as gastric intestinal metaplasia is essential, and correctly diagnosing and grading gastritis and gastric intestinal metaplasia during gastroscopy is an important clinical issue. At present, this diagnosis requires biopsies at five to six sites in the stomach during gastroscopy, which are then interpreted by a pathologist. The advantage is an accurate diagnosis through pathological interpretation; the disadvantages are that it is time-consuming, carries risks such as bleeding, and is unsuitable for large-scale screening. Developing a technique that analyzes gastric intestinal metaplasia quickly and without invasive biopsies is therefore an important open problem in precision health.

An embodiment of the present disclosure proposes a method for analyzing gastric endoscopic images using image segmentation technology, adapted for a computer system. The method includes: obtaining a gastric endoscopic image and cutting it into multiple patches; inputting one of the patches into an encoder to obtain an encoded feature map; inputting the encoded feature map into a decoder to obtain a decoded feature map; and inputting the decoded feature map into a mask focal modulation decoder. The mask focal modulation decoder contains multiple mask focal modulation stages, one of which receives the decoded feature map and a first prototype feature, inputs the first prototype feature into a focal modulation function to obtain a focal modulation feature, inputs the decoded feature map into a scaling function and then into a projection function to obtain a scaled feature, and generates a second prototype feature from a mask, the scaled feature, and the focal modulation feature. The method further includes: generating a predicted mask from the second prototype feature, the predicted mask containing multiple pixels for segmenting a gastric intestinal metaplasia area.

In some embodiments, generating the predicted mask from the second prototype feature includes: inputting the second prototype feature corresponding to the last of the mask focal modulation stages into multiple neural networks to obtain a mask feature and a class feature; element-wise multiplying the mask feature with the decoded feature map to obtain a binarized mask; and element-wise multiplying the binarized mask with the class feature to obtain the predicted mask.

In some embodiments, generating the second prototype feature from the mask, the scaled feature, and the focal modulation feature includes: element-wise multiplying the mask, the scaled feature, and the focal modulation feature to produce the second prototype feature.

In some embodiments, the mask contains multiple values, and the analysis method further includes: determining whether each value is less than a threshold to generate a next-stage mask.

In some embodiments, the encoder contains multiple focal modulation blocks, and the decoder is a pixel decoder.

From another perspective, an embodiment of the present disclosure proposes a computer system including a memory and a processor. The memory stores multiple instructions, and the processor, communicatively connected to the memory, executes the instructions to perform steps including: obtaining a gastric endoscopic image and cutting it into multiple patches; inputting one of the patches into an encoder to obtain an encoded feature map; inputting the encoded feature map into a decoder to obtain a decoded feature map; and inputting the decoded feature map into a mask focal modulation decoder. The mask focal modulation decoder contains multiple mask focal modulation stages, one of which receives the decoded feature map and a first prototype feature, inputs the first prototype feature into a focal modulation function to obtain a focal modulation feature, inputs the decoded feature map into a scaling function and then into a projection function to obtain a scaled feature, and generates a second prototype feature from a mask, the scaled feature, and the focal modulation feature. The steps further include: generating a predicted mask from the second prototype feature, the predicted mask containing multiple pixels for segmenting a gastric intestinal metaplasia area.

In some embodiments, generating the predicted mask from the second prototype feature includes: inputting the second prototype feature corresponding to the last of the mask focal modulation stages into multiple neural networks to obtain a mask feature and a class feature; element-wise multiplying the mask feature with the decoded feature map to obtain a binarized mask; and element-wise multiplying the binarized mask with the class feature to obtain the predicted mask.

In some embodiments, generating the second prototype feature from the mask, the scaled feature, and the focal modulation feature includes: element-wise multiplying the mask, the scaled feature, and the focal modulation feature to produce the second prototype feature.

In some embodiments, the mask contains multiple values, and the processor further determines whether each value is less than a threshold to generate a next-stage mask.

In some embodiments, the encoder contains multiple focal modulation blocks, and the decoder is a pixel decoder.

As used herein, terms such as "first" and "second" do not denote order or sequence; they merely distinguish elements or operations described with the same technical term.

FIG. 1 is a schematic diagram of a computer system according to an embodiment. Referring to FIG. 1, a computer system 120 includes a processor 121 and a memory 122 and processes a gastric endoscopic image 110 to produce a predicted mask 130, which segments the location of gastric intestinal metaplasia. Put differently, the predicted mask 130 can also be called the segmentation result of the gastric endoscopic image 110, the target being the gastric intestinal metaplasia area. The gastric endoscopic image 110 may show the antrum, body, or cardia of the stomach; the present disclosure does not limit the anatomical site, resolution, or viewing angle of the gastric endoscopic image 110.

The computer system 120 may be a personal computer, a server, medical equipment, or any of various electronic devices with computing capability; the invention is not limited in this respect. The processor 121 may be a central processing unit, a graphics processing unit, a microprocessor, a microcontroller, a tensor processing unit, an application-specific integrated circuit, and so on. The memory 122 may be random access memory, read-only memory, flash memory, a floppy disk, a hard disk, an optical disc, a flash drive, a tape, or a database accessible over a network; it stores multiple instructions, and the processor executes these instructions to carry out the gastric endoscopic image analysis method described below.

A deep learning network is used here to produce the predicted mask. FIG. 2 is a schematic diagram of the deep learning network according to an embodiment. Referring to FIG. 2, in step 201 the gastric endoscopic image 110 is first divided into multiple patches. For example, an image of resolution H×W can be cut into non-overlapping 4×4 patches along both axes, after which a trainable linear layer (for example, a fully connected layer) converts the patches into an (H/4)×(W/4)×C feature map used as input. H, W, and C are positive integers, for example H=224, W=224, and C=96, but the disclosure is not limited to these values.
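The patch-splitting and linear-embedding step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `patch_embed` is a hypothetical name, NumPy stands in for a deep learning framework, and the random weight matrix is only a placeholder for the trainable linear layer.

```python
import numpy as np

def patch_embed(image, patch_size=4, embed_dim=96, rng=np.random.default_rng(0)):
    """Split an H x W x 3 image into non-overlapping patches and project each
    flattened patch to an embed_dim vector with a (placeholder) linear layer.
    Returns an (H//patch_size) x (W//patch_size) x embed_dim feature map."""
    H, W, C = image.shape
    # placeholder weight of the trainable fully connected layer
    weight = rng.standard_normal((patch_size * patch_size * C, embed_dim))
    h, w = H // patch_size, W // patch_size
    patches = image.reshape(h, patch_size, w, patch_size, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(h, w, patch_size * patch_size * C)
    return patches @ weight

feat = patch_embed(np.zeros((224, 224, 3)))
print(feat.shape)  # (56, 56, 96)
```

With H=W=224 and C=96 this reproduces the (H/4)×(W/4)×C feature map described above.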

The feature map converted from the patches is input to the encoder 210, which is, for example, a focal modulation network containing multiple focal modulation stages 211-214; for details, see the first paper: Yang, Jianwei, et al., "Focal Modulation Networks," Advances in Neural Information Processing Systems 35 (2022): 4203-4217. In this embodiment, the focal modulation stage 211 contains 2 focal modulation blocks and outputs a feature map of size (H/4)×(W/4)×C. The focal modulation stage 212 contains 2 focal modulation blocks and outputs a feature map of size (H/8)×(W/8)×2C. The focal modulation stage 213 contains 6 focal modulation blocks and outputs a feature map of size (H/16)×(W/16)×4C. The focal modulation stage 214 outputs a feature map of size (H/32)×(W/32)×8C. The feature map output by the encoder 210 is also called the encoded feature map. Through focal modulation, the whole network learns visual features from a small field of view up to a large field of view.
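Assuming the stages follow the usual focal modulation network convention of halving spatial resolution and doubling channels at each stage (the exact sizes were lost in extraction, so this is an assumption), the shape progression can be computed as:

```python
def encoder_stage_shapes(H=224, W=224, C=96, num_stages=4):
    """Per-stage output shapes of a focal-modulation encoder, assuming each
    stage halves the spatial resolution and doubles the channel count,
    starting from the (H/4) x (W/4) x C patch-embedding output."""
    return [(H // (4 * 2 ** s), W // (4 * 2 ** s), C * 2 ** s) for s in range(num_stages)]

print(encoder_stage_shapes())  # [(56, 56, 96), (28, 28, 192), (14, 14, 384), (7, 7, 768)]
```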

Next, the encoded feature map is input to the decoder 220, which contains multiple stages, each producing a feature map at a different resolution. The decoder 220 is, for example, a pixel decoder; for details, see the second paper: Cheng, Bowen, et al., "Masked-Attention Mask Transformer for Universal Image Segmentation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1290-1299; it is not described further here. The feature map produced by the decoder 220 is called the decoded feature map 221, denoted below as F_pix^s, where s indexes the pixel decoder stage; the output resolution of the last stage equals one quarter of the original resolution, namely (H/4)×(W/4).

Next, the decoded feature map F_pix is input to the mask focal modulation decoder 230, which contains multiple mask focal modulation stages 232-234. Each of the stages 232-234 receives the decoded feature map and contains 3 mask focal modulation blocks. In addition, the stage 232 receives a trainable prototype feature 231 of size N×C, where N is a positive integer (for example, 256), denoted below as X_b, where b indexes the mask focal modulation block. Within each of the stages 232-234, the prototype feature X_b and the decoded feature map in the b-th mask focal modulation block are used to produce the prototype feature X_{b+1} of the (b+1)-th mask focal modulation block.

FIG. 3 is a schematic diagram of the architecture of a mask focal modulation block according to an embodiment. Referring to FIG. 3, the mask focal modulation block 300 contains mask focal modulation layers 311-312, residual normalization layers 321-323, and a feedforward layer 331. The mask 301 is input to the mask focal modulation layers 311-312. The prototype feature 302 is input to the mask focal modulation layer 311 and the residual normalization layer 321. The decoded feature map 221 is input to the mask focal modulation layers 311-312. Each of the residual normalization layers 321-323 consists of a residual connection and layer normalization. The mask focal modulation block 300 produces a prototype feature 340 for the next mask focal modulation block. The mask 301 provides a segmentation result that guides the model to attend to specific regions during training, while the mask focal modulation supplies semantic texture features at different scales.

The operation of the mask focal modulation layers 311-312 is described here. FIG. 4 is a schematic diagram of the mask focal modulation layer according to an embodiment. Referring to FIG. 4, the operation of the mask focal modulation layer can be expressed as Equation 1:

X_{b+1} = M_b ⊙ F(R(F_pix)) ⊙ Z(X_b)    [Equation 1]

where X_{b+1} denotes the overall output of the mask focal modulation layer. Z(·) is the focal modulation function, shown as focal modulation function 410 in FIG. 4; Z(X_b) is also called the focal modulation feature. R(·) is the scaling function. F(·) is the projection function, which can be implemented, for example, as a linear transformation. M_b is the mask 301 in the b-th mask focal modulation block. The prototype feature 302 in FIG. 4 corresponds to X_b in Equation 1, and the decoded feature map 221 corresponds to F_pix. The operator ⊙ denotes element-wise multiplication.
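The update of Equation 1 can be sketched in NumPy as follows. This is a toy illustration: `masked_focal_modulation` is a hypothetical name, and the callables passed in for Z, R, and F are stand-ins (the real Z is the focal modulation of the first paper; R resizes F_pix; F is a linear projection).

```python
import numpy as np

def masked_focal_modulation(X, F_pix, M, Z, R, F):
    """One masked focal modulation update per Equation 1:
    X_{b+1} = M_b (*) F(R(F_pix)) (*) Z(X_b), with (*) element-wise."""
    return M * F(R(F_pix)) * Z(X)

# Toy stand-ins for the three functions.
N, C = 4, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((N, C))               # prototype features X_b
F_pix = rng.standard_normal((N, C))           # decoded feature map, already resized
M = (rng.random((N, 1)) < 0.5).astype(float)  # binary mask M_b, broadcast over C
W_proj = rng.standard_normal((C, C))          # placeholder projection weight
X_next = masked_focal_modulation(X, F_pix, M,
                                 Z=np.tanh, R=lambda t: t, F=lambda t: t @ W_proj)
assert X_next.shape == (N, C)
# rows where the mask is 0 are zeroed out, suppressing those queries
assert np.allclose(X_next[M[:, 0] == 0], 0)
```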

The prototype feature 302 undergoes a linear transformation (implemented, for example, by a fully connected layer) and is then fed to multiple convolution layers 411-414 and multiple gates 421-424. For the focal modulation function 410, refer to the first paper cited above; it is not detailed here. Similarly, the decoded feature map 221 passes through the scaling function and then a linear transformation (the projection function) to obtain the scaled feature F(R(F_pix)).

Here, the mask M_b is a binary mask; the mask used in the b-th mask focal modulation block is produced from the (b-1)-th mask focal modulation block. Specifically, the previous block's mask contains multiple values, denoted m_{b-1}(x, y), where x and y are coordinates. Each of these values is compared against a threshold (for example, 0.5) to produce the next-level mask M_b, as in Equation 2:

M_b(x, y) = 1 if m_{b-1}(x, y) < 0.5, and 0 otherwise.    [Equation 2]
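A minimal sketch of the Equation 2 thresholding, with `next_level_mask` as a hypothetical name and 0.5 as the example threshold from the text:

```python
import numpy as np

def next_level_mask(prev_mask_values, threshold=0.5):
    """Binarize the previous block's mask values per Equation 2: positions
    whose value is below the threshold become 1, the rest become 0."""
    return (prev_mask_values < threshold).astype(np.float32)

m = np.array([[0.1, 0.7], [0.4, 0.9]])
print(next_level_mask(m))
# [[1. 0.]
#  [1. 0.]]
```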

The mask forces the model to concentrate on pixels that are hard to classify. Viewed another way, Equation 1 inputs the prototype feature X_b into a focal modulation function Z(·) to obtain the focal modulation feature Z(X_b), inputs the decoded feature map F_pix into the scaling function R(·) and then the projection function F(·) to obtain the scaled feature F(R(F_pix)), and then element-wise multiplies the mask M_b, the scaled feature, and the focal modulation feature to produce the prototype feature X_{b+1} used by the subsequent network.

Referring to FIG. 2, after the mask focal modulation decoder 230, the prototype feature produced by the last mask focal modulation stage 234 (also called the second prototype feature) is used to produce the predicted mask 130. Specifically, the second prototype feature is input to multiple neural networks 241-242, for example multilayer perceptrons (MLPs). The neural network 241 outputs a mask feature 251 of size N×C, and the neural network 242 outputs a class feature 252 of size N×K, where K is a positive integer. Next, the mask feature 251 is element-wise multiplied with the highest-resolution decoded feature map 221 produced by the decoder 220 to obtain a binarized mask 261 of size N×(H/4)×(W/4). Then, the binarized mask 261 is element-wise multiplied with the class feature 252 to obtain the predicted mask 130.
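The prediction head can be sketched as follows. This is an assumption-laden illustration: `predict_mask` is a hypothetical name, the two lambdas stand in for the MLPs 241-242, and the channel-wise `einsum` is one way to realize the combination of an N×C mask feature with a C×H×W map (the patent describes the step simply as element-wise multiplication).

```python
import numpy as np

def predict_mask(prototype, mlp_mask, mlp_cls, decoded_map):
    """Sketch of the prediction head: map the final prototype feature to a
    mask feature and a class feature, combine the mask feature with the
    decoded feature map, binarize, and weight by the class scores."""
    mask_feat = mlp_mask(prototype)                        # N x C
    cls_feat = mlp_cls(prototype)                          # N x K
    mask_logits = np.einsum('nc,chw->nhw', mask_feat, decoded_map)
    binary_mask = (mask_logits > 0).astype(np.float32)     # N x H x W
    return np.einsum('nk,nhw->khw', cls_feat, binary_mask) # K x H x W

rng = np.random.default_rng(0)
N, C, K, H, W = 8, 16, 2, 14, 14
W_mask = rng.standard_normal((C, C))
W_cls = rng.standard_normal((C, K))
proto = rng.standard_normal((N, C))
pred = predict_mask(proto, lambda x: x @ W_mask, lambda x: x @ W_cls,
                    rng.standard_normal((C, H, W)))
assert pred.shape == (K, H, W)
```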

The loss function adopted in this disclosure is given by Equation 3:

L = λ_bce·L_bce + λ_dice·L_dice + λ_cls·L_cls    [Equation 3]

where L_bce, L_dice, and L_cls denote the binary cross-entropy loss, the Dice loss, and the classification loss, respectively, and λ_bce, λ_dice, and λ_cls are the corresponding loss weights; for details, refer to the second paper cited above.
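A minimal sketch of Equation 3, assuming simple NumPy implementations of the binary cross-entropy and Dice terms and placeholder weights (the patent defers the actual weight values to the second paper):

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy over predicted probabilities."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def dice_loss(pred, target, eps=1e-7):
    """1 minus the Dice coefficient between prediction and target."""
    inter = np.sum(pred * target)
    return float(1 - (2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps))

def total_loss(pred_mask, gt_mask, cls_loss, w_bce=1.0, w_dice=1.0, w_cls=1.0):
    """Weighted sum per Equation 3; the weights shown are placeholders."""
    return (w_bce * bce_loss(pred_mask, gt_mask)
            + w_dice * dice_loss(pred_mask, gt_mask)
            + w_cls * cls_loss)

perfect = np.array([1.0, 0.0, 1.0])
print(total_loss(perfect, perfect, cls_loss=0.0))  # near 0 for a perfect prediction
```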

The predicted mask 130 contains multiple pixels and is used to segment the gastric intestinal metaplasia area; for example, the gastric intestinal metaplasia class may be set to the value "1" and the background to the value "0", though the disclosure is not limited to this example. FIG. 5 is a schematic diagram of experimental results according to an embodiment. FIG. 5 shows test data 501-505, produced by overlaying the predicted masks on the original gastric endoscopic images. For example, in the test data 501 the region 510 where gastric intestinal metaplasia occurs has been segmented, and so on for the others.

This disclosure proposes a mask focal modulation network for analyzing gastric endoscopic images, from which the regions where gastric intestinal metaplasia occurs can be segmented. In the proposed network, thanks to the cooperation of the encoder and decoder, the mask focal modulation decoder concentrates on holistic features produced at different fields of view; this focuses the network on a learnable mask and yields better segmentation results.

FIG. 6 is a flowchart of the method for analyzing gastric endoscopic images using image segmentation technology according to an embodiment. Referring to FIG. 6, in step 601 a gastric endoscopic image is obtained and cut into multiple patches. In step 602, a patch is input to the encoder to obtain an encoded feature map. In step 603, the encoded feature map is input to the decoder to obtain a decoded feature map. In step 604, the decoded feature map is input to a mask focal modulation decoder. The mask focal modulation decoder contains multiple mask focal modulation stages, one of which receives the decoded feature map and a first prototype feature, inputs the first prototype feature into a focal modulation function to obtain a focal modulation feature, inputs the decoded feature map into a scaling function and then into a projection function to obtain a scaled feature, and produces a second prototype feature from a mask, the scaled feature, and the focal modulation feature. In step 605, a predicted mask containing multiple pixels is produced from the second prototype feature and used to segment a gastric intestinal metaplasia area. The steps of FIG. 6 have been described in detail above and are not repeated here. Note that each step of FIG. 6 may be implemented as program code or as circuitry; the invention is not limited in this respect. Moreover, the method of FIG. 6 can be used together with the embodiments above or on its own; in other words, other steps may also be inserted between the steps of FIG. 6.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit it. Anyone with ordinary skill in the relevant art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.

110: gastric endoscopic image; 120: computer system; 121: processor; 122: memory; 130: predicted mask; 201: step; 210: encoder; 211-214: focal modulation stages; 220: decoder; 221: decoded feature map; 230: mask focal modulation decoder; 231: prototype feature; 232-234: mask focal modulation stages; 241-242: neural networks; 251: mask feature; 252: class feature; 261: binarized mask; 300: mask focal modulation block; 301: mask; 302: prototype feature; 311-312: mask focal modulation layers; 321-323: residual normalization layers; 331: feedforward layer; 340: prototype feature; 410: focal modulation function; 411-414: convolution layers; 421-424: gates; 501-505: test data; 510: region; 601-605: steps

To make the above features and advantages of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings. FIG. 1 is a schematic diagram of a computer system according to an embodiment. FIG. 2 is a schematic diagram of a deep learning network according to an embodiment. FIG. 3 is a schematic diagram of the architecture of a mask focal modulation block according to an embodiment. FIG. 4 is a schematic diagram of a mask focal modulation layer according to an embodiment. FIG. 5 is a schematic diagram of experimental results according to an embodiment. FIG. 6 is a flowchart of a method for analyzing gastric endoscopic images using image segmentation technology according to an embodiment.


Claims (10)

1. A method for analyzing gastric endoscopic images using image segmentation technology, adapted for a computer system, the method comprising: obtaining a gastric endoscopic image and cutting the gastric endoscopic image into a plurality of patches; inputting one of the patches into an encoder to obtain an encoded feature map; inputting the encoded feature map into a decoder to obtain a decoded feature map; inputting the decoded feature map into a mask focal modulation decoder, wherein the mask focal modulation decoder comprises a plurality of mask focal modulation stages, one of which receives the decoded feature map and a first prototype feature, inputs the first prototype feature into a focal modulation function to obtain a focal modulation feature, inputs the decoded feature map into a scaling function and then into a projection function to obtain a scaled feature, and generates a second prototype feature according to a mask, the scaled feature, and the focal modulation feature; and generating a predicted mask according to the second prototype feature, the predicted mask comprising a plurality of pixels for segmenting a gastric intestinal metaplasia area.
2. The analysis method of claim 1, wherein generating the predicted mask according to the second prototype feature comprises: inputting the second prototype feature corresponding to the last of the mask focal modulation stages into a plurality of neural networks to obtain a mask feature and a class feature; performing element-wise multiplication of the mask feature and the decoded feature map to obtain a binarized mask; and performing element-wise multiplication of the binarized mask and the class feature to obtain the predicted mask.

3. The analysis method of claim 1, wherein generating the second prototype feature according to the mask, the scaled feature, and the focal modulation feature comprises: performing element-wise multiplication of the mask, the scaled feature, and the focal modulation feature to generate the second prototype feature.

4. The analysis method of claim 1, wherein the mask comprises a plurality of values, and the analysis method further comprises: determining whether each of the values is less than a threshold to generate a next-stage mask.

5. The analysis method of claim 1, wherein the encoder comprises a plurality of focal modulation blocks, and the decoder is a pixel decoder.
6. A computer system, comprising:
a memory storing a plurality of instructions; and
a processor, communicatively connected to the memory, configured to execute the instructions to perform the following steps:
obtaining a gastric endoscopic image and cutting the gastric endoscopic image into a plurality of patches;
inputting one of the patches into an encoder to obtain an encoded feature map;
inputting the encoded feature map into a decoder to obtain a decoded feature map;
inputting the decoded feature map into a mask focal modulation decoder, wherein the mask focal modulation decoder comprises a plurality of mask focal modulation stages, one of the mask focal modulation stages receiving the decoded feature map and a first prototype feature, inputting the first prototype feature into a focal modulation function to obtain a focal modulation feature, inputting the decoded feature map into a scaling function and then into a projection function to obtain a scaled feature, and generating a second prototype feature according to a mask, the scaled feature, and the focal modulation feature; and
generating a predicted mask according to the second prototype feature, the predicted mask comprising a plurality of pixels and being used to segment a gastric intestinal metaplasia region.
7. The computer system of claim 6, wherein generating the predicted mask according to the second prototype feature comprises:
inputting the second prototype feature corresponding to the last of the mask focal modulation stages into a plurality of neural networks to obtain a mask feature and a class feature;
performing element-wise multiplication on the mask feature and the decoded feature map to obtain a binarized mask; and
performing element-wise multiplication on the binarized mask and the class feature to obtain the predicted mask.
8. The computer system of claim 6, wherein generating the second prototype feature according to the mask, the scaled feature, and the focal modulation feature comprises:
performing element-wise multiplication on the mask, the scaled feature, and the focal modulation feature to generate the second prototype feature.
9. The computer system of claim 6, wherein the mask comprises a plurality of values, and the processor is further configured to determine whether each of the values is less than a threshold to generate a next-stage mask.
10. The computer system of claim 6, wherein the encoder comprises a plurality of focal modulation blocks and the decoder is a pixel decoder.
TW113124479A 2024-07-01 2024-07-01 Method and computer system for analyzing gastric endoscopic image using image segmentation technology TWI873056B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW113124479A TWI873056B (en) 2024-07-01 2024-07-01 Method and computer system for analyzing gastric endoscopic image using image segmentation technology


Publications (2)

Publication Number Publication Date
TWI873056B true TWI873056B (en) 2025-02-11
TW202603741A TW202603741A (en) 2026-01-16

Family

ID=95557289

Family Applications (1)

Application Number Title Priority Date Filing Date
TW113124479A TWI873056B (en) 2024-07-01 2024-07-01 Method and computer system for analyzing gastric endoscopic image using image segmentation technology

Country Status (1)

Country Link
TW (1) TWI873056B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9943230B2 (en) * 2012-01-25 2018-04-17 Fujifilm Corporation Endoscope system, processor device of endoscope system, and image processing method
TW202037148A (en) * 2018-11-07 2020-10-01 日商索尼半導體解決方案公司 Camera and electronic equipment
CN113164589A (en) * 2018-06-29 2021-07-23 维西欧制药公司 Compositions and methods for modulating monocyte and macrophage inflammatory phenotype and immunotherapy uses thereof
TW202217280A (en) * 2020-06-25 2022-05-01 美商布雷茲生物科學股份有限公司 Systems and methods for simultaneous near-infrared light and visible light imaging


Similar Documents

Publication Publication Date Title
Rifai et al. Analysis for diagnosis of pneumonia symptoms using chest X-ray based on MobileNetV2 models with image enhancement using white balance and contrast limited adaptive histogram equalization (CLAHE)
Khan et al. Classification and region analysis of COVID-19 infection using lung CT images and deep convolutional neural networks
CN114663426B (en) Bone age assessment method based on key bone region positioning
Alam et al. Rat-capsnet: A deep learning network utilizing attention and regional information for abnormality detection in wireless capsule endoscopy
CN116935044B (en) Endoscopic polyp segmentation method with multi-scale guidance and multi-level supervision
CN115249302B (en) Intestinal wall blood vessel segmentation method based on multi-scale contextual information and attention mechanism
CN119205819A (en) Brain tumor image acceleration diffusion network segmentation method based on fuzzy learning
CN118196113A (en) A liver and tumor segmentation method based on SNAU-Net
CN114708236A (en) TSN and SSN based thyroid nodule benign and malignant classification method in ultrasonic image
Amer et al. Residual dilated U-net for the segmentation Of COVID-19 infection from CT images
CN116884623A (en) Medical rehabilitation prediction system based on laser scanning imaging
CN119339093B (en) CT image segmentation method based on double-branch CNN multi-scale fusion network
TWI873056B (en) Method and computer system for analyzing gastric endoscopic image using image segmentation technology
Wang et al. MambaFormer: State transformer attention network for accurate polyp segmentation from colonoscopy images
Zhao et al. VCMix-Net: A hybrid network for medical image segmentation
CN118072027B (en) Gland segmentation method and device and electronic equipment
Naas et al. An explainable AI for breast cancer classification using vision Transformer (ViT)
CN118470500A (en) Gastrointestinal endoscope image polyp identification method based on neural network
Mahanty et al. SRGAN Assisted Encoder-Decoder Deep Neural Network for Colorectal Polyp Semantic Segmentation.
CN116524187A (en) Swin Unet variant medical image segmentation method based on distraction
WO2026006939A1 (en) Gastroscopic image analysis method using image segmentation technology, and computer system
Chen et al. MFCNet: Multi-Feature Fusion Neural Network for Thoracic Disease Classification
CN114022470A (en) Segmentation method of nematode experimental image
Kanagalakshmi et al. Lung cancer prediction with improved graph convolutional neural networks
Yang et al. Detection of anatomical landmarks during laparoscopic cholecystectomy surgery based on improved YOLOv7 algorithm