CN115345819A - Gastric cancer image recognition system, device and application thereof
- Publication number: CN115345819A
- Application number: CN202210461368.2A
- Authority: CN (China)
- Prior art keywords: image, lesion, layer, frame, gastric cancer
- Legal status: Pending
Classifications
- G06T: Image data processing or generation, in general (G: Physics; G06: Computing or calculating)
- G06T 7/0012: Biomedical image inspection (G06T 7/00: Image analysis; G06T 7/0002: Inspection of images, e.g. flaw detection)
- G06T 2207/10068: Endoscopic image (G06T 2207/00: Indexing scheme for image analysis or image enhancement; G06T 2207/10: Image acquisition modality)
- G06T 2207/20081: Training; Learning (G06T 2207/20: Special algorithmic details)
- G06T 2207/20084: Artificial neural networks [ANN] (G06T 2207/20: Special algorithmic details)
- G06T 2207/30092: Stomach; Gastric (G06T 2207/30: Subject of image; Context of image processing; G06T 2207/30004: Biomedical image processing)
Abstract
The present invention relates to a gastric cancer image recognition system, a device and applications thereof. The system comprises a data input module, a data preprocessing module, an image recognition model construction module and a lesion recognition module; the system is also capable of self-training, so that it can accurately identify the lesion sites in gastric cancer images.
Description
Technical Field
The present invention belongs to the field of medicine, and more specifically relates to the technical field of pathological image recognition using an image recognition system.
Background
Although the incidence of gastric cancer has declined gradually since 1975, there were still nearly one million new cases in 2012, making it the fifth most common malignancy in the world. In terms of mortality, gastric cancer is the third leading cause of cancer death worldwide.
The prognosis of gastric cancer depends to a great extent on its stage. Studies have shown that the 5-year survival rate of early gastric cancer exceeds 90%, whereas the survival rate of advanced gastric cancer is below 20%. Therefore, early detection and regular follow-up in populations at high risk of cancer are the most effective means of reducing the incidence of gastric cancer and improving patient survival.
Because the rates of misdiagnosis and missed diagnosis of gastric cancer (especially superficial flat lesions) with ordinary white-light endoscopy are quite high, a variety of endoscopic diagnostic techniques have emerged. However, the use of these endoscopic devices requires not only superb operating skills but also considerable financial support. There is therefore an urgent need to develop a readily available, economical, practical, safe and reliable technique for detecting and diagnosing early gastric cancer and precancerous lesions.
Summary of the Invention
In long-term medical practice, in order to reduce the various problems caused by manual endoscopic diagnosis, the inventors used machine learning technology and, through repeated development, optimization and training, obtained a system that can be used for gastric cancer diagnosis, supplemented by systematic and strict image screening and preprocessing that further improved the effectiveness of the training. The diagnostic system of the present invention can identify cancerous lesion sites in pathological images (such as gastroscopic pictures and real-time video images) with very high precision, and its recognition rate even exceeds that of specialist physicians in internal medicine.
A first aspect of the present invention provides a gastric cancer image recognition system, comprising:
a. a data input module for inputting images containing gastric cancer lesion sites, the images preferably being endoscopic images;
b. a data preprocessing module for receiving the images from the data input module, precisely framing (box-selecting) the gastric cancer lesion site, the part inside the frame being defined as a positive sample and the part outside the frame as a negative sample, and outputting the coordinate information and/or lesion type information of the lesion site; preferably, before the frame selection, the module also desensitizes the images in advance to remove the patients' personal information;
preferably, the frame selection generates a rectangular or square frame containing the lesion site; the coordinate information is preferably the coordinates of the upper-left and lower-right corner points of the rectangular or square frame;
also preferably, the framed site is determined by the following method: 2n endoscopists perform frame selection in a "back-to-back" manner, i.e. the 2n persons are randomly divided into n groups of 2, and all images are randomly divided into n portions and randomly assigned to the groups of physicians for frame selection; when the frame selection is completed, the results of the two physicians in each group are compared, the consistency of their results is evaluated, and the framed site is finally determined, where n is a natural number between 1 and 100, for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100;
further preferably, the criteria for evaluating the consistency of the frame selection results between the two physicians are as follows:
for each lesion image, the overlapping area of the frames drawn by the two physicians of a group is compared; if the area of the overlapping part (i.e. the intersection) of the regions framed by the two physicians is greater than 50% of the area covered by their union, the frame selection judgments of the two physicians are considered consistent, and the diagonal coordinates corresponding to the intersection, i.e. the coordinates of the upper-left and lower-right corner points, are saved as the final localization of the target lesion;
if the area of the overlapping part (i.e. the intersection) is less than 50% of the area covered by the union of the two, the frame selection judgments of the two physicians are considered to differ substantially; such lesion images are singled out, and the final position of the target lesion is determined by joint discussion among all 2n physicians participating in the frame selection work;
c. an image recognition model construction module, capable of receiving the images processed by the data preprocessing module, for constructing and training a neural-network-based image recognition model, the neural network preferably being a convolutional neural network;
d. a lesion recognition module, for inputting an image to be examined into the trained image recognition model and determining, based on the output of the image recognition model, whether a lesion is present in the image to be examined and/or the location of the lesion.
In one embodiment, the image recognition model construction module comprises a feature extractor, a candidate region generator and a target recognizer, wherein:
the feature extractor is used to perform feature extraction on the image from the data preprocessing module so as to obtain a feature map; preferably, the feature extraction is performed by convolution operations;
the candidate region generator is used to generate a number of candidate regions based on the feature map;
the target recognizer calculates a classification score for each candidate region, the score indicating the probability that the region belongs to the positive samples and/or the negative samples; at the same time, the target recognizer can propose adjustment values for the border position of each region, so that the border position of each region is adjusted and the lesion position is thereby determined precisely; preferably, a loss function is used in the training of the classification scores and the adjustment values;
also preferably, during the training a mini-batch-based gradient descent method is adopted, i.e. for each training image a mini-batch containing multiple positive and negative candidate regions is generated; 256 candidate regions are then randomly sampled from each image so that the ratio of positive to negative candidate regions is close to 1:1, and the loss function of the corresponding mini-batch is then calculated; if the number of positive candidate regions in an image is fewer than 128, negative candidate regions are used to fill the mini-batch;
further preferably, the learning rate of the first 50,000 mini-batches is set to 0.001 and the learning rate of the following 50,000 mini-batches is set to 0.0001; the momentum term is preferably set to 0.9 and the weight decay is preferably set to 0.0005.
In another embodiment, the feature extractor can perform feature extraction on an input image of any size and/or resolution; the image may be input at its original size and/or resolution, or after its size and/or resolution has been changed, and a multi-dimensional (for example 256-dimensional or 512-dimensional) feature map is obtained;
specifically, the feature extractor comprises X convolutional layers and Y sampling layers, wherein the i-th convolutional layer (i between 1 and X) contains Q_i convolution kernels of size m×m×p_i, where m×m denotes the length and width of the kernel in pixels and p_i equals the number Q_{i-1} of convolution kernels of the previous convolutional layer; in the i-th convolutional layer, the kernels convolve the data coming from the previous level (for example the original image, the (i-1)-th convolutional layer, or a sampling layer) with stride L; each sampling layer contains one kernel of size 2L×2L, moving with stride 2L, which convolves the image supplied by the convolutional layer; after feature extraction by the feature extractor, a Q_X-dimensional feature map is finally obtained;
where X is between 1 and 20, for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20; Y is between 1 and 10, for example 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10; m is between 2 and 10, for example 2, 3, 4, 5, 6, 7, 8, 9 or 10; p is between 1 and 1024 and Q is between 1 and 1024, the values of p or Q being, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 32, 64, 128, 256, 512 or 1024.
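By way of illustration only (this is not the claimed implementation), the following is a minimal PyTorch-style sketch of such a feature extractor, assuming X = 4 convolutional layers, Y = 2 sampling (pooling) layers, m = 3, stride L = 1 and kernel counts Q_i of (64, 64, 128, 256); all concrete values and the class name are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Illustrative sketch: 4 conv layers interleaved with 2 sampling (pooling) layers.
    Layer counts, kernel sizes and channel numbers are assumptions, not the claimed values."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),    # Q1 = 64 kernels, 3x3
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),   # Q2 = 64, p2 = Q1
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),                    # sampling layer, 2Lx2L window, stride 2L
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),  # Q3 = 128
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1), # Q4 = 256 -> 256-dimensional feature map
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)   # (N, 256, H/4, W/4) feature map

# Being fully convolutional, the extractor accepts inputs of arbitrary spatial size.
fmap = FeatureExtractor()(torch.randn(1, 3, 600, 800))
```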
In another embodiment, the candidate region generator sets a sliding window on the feature map, the size of the sliding window being n×n, for example 3×3; the sliding window slides over the feature map, and for each position of the sliding window there is a correspondence between its center point and a position in the original image; k candidate regions with different scales and aspect ratios are generated in the original image, centered on that corresponding position; if the k candidate regions have x different scales and x different aspect ratios (for example 3 of each), then k = x² (for example k = 9).
In another embodiment, the target recognizer further comprises an intermediate layer, a classification layer and a bounding-box regression layer, wherein the intermediate layer is used to map the data of the candidate regions formed by the sliding-window operation and is a multi-dimensional (for example 256-dimensional or 512-dimensional) vector;
the classification layer and the bounding-box regression layer are each connected to the intermediate layer; the classification layer is used to determine whether a target candidate region is foreground (i.e. a positive sample) or background (i.e. a negative sample), and the bounding-box regression layer is used to generate the x and y coordinates of the center point of the candidate region as well as its width w and height h.
A second aspect of the present invention provides a device for recognizing gastric cancer images, comprising a storage unit storing gastric cancer diagnostic images, an image preprocessing program and a trainable image recognition program, and preferably further comprising a computing unit and a display unit;
the device is capable of training the image recognition program (preferably by supervised training) using images containing gastric cancer lesions, so that the trained image recognition program can identify gastric cancer lesion sites in an image to be examined;
preferably, the image to be examined is an endoscopic photograph or a real-time video image.
In one embodiment, the image preprocessing program precisely frames the gastric cancer lesion sites in the gastric cancer diagnostic images, the part inside the frame being defined as a positive sample and the part outside the frame as a negative sample, and outputs the position coordinate information and/or lesion type information of the lesion; preferably, before the frame selection, the images are also desensitized in advance to remove the patients' personal information;
preferably, the frame selection generates a rectangular or square frame containing the lesion site; the coordinate information is preferably the coordinates of the upper-left and lower-right corner points;
also preferably, the framed site is determined by the following method: 2n endoscopists perform frame selection in a "back-to-back" manner, i.e. the 2n persons are randomly divided into n groups of 2, all images are randomly divided into n portions and randomly assigned to the groups of physicians for frame selection; when the frame selection is completed, the results of the two physicians in each group are compared, the consistency of their results is evaluated, and the framed site is finally determined, where n is a natural number between 1 and 100, for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100;
further preferably, the criteria for evaluating the consistency of the frame selection results between the two physicians are as follows:
for each lesion image, the overlapping area of the frames drawn by the two physicians of a group is compared; if the area of the overlapping part (i.e. the intersection) of the regions framed by the two physicians is greater than 50% of the area covered by their union, the frame selection judgments of the two physicians are considered consistent, and the diagonal coordinates corresponding to the intersection are saved as the final localization of the target lesion;
if the area of the overlapping part (i.e. the intersection) is less than 50% of the area covered by the union of the two, the frame selection judgments of the two physicians are considered to differ substantially; such lesion images are singled out, and the final position of the target lesion is determined by joint discussion among all 2n physicians participating in the frame selection work.
In another embodiment, the image recognition program is a trainable neural-network-based image recognition program, the neural network preferably being a convolutional neural network; preferably, the image recognition program comprises a feature extractor, a candidate region generator and a target recognizer, wherein:
the feature extractor is used to perform feature extraction on the images so as to obtain feature maps; preferably, the feature extraction is performed by convolution operations;
the candidate region generator is used to generate a number of candidate regions based on the feature map;
the target recognizer calculates a classification score for each candidate region, the score indicating the probability that the region belongs to the positive samples and/or the negative samples; at the same time, the target recognizer can propose adjustment values for the border position of each region, so that the border position of each region is adjusted and the lesion position is thereby determined precisely; preferably, a loss function is used in the training of the classification scores and the adjustment values.
In another embodiment, during the training a mini-batch-based gradient descent method is adopted, i.e. for each training image a mini-batch containing multiple positive and negative candidate regions is generated; 256 candidate regions are then randomly sampled from each image so that the ratio of positive to negative candidate regions is close to 1:1, and the loss function of the corresponding mini-batch is then calculated; if the number of positive candidate regions in an image is fewer than 128, negative candidate regions are used to fill the mini-batch;
preferably, the learning rate of the first 50,000 mini-batches is set to 0.001 and the learning rate of the following 50,000 mini-batches is set to 0.0001; the momentum term is preferably set to 0.9 and the weight decay is preferably set to 0.0005.
In another embodiment, the feature extractor can perform feature extraction on an input image of any size and/or resolution; the image may be input at its original size and/or resolution, or after its size and/or resolution has been changed, and a multi-dimensional (for example 256-dimensional or 512-dimensional) feature map is obtained;
specifically, the feature extractor comprises X convolutional layers and Y sampling layers, wherein the i-th convolutional layer (i between 1 and X) contains Q_i convolution kernels of size m×m×p_i, where m×m denotes the length and width of the kernel in pixels and p_i equals the number Q_{i-1} of convolution kernels of the previous convolutional layer; in the i-th convolutional layer, the kernels convolve the data coming from the previous level (for example the original image, the (i-1)-th convolutional layer, or a sampling layer) with stride L; each sampling layer contains one kernel of size 2L×2L, moving with stride 2L, which convolves the image supplied by the convolutional layer; after feature extraction by the feature extractor, a Q_X-dimensional feature map is finally obtained;
where X is between 1 and 20, for example 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20; Y is between 1 and 10, for example 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10; m is between 2 and 10, for example 2, 3, 4, 5, 6, 7, 8, 9 or 10; p is between 1 and 1024 and Q is between 1 and 1024, the values of p or Q being, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 32, 64, 128, 256, 512 or 1024.
In another embodiment, the candidate region generator sets a sliding window on the feature map, the size of the sliding window being n×n, for example 3×3; the sliding window slides over the feature map, and for each position of the sliding window there is a correspondence between its center point and a position in the original image; k candidate regions with different scales and aspect ratios are generated in the original image, centered on that corresponding position; if the k candidate regions have x different scales and x different aspect ratios (for example 3 of each), then k = x² (for example k = 9).
In another embodiment, the target recognizer further comprises an intermediate layer, a classification layer and a bounding-box regression layer, wherein the intermediate layer is used to map the data of the candidate regions formed by the sliding-window operation and is a multi-dimensional (for example 256-dimensional or 512-dimensional) vector;
the classification layer and the bounding-box regression layer are each connected to the intermediate layer; the classification layer is used to determine whether a target candidate region is foreground (i.e. a positive sample) or background (i.e. a negative sample), and the bounding-box regression layer is used to generate the x and y coordinates of the center point of the candidate region as well as its width w and height h.
A third aspect of the present invention provides the use of the system of the first aspect or the device of the second aspect of the present invention in the prediction and diagnosis of gastric cancer and/or gastric precancerous lesions.
A fourth aspect of the present invention provides the use of the system of the first aspect or the device of the second aspect of the present invention in the recognition of gastric cancer images or of lesion sites in gastric cancer images.
A fifth aspect of the present invention provides the use of the system of the first aspect or the device of the second aspect of the present invention in the real-time diagnosis of gastric cancer and/or gastric precancerous lesions.
A sixth aspect of the present invention provides the use of the system of the first aspect or the device of the second aspect of the present invention in the real-time recognition of gastric cancer images or of lesion sites in gastric cancer images.
Through long exploration, the inventors found that gastric cancer lesions have characteristics of their own: the lesion sites are not conspicuous and their boundaries with the surrounding tissue are not clear, so that training an image recognition model is considerably more difficult than conventional tasks (such as recognizing everyday objects), and slight carelessness can prevent the training from converging and cause it to fail. In the present invention, the inventors improved the training method of a neural-network-based image recognition model (for example, by precisely delimiting the target lesion positions in the training images through frame selection, thereby improving the recognition accuracy of the image recognition model), and thus obtained a recognition system (and/or device) for intelligent and efficient recognition of gastric cancer lesions in endoscopic images, whose recognition rate is higher than that of ordinary endoscopists. A real-time diagnosis system enhanced by machine learning can moreover monitor and identify gastrointestinal lesions, their locations and their probabilities in real time, which can greatly improve the detection rate of gastric cancer by ordinary physicians, reduce the misdiagnosis rate, and provide a safe and reliable technique for the diagnosis of gastric cancer.
Description of the Drawings
Figure 1: An endoscopic image containing a gastric cancer lesion site.
Figure 2: Schematic diagram of the frame selection process.
Figure 3: The gastric cancer lesion site identified by the image recognition system of the present invention.
Detailed Description
Unless otherwise specified, the terms used in this disclosure have the ordinary meanings as understood by those of ordinary skill in the art. The meanings of some terms as used in this disclosure are given below; in case of any inconsistency with other definitions, the definitions below prevail.
Definitions
The term "gastric cancer" refers to a malignant tumor originating from the epithelial cells of the gastric mucosa, and encompasses early gastric cancer and advanced gastric cancer.
The term "module" refers to a set of functions capable of achieving a specific effect; a module may be executed by a computer alone, performed manually, or carried out by a computer and a human together.
Obtaining lesion data
The key purpose of the step of obtaining lesion data is to obtain sample material for deep learning.
In one embodiment, the obtaining process may specifically include steps of collection and preliminary screening.
"Collection" means searching all endoscopy databases according to the criterion "diagnosed as gastric cancer" and collecting all endoscopic diagnostic images of all patients with gastric cancer, for example all images in the folder of a patient diagnosed with "gastric cancer", i.e. all images stored for that patient during the entire endoscopy procedure. These may therefore also include gastroscopy images other than the target lesion site; for example, the patient may also have been diagnosed with benign ulcers, polyps, etc., and the folder under the patient's name may also contain images of the esophagus, gastric fundus, gastric body, duodenum and other sites stored during the examination.
"Preliminary screening" is the step of screening the collected pathological images of gastric cancer patients; specifically, it can be carried out by experienced endoscopists on the basis of the "endoscopic findings" of each case combined with the relevant descriptions in the "pathological diagnosis". The images used for the deep learning network must be of clear quality with accurate features, otherwise learning becomes more difficult or the recognition results become inaccurate. The module and/or step of preliminary screening of the lesion data therefore selects, from a set of examination images, those images in which a definite gastric cancer lesion site is present.
Importantly, the preliminary screening combines the histopathological results of the patient's biopsy, i.e. the description of the atrophic site in the "pathological diagnosis", to locate the lesion precisely, while also taking image clarity, shooting angle, degree of magnification and the like into account, so as to select, as far as possible, endoscopic images that are of high clarity, moderately magnified, and show the whole lesion.
Preliminary screening ensures that the images entered into the training set are all high-quality images containing a confirmed lesion site, thereby improving the feature accuracy of the recorded training dataset, so that the artificial intelligence network can better induce and summarize the image features of the atrophic lesions and improve the diagnostic accuracy.
Lesion data preprocessing
The preprocessing is the process of precisely framing the gastric cancer lesion site: the part inside the frame is defined as a positive sample, the part outside the frame is defined as a negative sample, and the position coordinate information and lesion type information of the lesion are output.
In one embodiment, the lesion data preprocessing is realized wholly or partly by an "image preprocessing program".
The term "image preprocessing program" refers to a program capable of frame selection of a target region in an image, thereby marking the type and extent of the target region.
In one embodiment, the image preprocessing program can also desensitize the images to remove the patients' personal information.
In one embodiment, the image preprocessing program is software, written in a computer programming language, that can perform the aforementioned functions.
In another embodiment, the image preprocessing program is software capable of performing the frame selection function.
In a specific embodiment, the software performing the frame selection imports the image to be processed and displays it on the operation interface; the operator performing the frame selection (for example a physician) then only needs to drag the mouse over the intended target lesion site from upper left to lower right (or along another diagonal direction) to form a rectangular or square frame covering the target lesion, while the exact coordinates of the upper-left and lower-right corners of the frame are generated and stored in the background for unique localization.
To ensure the accuracy of the preprocessing (or frame selection), the present invention further strengthens the quality control of the frame selection, which is an important guarantee of the higher accuracy achieved by the method/system of the present invention. Specifically, 2n endoscopists (for example 6, 8 or 10) perform frame selection in a "back-to-back" manner: the 2n persons are randomly divided into n groups of 2, and all the screened training images are likewise randomly divided into n equal portions and randomly assigned to the groups of physicians for frame selection; when the frame selection is completed, the results of the two physicians in each group are compared, the consistency of their results is evaluated, and the framed site is finally determined.
In one embodiment, the consistency criterion is as follows: for the same lesion image, the frame selection results of the two physicians of a group, i.e. the rectangular frames determined by the diagonal coordinates, are compared with respect to their overlapping area. If the area of the overlapping part of the two rectangular frames (i.e. the intersection) is greater than 50% of the area covered by their union, the frame selection judgments of the two physicians are considered consistent, and the diagonal coordinates corresponding to the intersection are saved as the final localization of the target lesion. Conversely, if the area of the overlapping part (i.e. the intersection) is less than 50% of the area covered by the union, the frame selection judgments of the two physicians are considered to differ substantially; such lesion images are singled out by the software in the background and, at a later stage, the final position of the target lesion is determined jointly by all physicians participating in the frame selection work.
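A minimal sketch of this agreement check is given below for illustration only; the (x1, y1, x2, y2) corner format of the boxes, the function name and the return convention are assumptions, not part of the described software.

```python
def box_agreement(box_a, box_b, threshold=0.5):
    """Check whether two physicians' frames agree.

    box_a, box_b: (x1, y1, x2, y2), with (x1, y1) the upper-left and (x2, y2)
    the lower-right corner. Returns (is_consistent, final_box), where final_box
    is the intersection rectangle when the two frames are consistent.
    """
    # Intersection rectangle of the two frames
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    if union > 0 and inter / union > threshold:
        # Consistent: keep the intersection's diagonal corners as the final lesion location
        return True, (ix1, iy1, ix2, iy2)
    # Inconsistent: the image is set aside for joint review by all participating physicians
    return False, None
```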
Image recognition model
The term "image recognition model" refers to an algorithm constructed on the principles of machine learning and/or deep learning; it may also be called a "trainable image recognition model" or an "image recognition program".
In one embodiment, the program is a neural network, the neural network preferably being a convolutional neural network; in another embodiment, the neural network is a convolutional neural network based on the LeNet-5, RCNN, SPP, Fast-RCNN and/or Faster-RCNN architecture, where Faster-RCNN can be regarded as a combination of Fast-RCNN and an RPN; one embodiment is based on the Faster-RCNN network.
The image recognition program comprises at least the following layers: an original-image feature extraction layer, a candidate region selection layer and a target recognition layer, whose trainable parameters are adjusted by a preset algorithm.
The term "original-image feature extraction layer" refers to a layer or combination of layers that can extract information from the input training image in multiple dimensions by means of mathematical computation; this layer may in fact represent a combination of several different functional layers.
In one embodiment, the original-image feature extraction layer can be based on the ZF or VGG16 network.
The term "convolutional layer" refers to the network layer in the original-image feature extraction layer that is responsible for convolving the original input image, or the image information processed by a sampling layer, so as to extract information. The convolution operation is realized by sliding a convolution kernel of a specific size (e.g. 3*3) over the input image with a certain stride (e.g. 1 pixel); as the kernel moves, the image pixels are multiplied by the corresponding kernel weights and all products are summed to give one output value. In image processing an image is often represented as a vector of pixels, so a digital image can be regarded as a discrete function on a two-dimensional space, for example f(x, y); given a two-dimensional convolution operation function C(u, v), an output image g(x, y) = f(x, y) * C(u, v) is produced. Convolution can be used for image blurring and for information extraction.
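As a purely illustrative sketch of the operation g(x, y) = f(x, y) * C(u, v) described above, a direct sliding-kernel convolution can be written as follows; the kernel values, sizes and stride are arbitrary assumptions.

```python
import numpy as np

def conv2d(f, C, stride=1):
    """Slide kernel C over image f with the given stride and sum the products
    at each position (valid positions only, no padding)."""
    kh, kw = C.shape
    H, W = f.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    g = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = f[i * stride:i * stride + kh, j * stride:j * stride + kw]
            g[i, j] = np.sum(patch * C)   # multiply pixels by kernel weights and sum
    return g

f = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
C = np.ones((3, 3)) / 9.0                      # 3x3 averaging (blurring) kernel
g = conv2d(f, C)                               # 3x3 output
```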
The term "training" means repeatedly and automatically adjusting the parameters of the trainable image recognition program by inputting a large number of manually annotated samples, so as to achieve the intended purpose, namely recognizing the lesion sites in gastric cancer images.
In one embodiment, the present invention is based on the Faster R-CNN network, and the following alternating training method is adopted in step S4:
(1) the parameters of the region proposal network (RPN) are initialized with a model pre-trained on ImageNet, and the network is fine-tuned;
(2) the Fast R-CNN network parameters are likewise initialized with a model pre-trained on ImageNet, and the network is then trained using the region proposals extracted by the RPN of (1);
(3) the RPN is re-initialized using the Fast R-CNN network of (2), the convolutional layers are kept fixed and the RPN is fine-tuned, with only the cls and/or reg layers of the RPN being fine-tuned;
(4) the convolutional layers of the Fast R-CNN of (2) are kept fixed, and the Fast R-CNN network is fine-tuned using the region proposals extracted by the RPN of (3), with only the fully connected layers of Fast R-CNN being fine-tuned.
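Purely as an illustration of the order of these four steps, the control flow might be sketched as below; every function in this sketch is a placeholder stub, not an existing API, and a real implementation would wrap the RPN and Fast R-CNN training routines of a deep learning framework.

```python
# Placeholder stand-ins for the real training routines (assumptions for illustration).
def init_backbone(weights):
    return {"conv_layers": weights}

def train_rpn(backbone, data, freeze_conv=False, tune_only=None):
    return {"backbone": backbone, "tuned": tune_only}

def train_fast_rcnn(backbone, data, proposals, freeze_conv=False, tune_only=None):
    return {"backbone": backbone, "tuned": tune_only}

def propose(rpn, data):
    return [("x1", "y1", "x2", "y2")] * len(data)

def alternating_training(data, imagenet_weights):
    """Four-step schedule from the text, expressed as control flow only."""
    # Step 1: RPN initialized from ImageNet weights and fine-tuned.
    rpn = train_rpn(init_backbone(imagenet_weights), data)
    # Step 2: Fast R-CNN initialized from ImageNet weights, trained on the step-1 proposals.
    frcnn = train_fast_rcnn(init_backbone(imagenet_weights), data, propose(rpn, data))
    # Step 3: RPN re-initialized from the Fast R-CNN backbone; shared conv layers frozen;
    # only the cls and reg layers of the RPN are fine-tuned.
    rpn = train_rpn(frcnn["backbone"], data, freeze_conv=True, tune_only=("cls", "reg"))
    # Step 4: conv layers kept fixed; only the fully connected layers of Fast R-CNN
    # are fine-tuned on the proposals from the step-3 RPN.
    frcnn = train_fast_rcnn(frcnn["backbone"], data, propose(rpn, data),
                            freeze_conv=True, tune_only=("fc",))
    return rpn, frcnn
```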
The term "candidate region selection layer" refers to a layer or combination of layers that, by means of an algorithm, selects specific regions on the original image for classification and bounding-box regression; similar to the original-image feature extraction layer, this layer may also represent a combination of several different layers.
In one embodiment, the candidate region selection layer is connected directly to the original input layer.
In one embodiment, the candidate region selection layer is connected directly to the last layer of the original-image feature extraction layer.
In one embodiment, the "candidate region selection layer" can be based on an RPN.
The term "target recognition layer" refers to the layers that perform the classification and bounding-box regression described below (the intermediate layer, the classification layer and the window regression layer). The term "sampling layer", sometimes called a pooling layer, denotes a layer whose operation is similar to that of a convolutional layer, except that its kernel only takes, for example, the maximum or the average value of the corresponding positions (max pooling, average pooling).
The term "feature map" refers to the small-area, high-dimensional, multi-channel image obtained after the original-image feature extraction layer has convolved the original image; as an example, a feature map may be a 256-channel image of size 51*39.
The term "sliding window" refers to a small window (e.g. 2*2 or 3*3) generated on the feature map, which moves over every position of the feature map; although the feature map is not large, it has already undergone several layers of data extraction (e.g. convolution), so a small sliding window on the feature map already corresponds to a large field of view.
The term "candidate region" may also be called candidate window, target candidate region, reference box or bounding box, and is used interchangeably with anchor or anchor box herein.
In one embodiment, the sliding window is first positioned at a location on the feature map; for that location, k rectangular or square windows of different areas and aspect ratios (for example 9) are generated and anchored at the center of that location, which is why they are also called anchors or anchor boxes; based on the relationship between each sliding-window position on the feature map and the corresponding center position in the original image, candidate regions are formed. A candidate region can essentially be regarded as the region of the original image that corresponds to the sliding window (3*3) moved over the last convolutional layer.
In one embodiment of the present invention, k = 9, and generating the candidate regions includes the following steps:
(1) First, 9 kinds of anchor boxes are generated according to different areas and aspect ratios; these anchor boxes do not change with the size of the feature map or of the original input image;
(2) For each input image, the center point in the original image corresponding to each sliding-window position is computed according to the image size;
(3) Based on the above computation, a mapping between the sliding-window positions and the positions in the original image is established.
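A minimal numpy sketch of this anchor generation is given below for illustration; the concrete scales, aspect ratios and feature stride of 16 are assumptions and not prescribed by the text.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, feat_stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """For every sliding-window position on a feat_h x feat_w feature map, place
    k = len(scales) * len(ratios) anchor boxes (here 9) in the original image,
    centered on the position that the window maps back to."""
    # 9 base anchors (x1, y1, x2, y2) around the origin, one per scale/ratio combination
    base = []
    for s in scales:
        for r in ratios:
            w, h = s * np.sqrt(r), s / np.sqrt(r)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)                                  # shape (9, 4)

    # Map every feature-map position back to its center point in the original image
    shift_x = (np.arange(feat_w) + 0.5) * feat_stride
    shift_y = (np.arange(feat_h) + 0.5) * feat_stride
    sx, sy = np.meshgrid(shift_x, shift_y)
    shifts = np.stack([sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()], axis=1)

    # All anchors: (num_positions, 9, 4) -> (num_positions * 9, 4)
    return (shifts[:, None, :] + base[None, :, :]).reshape(-1, 4)

anchors = generate_anchors(39, 51)   # e.g. a 51x39 feature map -> 39 * 51 * 9 anchors
```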
The term "intermediate layer" means that, after the target candidate regions have been formed using the sliding window, the feature map is further mapped into a multi-dimensional (for example 256-dimensional or 512-dimensional) vector; this can be regarded as a new level, referred to in the present invention as the intermediate layer. The classification layer and the window regression layer are connected after the intermediate layer.
The term "classification layer" (cls_score) denotes one branch connected to the output of the intermediate layer; this branch can output 2k scores, namely two scores for each of the k target candidate regions, one being the foreground (i.e. positive sample) score and the other the background (i.e. negative sample) score; this score can be used to judge whether the target candidate region is a real target or background. Thus, for each sliding-window position, the classification layer outputs, from the high-dimensional (for example 256-dimensional) features, the probabilities of belonging to the foreground (i.e. positive samples) and to the background (i.e. negative samples).
Specifically, in one embodiment, when the IoU (intersection over union) between a candidate region and any ground-truth box (the true sample boundary, i.e. the boundary of the object to be recognized in the original image) is greater than 0.7, the candidate region can be regarded as a positive sample (positive label); when the IoU between a candidate region and all ground-truth boxes is less than 0.3, it is regarded as background (i.e. a negative sample); a class label is thus assigned to each anchor. Mathematically, the IoU expresses the degree of overlap between the candidate region and the ground-truth box and is computed as follows:
IoU = (A∩B)/(A∪B)
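For illustration only, this labeling rule over a set of anchors and ground-truth boxes might be sketched as follows; the thresholds of 0.7 and 0.3 follow the text, while the box format, label encoding and function names are assumptions.

```python
import numpy as np

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Assign 1 (positive), 0 (negative/background) or -1 (ignored) to each anchor
    according to its best IoU with the ground-truth boxes."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    labels = np.full(len(anchors), -1, dtype=int)     # -1: not used in training
    for i, a in enumerate(anchors):
        best = max(iou(a, g) for g in gt_boxes)
        if best > pos_thresh:
            labels[i] = 1                              # foreground / positive sample
        elif best < neg_thresh:
            labels[i] = 0                              # background / negative sample
    return labels
```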
The classification layer can output a (k+1)-dimensional array p representing the probabilities of belonging to the k classes and to the background. For each RoI (Region of Interest) a discrete probability distribution is output; p is computed with softmax by a fully connected layer with k+1 outputs. Mathematically:
p = (p_0, p_1, ..., p_k)
The term "window regression layer" (bbox_pred) denotes the other branch connected to the output of the intermediate layer, parallel to the classification layer. At each position, this layer outputs the parameters by which the windows corresponding to the 9 anchors should be translated and scaled, i.e. for each of the k target candidate regions it outputs 4 border position adjustment values; these 4 values are the adjustments of the x_a and y_a coordinates of the target candidate region and of its height h_a and width w_a. The purpose of this branch is to fine-tune the position of the target candidate region so that the final predicted position is more accurate.
The window regression layer can output the bounding-box regression displacements as a 4k-dimensional array t, representing the translation and scaling parameters for each of the k classes. Mathematically:
t^k = (t^k_x, t^k_y, t^k_w, t^k_h), where k denotes the class index, t^k_x and t^k_y specify a scale-invariant translation relative to the object proposal, and t^k_w and t^k_h specify the height and width relative to the object proposal in log space.
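For illustration only, a minimal PyTorch-style sketch of the intermediate layer and its two sibling output branches is given below; the channel count of 256, k = 9 and the class name RPNHead are assumptions consistent with the examples in the text, not the claimed implementation.

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Intermediate layer plus the two sibling branches described above: cls_score
    outputs 2k foreground/background scores and bbox_pred outputs 4k box adjustment
    values per sliding-window position (k = 9 assumed)."""
    def __init__(self, in_channels=256, k=9):
        super().__init__()
        self.intermediate = nn.Conv2d(in_channels, 256, kernel_size=3, padding=1)  # 3x3 sliding window -> 256-d vector
        self.cls_score = nn.Conv2d(256, 2 * k, kernel_size=1)   # foreground/background scores
        self.bbox_pred = nn.Conv2d(256, 4 * k, kernel_size=1)   # (tx, ty, tw, th) per anchor

    def forward(self, feature_map):
        x = torch.relu(self.intermediate(feature_map))
        return self.cls_score(x), self.bbox_pred(x)

scores, deltas = RPNHead()(torch.randn(1, 256, 39, 51))   # e.g. a 51x39, 256-channel feature map
```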
In one embodiment, the present invention trains the classification layer and the window regression layer simultaneously through a loss function composed, in a certain proportion, of the classification loss (i.e. the softmax loss of the classification layer) and the regression loss (i.e. an L1 loss).
Computing the softmax loss requires the ground-truth annotations and the prediction results of the candidate regions; computing the regression loss requires three groups of information:
(1) the predicted center coordinates x, y and the width and height w, h of the candidate region;
(2) the center coordinates x_a, y_a and the width and height w_a, h_a of each of the 9 anchor reference boxes around the candidate region;
(3) the center coordinates x*, y* and the width and height w*, h* of the corresponding ground-truth box.
The regression targets and the total loss are computed as follows:
t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a,
t_w = log(w/w_a),  t_h = log(h/h_a),
t*_x = (x* - x_a)/w_a,  t*_y = (y* - y_a)/h_a,
t*_w = log(w*/w_a),  t*_h = log(h*/h_a),
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*),
where p_i is the probability that anchor i is predicted to be a target; p_i* takes one of two values, being 0 for a negative label and 1 for a positive label; t_i denotes the vector of the 4 parameterized coordinates of the predicted candidate region; t_i* denotes the coordinate vector of the ground-truth bounding box associated with a positive anchor; L_cls is the classification (softmax) loss and L_reg is the regression (L1) loss, balanced against each other by the normalization terms N_cls and N_reg and the weighting factor λ.
In one embodiment, when training with the loss function, a mini-batch-based gradient descent method is adopted, i.e. for each training image a mini-batch containing multiple positive and negative candidate regions (anchors) is generated. 256 anchors are then randomly sampled from each image so that the ratio of positive to negative anchors is close to 1:1, and the loss function of the corresponding mini-batch is then calculated. If the number of positive anchors in an image is fewer than 128, negative anchors are used to fill the mini-batch.
In a specific embodiment, the learning rate of the first 50,000 mini-batches is set to 0.001 and the learning rate of the following 50,000 mini-batches is set to 0.0001; the momentum term is preferably set to 0.9 and the weight decay is preferably set to 0.0005.
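For illustration only, the per-image anchor sampling and the optimizer settings described above might be sketched as follows; PyTorch is assumed, the labels array is assumed to come from a labeling step such as the one sketched earlier, and the placeholder model is not the real detection network.

```python
import numpy as np
import torch

def sample_minibatch(labels, batch_size=256):
    """Sample up to 128 positive anchors per image and fill the remainder of the
    256-anchor mini-batch with negative anchors."""
    pos = np.where(labels == 1)[0]
    neg = np.where(labels == 0)[0]
    n_pos = min(len(pos), batch_size // 2)
    n_neg = min(len(neg), batch_size - n_pos)          # fill remainder with negatives
    pos = pos[np.random.permutation(len(pos))[:n_pos]]
    neg = neg[np.random.permutation(len(neg))[:n_neg]]
    return np.concatenate([pos, neg])

# Optimizer settings from the text: momentum 0.9, weight decay 0.0005,
# learning rate 0.001 for the first 50,000 mini-batches, then 0.0001.
model = torch.nn.Conv2d(256, 18, 1)                    # placeholder for the detection network
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50000], gamma=0.1)
# With scheduler.step() called once per mini-batch, the rate drops to 0.0001 after 50,000.
```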
After the above training, the trained deep learning network is used to recognize endoscopic images of target lesions. In one embodiment, the classification score threshold is set to 0.85, i.e. only lesions whose lesion probability, as assessed by the deep learning network, exceeds 85% are marked, and the image is then judged to be positive; conversely, if no suspicious lesion region is detected in an image, the image is judged to be negative.
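A minimal sketch of this decision rule is given below for illustration; the structure of the detections list is an assumption.

```python
def classify_image(detections, score_threshold=0.85):
    """detections: list of (box, score) pairs produced by the detector for one image.
    Mark every lesion whose score exceeds the threshold; the image is positive if
    at least one such lesion exists, otherwise negative."""
    marked = [(box, score) for box, score in detections if score > score_threshold]
    return ("positive" if marked else "negative"), marked
```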
Examples
1. Statement on waiver of informed consent:
(1) This study uses only endoscopic images and related clinical data obtained during previous clinical diagnosis and treatment at the Endoscopy Center of the Department of Gastroenterology, Beijing Friendship Hospital, for a retrospective observational study, and has no effect whatsoever on the patients' condition, treatment, prognosis or personal safety;
(2) All data collection was completed by the principal investigator alone, and immediately after image collection was completed, dedicated software was applied to erase personal information from all images, ensuring that no patient privacy information was disclosed during the subsequent physician screening and frame selection or during the training, debugging and testing performed by the artificial intelligence programming experts;
(3) The electronic medical record query system of the Gastroenterology Endoscopy Center does not display entries such as "contact information" or "home address", i.e. the system does not store patients' contact information, so the included patients could not be traced retrospectively to obtain signed informed consent.
2. Pathological image acquisition

Inclusion criteria:

(1) Patients who underwent endoscopy (including electronic gastroscopy, electronic colonoscopy, endoscopic ultrasonography, electronic chromoendoscopy, magnifying endoscopy and dye-based chromoendoscopy) at the Digestive Endoscopy Center of Beijing Friendship Hospital between January 1, 2013 and June 10, 2017;

(2) Patients with an endoscopic diagnosis of "gastric cancer" (including, without distinguishing between, early gastric cancer and advanced gastric cancer);
Exclusion criteria:

(1) Patients with gastrointestinal malignancies involving extensive or indeterminate sites;

(2) Patients with malignancies of the pancreaticobiliary system only;

(3) Patients with concurrent malignancies of other systems;

(4) Patients whose endoscopic images were unclear and/or whose imaging angle did not meet the requirements.
3. Experimental procedure and results
(1) Data collection: the investigator retrieved from the electronic medical record system of the Gastroenterology Endoscopy Center of Beijing Friendship Hospital the endoscopic images and related clinical data of patients who underwent endoscopy (including electronic gastroscopy, electronic colonoscopy, endoscopic ultrasonography, electronic chromoendoscopy, magnifying endoscopy and dye-based chromoendoscopy) between January 1, 2013 and June 10, 2017 and whose endoscopic diagnosis was "gastric cancer" (including, without distinguishing between, early and advanced gastric cancer);

(2) Removal of personal information: immediately after collection, personal information was stripped from all images;

(3) Image screening: all processed images were further curated to retain the endoscopic images of cases in which gastric cancer was confirmed by a definite pathological result; according to the pathological biopsy site, images showing the target lesion clearly with little background interference were then selected for each case, giving a total of 3,774 images;
(4) Construction of the test data set: 100 test images in total, comprising 50 images of pathologically confirmed "gastric cancer" (early or advanced) and another 50 endoscopic images of pathologically confirmed gastric "non-neoplastic lesions" (including benign gastric ulcer, polyp, stromal tumor, lipoma and ectopic pancreas) randomly collected from the database. The specific steps were:

First, 50 images were randomly selected from all the gastric cancer images screened in step (3);

Then another 50 endoscopic images of pathologically confirmed gastric "non-neoplastic lesions" (including benign gastric ulcer, polyp, stromal tumor, lipoma and ectopic pancreas) were randomly collected from the database, and personal information was immediately stripped from these 50 images;
(5) Construction of the training data set: from the gastric cancer images screened in step (3), the images randomly selected in step (4) for the test data set were excluded, and the remaining 3,724 images were used for training the deep learning network, forming the training data set (a minimal sketch of this split is given after this step);
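A minimal sketch of the split performed in steps (4) and (5), assuming the screened images are identified by file names (the names and the fixed random seed are illustrative assumptions):

```python
import random

# Placeholder file names for the 3,774 screened gastric cancer images.
gastric_cancer_images = [f"gc_{i:04d}.png" for i in range(3774)]

random.seed(0)
test_cancer = random.sample(gastric_cancer_images, 50)        # 50 cancer images for the test set
train_images = [p for p in gastric_cancer_images if p not in test_cancer]

# The other 50 test images (non-neoplastic lesions) come from a separate database query.
assert len(train_images) == 3724 and len(test_cancer) == 50
```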
(6) Framing of target lesions: six endoscopists were randomly divided, in a "back-to-back" manner, into three groups of two; all screened training images were randomly split into three equal parts and randomly assigned to the groups for framing. The lesion framing step was implemented with self-written software: after an image to be processed is imported, it is displayed on the operation interface, and the physician drags the mouse over the target lesion from the upper left to the lower right to form a rectangular box covering the lesion, while the exact coordinates of the box's upper-left and lower-right corners are generated and stored in the background to locate it uniquely.
After framing was completed, the results of the two physicians in each group were compared: for the same lesion image, the overlap of the two rectangular boxes defined by their diagonal coordinates was examined. After testing, the criterion was set as follows: if the area of the overlapping part of the two boxes (their intersection) is greater than 50% of the area covered by their union, the framing judgments of the two physicians are considered consistent, and the diagonal coordinates of the intersection are saved as the final location of the target lesion. Conversely, if the intersection area is smaller than 50% of the union area, the two judgments are considered to differ substantially; such lesion images are singled out by the software (or marked manually) and the final location of the target lesion is later determined jointly by all physicians involved in the framing work. A sketch of this agreement check is given below.
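The agreement check between the two annotators can be sketched as follows; boxes are assumed to be stored as (x1, y1, x2, y2) with the upper-left corner first, as recorded by the framing software:

```python
def agree_and_merge(box_a, box_b):
    """Return (agree, final_box) for two boxes given as (x1, y1, x2, y2).
    The annotations agree if the intersection area exceeds 50% of the union area;
    in that case the intersection rectangle is kept as the final lesion location."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    if union > 0 and inter / union > 0.5:
        return True, (ix1, iy1, ix2, iy2)   # consistent: keep the intersection
    return False, None                      # inconsistent: refer to the physician panel

# Two largely overlapping annotations agree and yield their intersection.
print(agree_and_merge((100, 100, 300, 300), (120, 110, 310, 320)))
```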
(7) Training: all framed images were fed into a Faster R-CNN-based convolutional neural network for training, and two network structures, ZF and VGG16, were tested; training was performed end-to-end.

The ZF network has 5 convolutional layers, 3 fully connected layers and a softmax classification output layer; the VGG16 network has 13 convolutional layers, 3 fully connected layers and a softmax classification output layer. Within the Faster R-CNN framework, both the ZF and VGG16 models serve as the base CNN used to extract features from the training images (see the sketch below).
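As one possible realization (an assumption; the patent does not specify an implementation library), a Faster R-CNN detector with the VGG16 convolutional layers as feature extractor and a single "lesion" foreground class could be assembled with torchvision like this:

```python
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# The 13 convolutional layers of VGG16 serve as the base CNN for feature extraction.
backbone = torchvision.models.vgg16().features
backbone.out_channels = 512                      # VGG16 feature maps have 512 channels

anchor_generator = AnchorGenerator(sizes=((128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=["0"],
                                                output_size=7,
                                                sampling_ratio=2)

# num_classes=2: one "lesion" class plus background.
model = FasterRCNN(backbone, num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)
```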
During training, mini-batch-based gradient descent was used: for each training image a mini-batch containing multiple positive and negative candidate regions (anchors) was generated. 256 anchors were then randomly sampled from each image so that the ratio of positive to negative anchors was close to 1:1, and the loss function of the corresponding mini-batch was computed. If an image contained fewer than 128 positive anchors, the mini-batch was padded with negative anchors.

The learning rate was set to 0.001 for the first 50,000 mini-batches and to 0.0001 for the subsequent 50,000 mini-batches; the momentum term was preferably set to 0.9 and the weight decay to 0.0005.
The loss function used during training is as follows:
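The formula itself appears only as an image in the original filing and is not reproduced in the text; for reference, the standard Faster R-CNN region-proposal loss that matches the symbol definitions in the next paragraph is

$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,L_{reg}(t_i, t_i^*)$$

where $L_{cls}$ is the log loss over the two classes (object vs. not object), $L_{reg}$ is the smooth $L_1$ loss applied only to positive anchors ($p_i^* = 1$), $N_{cls}$ and $N_{reg}$ are normalization terms, and $\lambda$ is a balancing weight.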
In the above formula, i denotes the index of an anchor within a mini-batch, and p_i denotes the predicted probability that anchor i is an object; p_i* is the ground-truth label of the anchor: 1 if the anchor is an object and 0 otherwise. t_i is a 4-dimensional vector representing the parameterized coordinates of the predicted bounding box, and t_i* is the label, i.e. the ground-truth parameterized bounding box coordinates used in the bounding-box regression.
(8) Testing and statistics: the test data set (50 images of gastric cancer and 50 images of gastric "non-neoplastic lesions") was used to test both the artificial intelligence system and gastroenterologists of different seniority, and the sensitivity, specificity, accuracy, consistency and other diagnostic indicators of the two were compared, evaluated and statistically analyzed. In the test, the classification score threshold of the trained deep learning network for identifying target lesions in endoscopic images was set to 0.85, i.e. only lesions to which the network assigned a lesion probability above 85% were marked, in which case the image was judged positive; conversely, if no suspicious lesion region was detected in an image, the image was judged negative.
The results are as follows:

Based on the platform of the National Clinical Research Center for Digestive Diseases, 89 physicians took part in the endoscopic gastric cancer lesion diagnosis test. Their overall sensitivity ranged from 48% to 100% (median 88%, mean 87%), their specificity ranged from 10% to 98% (median 78%, mean 74%), and their accuracy ranged from 51% to 91% (median 82%, mean 80%). The deep learning network model achieved a recognition sensitivity of 90%, a specificity of 50% and an accuracy of 70%. Therefore, in the diagnosis of gastric cancer from gastroscopic images, the artificial intelligence system is more sensitive than the physicians overall, although its specificity is below the physicians' median and its accuracy slightly below the physicians' median. However, the deep learning model is highly stable in recognition, whereas individual physicians fluctuate widely in specificity and accuracy; using artificial intelligence to identify lesions can therefore still effectively remove the diagnostic bias caused by differences between individual physicians, and it has good application prospects.
Sensitivity (SEN), also known as the true positive rate (TPR), is the percentage of individuals who actually have the disease and are correctly identified by the diagnostic criterion.

Specificity (SPE), also known as the true negative rate (TNR), reflects the ability of a screening test to correctly identify non-patients.

Accuracy = total number of correctly identified individuals / total number of identified individuals.
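These three indicators follow directly from a confusion matrix. The sketch below uses a hypothetical confusion matrix on the 100-image test set chosen to reproduce the figures reported above for the deep learning model; the counts themselves are illustrative assumptions:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity (true positive rate), specificity (true negative rate) and accuracy."""
    sensitivity = tp / (tp + fn)                 # diseased cases correctly called positive
    specificity = tn / (tn + fp)                 # non-diseased cases correctly called negative
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # correctly identified / all identified
    return sensitivity, specificity, accuracy

# 45 of 50 cancers detected (TP), 5 missed (FN);
# 25 of 50 non-neoplastic images correctly negative (TN), 25 false alarms (FP).
print(diagnostic_metrics(tp=45, fp=25, tn=25, fn=5))   # -> (0.9, 0.5, 0.7)
```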
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention falls within the scope of protection of the present invention.