CN111091127A

CN111091127A - Image detection method, network model training method and related device

Info

Publication number: CN111091127A
Application number: CN201911300256.3A
Authority: CN
Inventors: 陈鹏; 孙钟前
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-12-16
Filing date: 2019-12-16
Publication date: 2020-05-01
Anticipated expiration: 2039-12-16
Also published as: CN111091127B

Abstract

The application discloses an image detection method, a network model training method and a related device, wherein an image to be detected in a target field is obtained, and then the image to be detected is input into a first network model for removing a background so as to determine at least one region of interest; and inputting at least one region of interest into the second network model to obtain a distinguishing feature corresponding to the at least one region of interest in source field data, wherein the source field data is used for indicating marking data, and then determining a detection result according to the distinguishing feature. Therefore, the image detection in the target field is realized, so that the unmarked data in the target field can be marked and determined through the marked data in the source field, the detection error caused by field change is reduced, and the accuracy of the image detection is improved.

Description

Image detection method, network model training method and related device

Technical Field

The present application relates to the field of computer technologies, and in particular, to an image detection method, a network model training method, and a related apparatus.

Background

With the improvement of computer performance, the excellent performance of the deep neural network is remarkable, wherein image detection is an application of the neural network, the task of the image detection is to detect the position and the category of an interested object in a picture, and compared with picture classification, finer-grained annotation is required, so that the acquisition cost and the workload of annotation data are increased.

Generally, a training set, i.e., source domain data, is used to train a network model to obtain a correspondence between image features and labels, and then the network model is used to process target domain data to obtain relevant features.

However, the data acquisition of the source domain and the target domain are often in different environments, and the distribution of the source domain and the target domain is often different, for example, in brightness, angle, definition, and the like, at this time, the performance of the neural network model trained by the training set in the test set is often reduced, and an error of an image detection result is caused, which affects the accuracy of image detection.

Disclosure of Invention

In view of this, the present application provides an image detection method, which can effectively reduce detection errors caused by changes in image fields and improve the accuracy of image detection.

The present application provides an image detection method, which can be applied to a system or procedure for medical image detection, and specifically includes: acquiring an image to be detected, wherein the image to be detected is target field data, and the target field data is used for indicating unmarked data;

inputting the image to be detected into a first network model to determine at least one region of interest, wherein the first network model is used for removing the background of the image to be detected;

inputting at least one region of interest into a second network model to obtain a corresponding distinguishing feature of the at least one region of interest in source domain data, wherein the source domain data is used for indicating label data, the second network model associates the source domain data with a plurality of target domain data through the region of interest, and the distinguishing feature is a key region used for determining a detection result in the region of interest;

and determining a detection result according to the distinguishing characteristics, wherein the detection result is used for indicating the position of the region of interest in the image to be detected and corresponding labeling data.

Optionally, in some possible implementation manners of the present application, the image to be detected is a medical detection image, the position of the region of interest is a position of a focus, and the annotation data is a type of the focus.

Optionally, in some possible implementation manners of the present application, the inputting the image to be detected into a first network model to determine at least one region of interest includes:

generating at least one anchor frame on the image to be detected;

inputting the anchor frame into a first network model to obtain an anchor frame containing a foreground, wherein the foreground is used for indicating effective data in the image to be detected;

and determining at least one region of interest according to the anchor frame containing the foreground.

Optionally, in some possible implementation manners of the present application, the determining a detection result according to the distinguishing feature includes:

acquiring a region corresponding to the distinguishing feature;

and if the area corresponding to the distinguishing feature meets a preset condition, determining a detection result according to the distinguishing feature, wherein the preset condition is set based on the proportion of the background in the area corresponding to the distinguishing feature.

Optionally, in some possible implementations of the present application, after the image to be detected is acquired, the method further includes:

acquiring parameters of target field data corresponding to the image to be detected;

adjusting parameters of target field data corresponding to the image to be detected to corresponding parameters in the source field to obtain an image to be detected after image conversion;

the inputting the image to be detected into a first network model to determine at least one region of interest includes:

and inputting the converted image to be detected into the first network model to determine at least one region of interest.

A second aspect of the present application provides a method for training a network model, which specifically includes: acquiring an image set to be trained, wherein the image set to be trained comprises a source field data set and a target field data set, the source field data set comprises at least one source field data, the target field data set comprises at least one target field data, the source field data is used for indicating labeled data, and the target field data is used for indicating unlabeled data;

training a first network model by adopting the source domain data set and the target domain data set;

determining at least one region of interest according to the trained first network model;

and training a second network model by adopting the region of interest, the source field data set and the target field data set, wherein the trained second network model is used for indicating the marking data corresponding to the region of interest in the target field data.

Optionally, in some possible implementations of the present application, the method further includes:

training the second network model by adopting a first loss function, the source field data set and the target field data set to obtain a first gradient of the second network model, wherein the first loss function is used for aligning the characteristics of the source field data set and the characteristics of the target field data set, the first loss function belongs to an antagonistic learning loss function, and the first gradient of the second network model is used for updating the parameters of the second network model.

Optionally, in some possible implementation manners of the present application, the training the second network model by using the first loss function, the source domain data set, and the target domain data set to obtain the first gradient of the second network model includes:

inputting the first loss function, the source domain data set, and the target domain data set into a gradient inversion layer to invert a target function;

and training the second network model according to the reversed objective function to obtain a first gradient of the second network model.

respectively determining an interested region in the source field data set and an interested region in the target field data set according to the trained first network model;

training the second network model by adopting a second loss function, the region of interest in the source field data set and the region of interest in the target field data set to obtain a second gradient of the second network model, wherein the second loss function is used for aligning the characteristics of the region of interest in the source field data set and the characteristics of the region of interest in the target field data set, the second loss function belongs to a counterlearning loss function, and the second gradient of the second network model is used for updating parameters of the second network model.

determining a plurality of anchor boxes in the source domain data set;

setting weights for the source field data set and the target field data set according to the proportion of the foreground in the anchor frame;

the training the second network model with the first loss function, the source domain data set, and the target domain data set to obtain a first gradient of the second network model includes:

and training the second network model by adopting a first loss function, a source domain data set with the weight and a target domain data set with the weight to obtain a first gradient of the second network model.

A third aspect of the present application provides an apparatus for image detection, comprising: the device comprises an acquisition unit, a detection unit and a processing unit, wherein the acquisition unit is used for acquiring an image to be detected, the image to be detected is target field data, and the target field data is used for indicating unmarked data;

the determining unit is used for inputting the image to be detected into a first network model to determine at least one region of interest, and the first network model is used for removing the background of the image to be detected;

the input unit is used for inputting at least one region of interest into a second network model so as to obtain a corresponding distinguishing feature of the at least one region of interest in source domain data, the source domain data is used for indicating label data, the second network model associates the source domain data and a plurality of target domain data through the region of interest, and the distinguishing feature is a key region used for determining a detection result in the region of interest;

and the detection unit is used for determining a detection result according to the distinguishing characteristics, and the detection result is used for indicating the position of the region of interest in the image to be detected and corresponding labeling data.

Optionally, in some possible implementations of the present application, the determining unit is specifically configured to generate at least one anchor frame on the image to be detected;

the determining unit is specifically configured to input the anchor frame into a first network model to obtain an anchor frame including a foreground, where the foreground is used to indicate effective data in the image to be detected;

the determining unit is specifically configured to determine at least one region of interest according to the anchor frame containing the foreground.

Optionally, in some possible implementation manners of the present application, the detection unit is specifically configured to obtain an area corresponding to the distinguishing feature;

the detection unit is specifically configured to determine a detection result according to the distinguishing feature if the region corresponding to the distinguishing feature satisfies a preset condition, where the preset condition is set based on a proportion of a background in the region corresponding to the distinguishing feature.

Optionally, in some possible implementation manners of the present application, the obtaining unit is further configured to obtain parameters of target domain data corresponding to the image to be detected;

the acquisition unit is further configured to adjust parameters of target field data corresponding to the image to be detected to corresponding parameters in the source field, so as to obtain an image to be detected after image conversion;

the determining unit is specifically configured to input the converted image to be detected into the first network model, so as to determine at least one region of interest.

A fourth aspect of the present application provides a training apparatus for a neural network model, including:

the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring an image set to be trained, the image set to be trained comprises a source field data set and a target field data set, the source field data set comprises at least one source field data, the target field data set comprises at least one target field data, the source field data is used for indicating labeled data, and the target field data is used for indicating unlabeled data;

a training unit, configured to train a first network model by using the source domain data set and the target domain data set;

a determining unit, configured to determine at least one region of interest according to the trained first network model;

the training unit is further configured to train a second network model by using the region of interest, the source domain data set, and the target domain data set, where the trained second network model is used to indicate the labeled data corresponding to the region of interest in the target domain data.

Optionally, in some possible implementation manners of the present application, the training unit is further configured to train the second network model by using a first loss function, the source domain data set, and the target domain data set to obtain a first gradient of the second network model, where the first loss function is used to align features of the source domain data set and features of the target domain data set, the first loss function belongs to a countering learning loss function, and the first gradient of the second network model is used to update parameters of the second network model.

Optionally, in some possible implementations of the present application, the training unit is specifically configured to input the first loss function, the source domain data set, and the target domain data set into a gradient inversion layer, so as to invert the target function;

the training unit is specifically configured to train the second network model according to the inverted objective function to obtain a first gradient of the second network model.

Optionally, in some possible implementation manners of the present application, the training unit is further configured to determine, according to the trained first network model, an area of interest in the source domain data set and an area of interest in the target domain data set respectively;

the training unit is further configured to train the second network model by using a second loss function, the region of interest in the source domain data set, and the region of interest in the target domain data set to obtain a second gradient of the second network model, where the second loss function is used to align features of the region of interest in the source domain data set and features of the region of interest in the target domain data set, the second loss function belongs to a countering learning loss function, and the second gradient of the second network model is used to update parameters of the second network model.

Optionally, in some possible implementations of the present application, the training unit is further configured to determine a plurality of anchor frames in the source domain data set;

the training unit is further used for setting weights for the source field data set and the target field data set according to the proportion of the foreground in the anchor frame;

the training unit is specifically configured to train the second network model by using a first loss function, a source domain data set with weights, and a target domain data set with weights, so as to obtain a first gradient of the second network model.

A fifth aspect of the present application provides a medical examination apparatus, comprising: a memory, a processor, and a bus system; the memory is used for storing program codes; the processor is configured to perform the method of image detection according to the first aspect or any one of the first aspects according to instructions in the program code.

A fifth aspect of the present application provides a server comprising: a memory, a processor, and a bus system; the memory is used for storing program codes; the processor is configured to perform the method of network model training according to any of the above second aspect or the second aspect according to instructions in the program code.

A sixth aspect of the present application provides a computer-readable storage medium having stored therein instructions, which, when executed on a computer, cause the computer to perform the method for image detection according to the first aspect or any one of the first aspects, or the method for network model training according to any one of the second aspects or the second aspects.

According to the technical scheme, the embodiment of the application has the following advantages:

determining at least one region of interest by acquiring an image to be detected in a target field and inputting the image to be detected into a first network model; and inputting at least one region of interest into the second network model to obtain a corresponding distinguishing feature of the at least one region of interest in source field data, wherein the source field data is used for indicating marking data, and then determining a detection result according to the distinguishing feature, wherein the detection result is used for indicating the position of the region of interest in the image to be detected and the corresponding marking data. The image detection in the target field is realized, and the target field and the source field are associated through the network model, so that the unmarked data in the target field can be marked and determined through the marked data in the source field, the detection error caused by field change is reduced, and the image detection accuracy is improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a schematic diagram of an architecture of a medical image inspection system according to an embodiment of the present application;

fig. 2 is a flowchart of image detection according to an embodiment of the present disclosure;

fig. 3 is a flowchart of a method for image detection according to an embodiment of the present application;

fig. 4 is a scene schematic diagram of image detection according to an embodiment of the present disclosure;

fig. 5 is a schematic flowchart of image conversion according to an embodiment of the present application;

fig. 6 is a flowchart of a method for training a network model according to an embodiment of the present disclosure;

FIG. 7 is an architecture diagram of a network model in image inspection according to an embodiment of the present application;

fig. 8 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present disclosure;

fig. 9 is a schematic structural diagram of a training apparatus for a network model according to an embodiment of the present disclosure;

FIG. 10 is a schematic structural diagram of a medical examination device according to an embodiment of the present application;

fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

The embodiment of the application provides an image detection method and a related device, which can be applied to a medical image detection system or program, and can determine at least one region of interest by acquiring an image to be detected in a target field and inputting the image to be detected into a first network model; and inputting at least one region of interest into the second network model to obtain a corresponding distinguishing feature of the at least one region of interest in source field data, wherein the source field data is used for indicating marking data, and then determining a detection result according to the distinguishing feature, wherein the detection result is used for indicating the position of the region of interest in the image to be detected and the corresponding marking data. The image detection in the target field is realized, and the target field and the source field are associated through the network model, so that the unmarked data in the target field can be marked and determined through the marked data in the source field, the detection error caused by field change is reduced, and the image detection accuracy is improved.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

First, some nouns that may appear in the embodiments of the present application are explained.

Depth detection model: and inputting the picture based on an object detection model of the deep neural network, and identifying the position and the category of the target object in the picture.

Region of interest (ROI): in image processing, a region to be processed is outlined by various operators and functions in the form of a box, a circle, an ellipse, an irregular polygon, or the like from an image to be processed.

Source domain data: a data set containing manual annotations. For example, in the model development stage, the training data provided by the cooperative hospital includes the features of various lesions and the corresponding position information.

Target domain data: and the data distribution is different from the source field due to the lack of manually marked data sets.

Domain shift: the data distribution of the two domains is different.

The method has the following field adaptation: a model trained on the basis of source domain labeling data has good generalization performance in a target domain in machine learning.

Feature alignment: aligning the feature distributions of the two domains is one type of approach to address domain adaptation.

It should be understood that the present application may be applied to an Artificial Intelligence (AI) -based medical diagnosis scenario, and is particularly used for detecting and analyzing an input medical image, that is, outputting an analysis result of the medical image by using a medical image detection model, so that a medical staff or a researcher can obtain a more accurate diagnosis result. Specifically, referring to fig. 1, fig. 1 is a schematic diagram of an architecture of a medical image detection system in an embodiment of the present application, as shown in the figure, a large number of medical images may be obtained by a medical detection device, and it should be noted that the medical images include, but are not limited to, a Computed Tomography (CT) image, a Magnetic Resonance Imaging (MRI) image, an ultrasound (ultrasound, US) image, and a molybdenum target image.

The medical field based on AI includes Computer Vision technology (CV), and Computer Vision is a science for researching how to make a machine "see", and more specifically, it refers to that a camera and a Computer are used to replace human eyes to perform machine Vision such as identification, tracking and measurement on a target, and further perform graphic processing, so that the Computer processing becomes an image more suitable for human eyes to observe or transmitted to an instrument to detect. As a scientific discipline, computer vision research-related theories and techniques attempt to build artificial intelligence systems that can capture information from images or multidimensional data. The computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, synchronous positioning, map construction, and other technologies, and also includes common biometric technologies such as face recognition and fingerprint recognition.

The CT image is composed of a certain number of pixels with different gray scales from black to white which are arranged according to a matrix. These pixels reflect the X-ray absorption coefficients of the corresponding voxels. CT images are represented in different gray scales, reflecting the degree of absorption of X-rays by organs and tissues. Thus, like the black and white image shown in the X-ray image, the black image represents a low absorption region, i.e., a low density region, such as a lung containing much gas; white shading represents a high absorption zone, i.e. a high density zone, such as bone. However, in the CT image, the density resolution of CT is high, that is, the density resolution is high, compared with the X-ray image. Therefore, the CT image can better display organs composed of soft tissues such as brain, spinal cord, mediastinum, lung, liver, gallbladder, pancreas, and pelvic organs, and display images of lesions on a good anatomical image background.

MRI images have been applied to imaging diagnosis of various systems throughout the body. The best effect is craniocerebral, spinal cord, heart great vessel, joint skeleton, soft tissue, pelvic cavity and the like. The heart vessel disease not only can observe the anatomical change of each chamber, great vessel and valve, but also can be used for ventricular analysis, qualitative and semi-quantitative diagnosis, can be used for making a plurality of section images, has higher spatial resolution, displays the overall appearance of the heart and lesion and the relation between the heart and the surrounding structure, and is superior to other X-ray imaging, two-dimensional ultrasound, nuclide and CT examination. When diagnosing encephalomyelitis, it can be used to make coronal, sagittal and transverse images.

The US image reflects the difference of acoustic parameters in the medium, and information other than optical, X-ray, and Y-ray can be obtained. The ultrasonic wave has good resolving power to the soft tissue of the human body, and is beneficial to identifying the tiny pathological changes of the biological tissue. When the ultrasonic image displays the living tissue, the required image can be obtained without dyeing treatment.

The molybdenum target image is a new digital image technology combining the traditional radiation technology and the modern computer technology, and finally converts an analog image of the common X-ray photography into a digital image which can be processed quantitatively, so that the traditional X-ray photography technology and the image quality are substantially leaped, and a radiologist is more likely to find suspicious malignant lesions in a mammogram, so that the molybdenum target image is considered to be a method which is helpful for improving the early detection rate of breast cancer. The application mainly takes the molybdenum target image for detecting mammary gland as an example for introduction.

The medical detection equipment sends medical images to the server, and the medical images can be detected through a medical image detection model trained in the server, for example: if the calcified area exists in the medical image, the calcified area is extracted and then judged to be malignant or benign, if the calcified area is malignant, a malignant calcification positioning result is generated, the server sends the malignant calcification positioning result to the terminal equipment, and the terminal equipment can generate and print a report according to the malignant calcification positioning result and can also directly display the malignant calcification positioning result on a display screen.

It should be noted that the terminal device includes, but is not limited to, a palm computer, a mobile phone, a printer, a personal computer, a notebook computer, and a tablet computer.

It is understood that the medical examination device, the server and the terminal device included in the medical image examination system may be three independent devices, or may be integrated in the same system, and are not limited herein.

In order to solve the above problems, the present application provides an image detection method, which is applied to the flow framework of image detection shown in fig. 2, and as shown in fig. 2, for the flow framework of image detection provided in the embodiment of the present application, first, a medical detection device acquires an image of a relevant part of a user, and then sends the acquired image to a network model to obtain disease information included in the image and indicate a position of a disease, where the network model is obtained by training active domain data and target domain data.

It can be understood that the method provided by the present application may be a program written as a processing logic in a hardware system, or may be an image detection apparatus, and the processing logic is implemented in an integrated or external manner. As an implementation manner, the image detection device acquires an image to be detected in a target field, and then inputs the image to be detected into a first network model to determine at least one region of interest; and inputting at least one region of interest into the second network model to obtain a corresponding distinguishing feature of the at least one region of interest in source field data, wherein the source field data is used for indicating marking data, and then determining a detection result according to the distinguishing feature, wherein the detection result is used for indicating the position of the region of interest in the image to be detected and the corresponding marking data. The image detection in the target field is realized, and the target field and the source field are associated through the network model, so that the unmarked data in the target field can be marked and determined through the marked data in the source field, the detection error caused by field change is reduced, and the image detection accuracy is improved.

With reference to the above flow architecture, the following describes an image detection method in the present application, please refer to fig. 3, where fig. 3 is a flow chart of an image detection method according to an embodiment of the present application, and the embodiment of the present application at least includes the following steps:

301. and acquiring an image to be detected.

In this embodiment, the image to be detected is target field data, and the target field data is used to indicate unmarked data.

It can be understood that the image to be detected may be a medical detection image, and the annotation data is the type of the lesion, that is, the position and the type of a possible disease in the image to be detected may be obtained by inputting the image to be detected, and the specific disease type may refer to the example description in fig. 1, which is not described herein.

In addition, due to the difference between the target field and the source field, when the image to be detected does not have target field data, deviation may be generated in the process of identifying the characteristics; for a hospital scenario, different parameters may be used for the same instrument in different hospitals, for example: the detection angle, brightness, definition, etc. all can cause the deviation of feature detection, so that the target domain data can be associated with the source domain data and then the feature detection can be carried out.

302. Inputting the image to be detected into a first network model to determine at least one region of interest;

in this embodiment, the first network model may be a regional candidate network model (RPN); the first network model is used for capturing a region of interest in an image to be detected, wherein the region of interest may be preset, for example: setting a capture area as an image area containing the tumor; the region of interest may also be automatically generated, for example: capturing an area with uneven pigment distribution in the image; the region of interest can also be determined by combining the above methods, depending on the actual scene.

It should be noted that, while the first network model captures the region of interest in the image to be detected, it is also equivalent to removing the background of the image to be detected, and for the background of the image to be detected, different presentation forms may be possible in different scenes, for example, in a medical image scene, the background may be an unlit black scene part in the image; in the scene of the landscape photo, the background may be a large area of single pigment region such as sky or ocean.

As can be understood, the image to be detected includes a foreground and a background, where the foreground is an area including valid data; the background is an area containing no valid data, and as shown in fig. 4, is a scene schematic diagram of image detection provided in the embodiment of the present application, and the scene schematic diagram shows a foreground a1, a background a2, and a region of interest A3, where the foreground a1 contains an image that may indicate disease information, the background a2 generally has no specific content, and the region of interest A3 contains a portion of the foreground a1, it should be noted that a frame of the region of interest in the diagram is a square, and may be a circle in an actual scene. The specific shape of the diamond or the irregular figure depends on the actual scene, and is not limited herein. In addition, one region of interest is shown in the figure, but in an actual scene, a plurality of regions of interest may be provided, wherein the plurality of regions of interest may be independent of each other, or may have a certain overlapping area; it should also be noted that the position indicated by the label in the figure is an example, and the region having the image feature similar to the label has the similar attribute to the label region.

Optionally, considering that there may be a coincidence portion between the region of interest and the background, and the background is useless data, extra resources may be occupied in the feature determination process, and at this time, the selection of the region of interest may be optimized. Firstly, generating at least one anchor frame (anchor) on the image to be detected; then inputting the anchor frame into a first network model to obtain an anchor frame containing the foreground; and then determining at least one region of interest according to the anchor frame containing the foreground. The size and the width-to-height ratio of the anchor frame can be preset, the sizes of the anchor frames can be different, the sizes of the anchor frames can be adjusted according to the content in the anchor frame, and the relatively accurate region of interest can be obtained through continuous adjustment according to the content in the anchor frame.

Optionally, since the detection error is caused by the domain difference, after the image to be detected is obtained, the image to be detected can be input into the image converter, so as to convert the image to be detected from the target domain into the source domain; specifically, firstly, parameters of target field data corresponding to the image to be detected are obtained; then adjusting parameters of target field data corresponding to the image to be detected to corresponding parameters in the source field to obtain an image to be detected after image conversion; and then inputting the converted image to be detected into the first network model to determine at least one region of interest.

The conversion process of the image converter can be performed with reference to fig. 5, as shown in fig. 5, which is a schematic flow chart of image conversion provided in an embodiment of the present application; in the figure, X and Y represent pictures in two domains, G is a converter for converting the pictures in X into a Y style, and F is an inverse converter. Our goal is to train out G and F given X, Y. Dx and Dy are two domain discriminators for determining the quality of the generated picture. In order to ensure that the converted image can retain core content, extra consistency constraint is introduced in the cyclic countermeasure process, so that F (G (x)) is approximately equal to x, G (F (y)) is approximately equal to y, the accuracy of domain conversion is further optimized, conversion of target domain data to source domain data is achieved, the subsequent detection process is carried out, and the recall rate of features can be improved.

303. And inputting at least one region of interest into a second network model to obtain the corresponding distinguishing characteristic of the at least one region of interest in the source field data.

In this embodiment, the source domain data is used to indicate label data, and the second network model associates the source domain data and the plurality of target domain data through the region of interest; the second network model may be a Regional Convolutional Neural Network (RCNN), or may be another network model for deep learning, which is not limited herein.

It will be appreciated that discriminative features are features in the source domain data that have corresponding regions in the target domain, such as: judging that the characteristic is that the source field data is a lump detected under the brightness 50% angle of 45 degrees, and an area corresponding to the lump exists in an image with the brightness 60% angle of 40 degrees of the target field, and at the moment, correspondingly marking the target field data by marking data of the characteristic to be judged, thereby eliminating the influence of field change on the characteristic detection; in addition, the distinguishing feature can be understood as a key region for determining a detection result in the region of interest, and different explanations exist in different scenes for setting the key region. In a medical image, a key region is an image region indicating a lesion feature, for example: the focus is a texture image area of the tumor; in face recognition, the key region is an image region indicating a face feature, for example: an image region of a pupil in a human eye; the specific form depends on the actual scene, and is not limited herein.

In a possible scenario for the above process, after the RCNN obtains the ROI generated by the RPN, pooling features ROI-posing are performed on the feature map generated by the backbone feature extractor backbone according to the position of the ROI, so as to obtain features corresponding to the ROI, that is, discriminant features.

304. And determining a detection result according to the distinguishing characteristics.

In this embodiment, the detection result is used to indicate the position of the region of interest in the image to be detected and corresponding labeling data. Wherein, the position of the region of interest is the position of the focus in the image to be detected, and the marking data is the type corresponding to the focus, for example: lumps, erosion or damage.

Optionally, since the detection result may introduce some background regions due to domain transformation, at this time, the feature elements including the background may be further optimized. Specifically, firstly, obtaining a region corresponding to the distinguishing feature; when the area corresponding to the distinguishing feature meets a preset condition, determining a detection result according to the distinguishing feature, wherein the preset condition is set based on the proportion of the background in the area corresponding to the distinguishing feature; for example: and defining the distinguishing characteristic with the background accounting for less than 10 percent as a detection result. It can be understood that, for the discriminant features that do not satisfy the preset condition, the region of the ROI corresponding to the discriminant features may be further adjusted so that the proportion of the background in the corresponding region satisfies the preset condition.

According to the embodiment, the image to be detected in the target field is obtained, and then the image to be detected is input into the first network model to determine at least one region of interest; and inputting at least one region of interest into the second network model to obtain a corresponding distinguishing feature of the at least one region of interest in source field data, wherein the source field data is used for indicating marking data, and then determining a detection result according to the distinguishing feature, wherein the detection result is used for indicating the position of the region of interest in the image to be detected and the corresponding marking data. On one hand, as the target field and the source field are related through the network model, the unmarked data in the target field can be marked and determined through the marked data in the source field, thereby reducing the detection error caused by field change and improving the accuracy of image detection; on the other hand, the workload caused by the field change is reduced, and the detection efficiency is improved.

The above embodiment describes a processing method after image input, in which application of a first network model and a second network model is involved, it can be understood that the first network model and the second network model provided in this embodiment are not conventional source models, and are obtained by performing relevant feature training on a preset data set; referring to fig. 6, fig. 6 is a flowchart of a method for training a network model according to an embodiment of the present application, where the method for training a network model according to the present application includes at least the following steps:

601. and acquiring an image set to be trained.

In this embodiment, the image set to be trained includes a source domain data set and a target domain data set, where the source domain data set includes at least one source domain data, the target domain data set includes at least one target domain data, the source domain data is used to indicate labeled data, and the target domain data is used to indicate unlabeled data.

It will be appreciated that the source domain data set may include different means of detecting the same characteristic of the same condition, and may also include means of detecting the same characteristic of different conditions, for example: the source domain data set comprises a data set of different pathological characteristics for tuberculosis; the source domain data set may also include a feature data set for a tumor in different disease conditions, and the specific data set form depends on the actual scene and should include corresponding target domain data, which is not limited herein.

602. And training a first network model by adopting the source field data set and the target field data set.

In this embodiment, the first network model may be an RPN. The training process of the RPN is to acquire a preset region of interest containing disease characteristics through the RPN; correspondingly, the feature distribution conditions under different diseases in the source field data and the corresponding size information of the relevant features can be adopted in the training process of the RPN.

603. And determining at least one region of interest according to the trained first network model.

In this embodiment, the original domain data is associated with the target domain area through the region of interest.

604. And training a second network model by adopting the region of interest, the source field data set and the target field data set.

In this embodiment, the trained second network model is used to indicate the labeling data corresponding to the region of interest in the target domain data.

Optionally, to further improve the correspondence between the source domain data and the target domain data, feature alignment may be performed on the source domain data and the target domain data, and the feature alignment process may be performed in the following manner.

Firstly, feature alignment is carried out on image features.

Specifically, the second network model may be trained by using a first loss function, the source domain data set, and the target domain data set to obtain a first gradient of the second network model, where the first loss function is used to align features of the source domain data set and features of the target domain data set, the first loss function belongs to a countering learning loss function, and the first gradient of the second network model is used to update parameters of the second network model.

In one possible implementation, Source Domain data X is recorded_SThe feature map extracted by the back bone is F_S∈R^H ^×W×CH and W are respectively the height and width of the characteristic diagram, and C is the number of channels; correspondingly, object data X_TIs characterized by the feature diagram F_T∈R^H ^×W×C. Since it is assumed that the domain differences are mainly from low-level features of the image, the domain differences caused by the image style differences can be eliminated if the two feature maps can be aligned. It is understood that the feature alignment process is to align the features of the patch rather than the features of the whole image, that is, each output on the feature map represents a patch on the original, and the values of the channels are taken for the same position and are merged to obtain the source domain patch feature f_S∈R^CAnd patch feature f of the target Domain_T∈R^C。

Then, a domain discriminator DD is trained, and the first loss function defining the feature alignment at the image level may be:

L_img＝y_DDlogDD(f；θ_DD)+(1-y_DD)log(1-DD(f；θ_DD))

wherein y is_DDDomain label (source domain 1, target domain 0), DD (; f θ) representing input image_DDProbability of input image belonging to source domain calculated for domain discriminator, where_DDIs a training parameter of a domain discriminator, the training target is

Wherein theta is_BIs the training parameter of the backbone. By achieving the training goalAnd updating the parameters of the second network model when the first gradient is reached.

Optionally, to improve the efficiency of the training, the objective function may be inverted, i.e. minimized. Specifically, the method comprises the following steps. A Gradient Reversal Layer (GRL) may be introduced to opportunistically align features, the role of the GRL being to reverse the sign of the input gradient such that in a minimization operation, the parameters after the GRL are the minimization objective function and the parameters before the GRL are the maximization objective function, thereby increasing the efficiency of the opportunistic learning.

And secondly, performing feature alignment on the ROI.

Specifically, firstly, respectively determining an interested region in the source domain data set and an interested region in the target domain data set according to the trained first network model; then, a second loss function, the region of interest in the source domain data set and the region of interest in the target domain data set are adopted to train the second network model, so as to obtain a second gradient of the second network model, wherein the second loss function is used for aligning the features of the region of interest in the source domain data set and the features of the region of interest in the target domain data set, the second loss function belongs to a counterlearning loss function, and the second gradient of the second network model is used for updating parameters of the second network model.

In one possible implementation, for each ROI, ROI-posing is performed to obtain the characteristics of the ROI, and the source domain data and the target domain data are respectively recorded as

And

and performing feature alignment on the ROI feature once, namely adopting a second loss function:

wherein L is_insExample level feature aligned objective function, p^rROI feature f of a Domain discriminator pair representing an instance level^rThe calculated probability belongs to the source field and the training target is

θ_DD2Training parameters for a domain discriminator for feature alignment of the ROI. And obtaining a second gradient by reaching the training target, and further updating the parameters of the second network model.

Optionally, combining the above two feature alignment manners, the overall loss function L can be obtained as follows:

L＝L_img+L_ins+L_det

wherein L is_detIs the loss of supervision of the detection model.

It will be appreciated that one or more of the network models described in the above embodiments may be applied to the image detection scenario of the embodiment described in fig. 3.

According to the embodiment, the source field data and the target field data are aligned, so that the association degree of the source field data and the target field data in the network model is improved, and the accuracy of the image detection process is further improved.

In another possible scenario, since a large number of anchors generated a priori by the model may contain a large number of backgrounds, the foreground region to be detected is more concerned in the medical image detection, and the style change of the backgrounds does not need to be too much concerned, so if the foreground anchors and the background anchors are treated equally, the feature alignment loss generated by the most background anchors can cover up the feature alignment loss of the foreground anchors, and the correlation degree between the source domain data and the target domain data is affected.

To solve the above problem, a weight may be set for each anchor on the basis of the embodiment described in fig. 6. Specifically, the weight w may be:

w＝y(1-p)^γ+(1-y)p^γ

where y is the label given to the anchor (y 1 if foreground, 0 if background), and p is the confidence that the anchor calculated by the current model is foreground.

Then, corresponding to the feature alignment process, the first penalty function may be adjusted to:

the description of the related parameters, i.e., the subsequent operations, may refer to the first loss function in the embodiment corresponding to fig. 6, which is not described herein again.

By setting the weight, the problem of unbalanced category of the anchor frame is solved, and the influence of the anchor frame containing the background in feature alignment is reduced; and the problem of incomplete anchor frame coverage containing foregrounds in the training process is relieved.

The foregoing embodiments introduce various training processes for a network model, and the following describes an image detection method provided by the present application with reference to a specific scenario, as shown in fig. 7, which is an architecture diagram of a network model in image detection provided by the embodiments of the present application.

It should be noted that, the present embodiment relates to two network models, which are RPN and RCNN in the scenario, optionally, the two network models may also be combined into one network model, and combine two corresponding functions, for example, combine into fast-region convolutional neural network (fast-RCNN), and the specific combination form depends on the actual scenario, which is not limited herein.

Firstly, extracting the main features of an image to be detected to obtain a main feature set, wherein the source field data and the target field data in the main feature set are aligned by the image field discriminator, and pass through a gradient turning layer before being input into the image field discriminator to minimize a target function.

For the above-mentioned trunk characteristic set, on one hand, inputting the trunk characteristic set into RPN to select related candidate frame to determine anchor frame, and further determining ROI; and on the other hand, inputting the backbone feature set into a pooling layer in the RCNN, combining the determined ROI to obtain the features corresponding to the ROI, and performing feature alignment on the features corresponding to the ROI after passing through a gradient inversion layer.

And finally, determining the type corresponding to the disease and the position of the disease according to the obtained characteristics corresponding to the ROI.

It is understood that the related features in the embodiments shown in fig. 3 or fig. 6 can be referred to the flow of the architecture diagram, and are not described herein again.

In order to better implement the above-mentioned aspects of the embodiments of the present application, the following also provides related apparatuses for implementing the above-mentioned aspects. Referring to fig. 8, fig. 8 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present disclosure, in which the image detection apparatus 800 includes:

an acquiring unit 801 configured to acquire an image to be detected, where the image to be detected is target domain data, and the target domain data is used to indicate unmarked data;

a determining unit 802, configured to input the image to be detected into a first network model to determine at least one region of interest, where the first network model is used to remove a background of the image to be detected;

an input unit 803, configured to input at least one region of interest into a second network model, so as to obtain a distinguishing feature corresponding to the at least one region of interest in source domain data, where the source domain data is used to indicate labeling data, the second network model associates the source domain data and a plurality of target domain data through the region of interest, and the distinguishing feature is a key region in the region of interest for determining a detection result;

the detecting unit 804 is configured to determine a detection result according to the distinguishing feature, where the detection result is used to indicate the position of the region of interest in the image to be detected and corresponding annotation data.

the determining unit 802 is specifically configured to input the anchor frame into the first network model to obtain an anchor frame including a foreground, where the foreground is used to indicate valid data in the image to be detected;

the determining unit 802 is specifically configured to determine at least one region of interest according to the anchor frame containing the foreground.

Optionally, in some possible implementation manners of the present application, the detecting unit 804 is specifically configured to obtain an area corresponding to the distinguishing feature;

the detecting unit 804 is specifically configured to determine a detection result according to the distinguishing feature if the area corresponding to the distinguishing feature satisfies a preset condition, where the preset condition is set based on a ratio of a background in the area corresponding to the distinguishing feature.

Optionally, in some possible implementation manners of the present application, the obtaining unit 801 is further configured to obtain parameters of target domain data corresponding to the image to be detected;

the obtaining unit 801 is further configured to adjust parameters of target domain data corresponding to the image to be detected to corresponding parameters in the source domain, so as to obtain an image to be detected after image conversion;

the determining unit 802 is specifically configured to input the converted image to be detected into the first network model to determine at least one region of interest.

The present embodiment further provides a network model training apparatus 900, as shown in fig. 9, including:

an obtaining unit 901, configured to obtain an image set to be trained, where the image set to be trained includes a source domain data set and a target domain data set, the source domain data set includes at least one source domain data, the target domain data set includes at least one target domain data, the source domain data is used to indicate labeled data, and the target domain data is used to indicate unlabeled data;

a training unit 902, configured to train the first network model by using the source domain data set and the target domain data set;

a determining unit 903, configured to determine at least one region of interest according to the trained first network model;

the training unit 902 is further configured to train a second network model by using the region of interest, the source domain data set, and the target domain data set, where the trained second network model is used to indicate the labeled data corresponding to the region of interest in the target domain data.

Optionally, in some possible implementations of the present application, the training unit 902 is further configured to train the second network model by using a first loss function, the source domain data set, and the target domain data set to obtain a first gradient of the second network model, where the first loss function is used to align features of the source domain data set and features of the target domain data set, the first loss function belongs to a countering learning loss function, and the first gradient of the second network model is used to update parameters of the second network model.

Optionally, in some possible implementations of the present application, the training unit 902 is specifically configured to input the first loss function, the source domain data set, and the target domain data set into a gradient inversion layer, so as to invert the target function;

the training unit 902 is specifically configured to train the second network model according to the inverted objective function to obtain a first gradient of the second network model.

Optionally, in some possible implementations of the present application, the training unit 902 is further configured to determine, according to the trained first network model, an area of interest in the source domain data set and an area of interest in the target domain data set respectively;

the training unit 902 is further configured to train the second network model by using a second loss function, the region of interest in the source domain data set, and the region of interest in the target domain data set, so as to obtain a second gradient of the second network model, where the second loss function is used to align features of the region of interest in the source domain data set and features of the region of interest in the target domain data set, the second loss function belongs to a counterlearning loss function, and the second gradient of the second network model is used to update parameters of the second network model.

Optionally, in some possible implementations of the present application, the training unit 902 is further configured to determine a plurality of anchor frames in the source domain data set;

the training unit 902 is further configured to set weights for the source domain data set and the target domain data set according to a proportion of the foreground in the anchor frame;

the training unit 902 is specifically configured to train the second network model by using the first loss function, the source domain data set with the weight, and the target domain data set with the weight, so as to obtain a first gradient of the second network model.

An embodiment of the present application further provides a medical detection apparatus, as shown in fig. 10, which is a schematic structural diagram of the medical detection apparatus provided in the embodiment of the present application, and for convenience of description, only a part related to the embodiment of the present application is shown, and details of the specific technology are not disclosed, please refer to a part of a method in the embodiment of the present application.

Fig. 10 is a block diagram illustrating a partial structure of a medical detection apparatus provided in an embodiment of the present application. Referring to fig. 10, the medical examination apparatus includes: radio Frequency (RF) circuitry 1010, memory 1020, input unit 1030, display unit 1040, sensor 1050, audio circuitry 1060, wireless fidelity (WiFi) module 1070, processor 1080, and power source 1090. Those skilled in the art will appreciate that the handset configuration shown in fig. 10 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

The following describes the components of the medical examination apparatus in detail with reference to fig. 10:

RF circuit 1010 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, for processing downlink information of a base station after receiving the downlink information to processor 1080; in addition, the data for designing uplink is transmitted to the base station. In general, the RF circuitry 1010 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 1010 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), etc.

The memory 1020 may be used to store software programs and modules that the processor 1080 executes to perform various functional applications and data processing of the medical testing device by executing the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the stored data area may store data (such as audio data, a phonebook, etc.) created according to the use of the medical examination device, and the like. Further, the memory 1020 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.

The input unit 1030 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the medical examination apparatus. Specifically, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also referred to as a touch screen, may collect touch operations by a user (e.g., operations by a user on or near the touch panel 1031 using any suitable object or accessory such as a finger, a stylus, etc., and spaced touch operations within a certain range on the touch panel 1031) and drive corresponding connection devices according to a preset program. Alternatively, the touch panel 1031 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 1080, and can receive and execute commands sent by the processor 1080. In addition, the touch panel 1031 may be implemented by various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. The input unit 1030 may include other input devices 1032 in addition to the touch panel 1031. In particular, other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a track ball, a mouse, a joystick, or the like.

The display unit 1040 may be used to display information input by or provided to a user as well as various menus for the medical testing device. The display unit 1040 may include a display panel 1041, and optionally, the display panel 1041 may be configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 1031 can cover the display panel 1041, and when the touch panel 1031 detects a touch operation on or near the touch panel 1031, the touch operation is transmitted to the processor 1080 to determine the type of the touch event, and then the processor 1080 provides a corresponding visual output on the display panel 1041 according to the type of the touch event. Although in fig. 10, the touch panel 1031 and the display panel 1041 are two separate components to implement the input and output functions of the medical detection device, in some embodiments, the touch panel 1031 and the display panel 1041 may be integrated to implement the input and output functions of the medical detection device.

The medical detection device may also include at least one sensor 1050, such as light sensors, motion sensors, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 1041 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 1041 and/or a backlight when the medical detection device is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), detect the magnitude and direction of gravity when stationary, and can be used for applications (such as horizontal and vertical screen switching, related games, magnetometer attitude calibration) for recognizing the attitude of medical detection equipment, and related functions (such as pedometer and tapping) for vibration recognition; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the medical detection device, detailed descriptions thereof are omitted.

Audio circuitry 1060, speaker 1061, microphone 1062 may provide an audio interface between the user and the medical detection device. The audio circuit 1060 can transmit the electrical signal converted from the received audio data to the speaker 1061, and the electrical signal is converted into a sound signal by the speaker 1061 and output; on the other hand, the microphone 1062 converts the collected sound signals into electrical signals, which are received by the audio circuit 1060 and converted into audio data, which are processed by the audio data output processor 1080 and then transmitted to, for example, another medical examination device via the RF circuit 1010, or output to the memory 1020 for further processing.

WiFi belongs to short-range wireless transmission technology, and the medical detection device can help the user send and receive e-mails, browse web pages, access streaming media and the like through the WiFi module 1070, and provides wireless broadband internet access for the user. Although fig. 10 shows the WiFi module 1070, it is understood that it does not belong to the essential constitution of the medical detection device, and may be omitted entirely as needed within the scope not changing the essence of the invention.

Processor 1080 is the control center for the medical testing device, and is coupled to various components of the overall medical testing device via various interfaces and lines, and is configured to perform various functions and process data of the medical testing device by running or executing software programs and/or modules stored in memory 1020, as well as invoking data stored in memory 1020, thereby providing overall monitoring of the medical testing device. Optionally, processor 1080 may include one or more processing units; optionally, processor 1080 may integrate an application processor, which primarily handles operating systems, user interfaces, application programs, etc., and a modem processor, which primarily handles wireless communications. It is to be appreciated that the modem processor described above may not be integrated into processor 1080.

The medical sensing device also includes a power source 1090 (e.g., a battery) for powering the various components, which may optionally be logically coupled to the processor 1080 via a power management system to manage charging, discharging, and power consumption via the power management system.

Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.

In the embodiment of the present application, the processor 1080 included in the terminal also has a function of executing the steps of the image detection method described above.

Referring to fig. 11, fig. 11 is a schematic structural diagram of a server according to an embodiment of the present invention, where the server 1100 may generate relatively large differences due to different configurations or performances, and may include one or more Central Processing Units (CPUs) 1122 (e.g., one or more processors) and a memory 1132, and one or more storage media 1130 (e.g., one or more mass storage devices) storing an application program 1142 or data 1144. Memory 1132 and storage media 1130 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 1122 may be provided in communication with the storage medium 1130 to execute a series of instruction operations in the storage medium 1130 on the server 1100.

The Server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input-output interfaces 1158, and/or one or more operating systems 1141, such as a Windows Server^TM，Mac OS X^TM，Unix^TM,LinuxTM，FreeBSD^TMAnd so on.

The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 11.

An embodiment of the present application further provides a computer-readable storage medium, in which image detection instructions are stored, and when the computer-readable storage medium is executed on a computer, the computer is caused to perform the steps performed by the image detection apparatus in the method described in the foregoing embodiments shown in fig. 2 to 7.

Also provided in the embodiments of the present application is a computer program product including instructions for detecting an image, which when run on a computer, causes the computer to perform the steps performed by the image detection apparatus in the method described in the foregoing embodiments shown in fig. 2 to 7.

The embodiment of the present application further provides an image detection system, where the image detection system may include the image detection apparatus in the embodiment described in fig. 8 or the network model training apparatus described in fig. 9.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, an image detection apparatus, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. An image detection method, comprising:

acquiring an image to be detected, wherein the image to be detected is target field data, and the target field data is used for indicating unmarked data;

inputting at least one region of interest into a second network model to obtain a corresponding distinguishing feature of the at least one region of interest in source domain data, wherein the source domain data is used for indicating marking data, the second network model associates the source domain data with the target domain data through the region of interest, and the distinguishing feature is a key region used for determining a detection result in the region of interest;

and determining the detection result according to the distinguishing characteristics, wherein the detection result is used for indicating the position of the region of interest in the image to be detected and corresponding labeling data.

2. The method according to claim 1, wherein the image to be detected is a medical detection image, the position of the region of interest is a position of a lesion, the distinguishing feature is an image feature region of the lesion, and the annotation data is a type of the lesion.

3. The method of claim 1, wherein inputting the image to be detected into a first network model to determine at least one region of interest comprises:

generating at least one anchor frame on the image to be detected;

4. The method of claim 1, wherein determining a detection result from the discriminating characteristic comprises:

acquiring a region corresponding to the distinguishing feature;

5. The method according to any one of claims 1-4, wherein after said acquiring an image to be detected, the method further comprises:

6. A method for training a network model, comprising:

acquiring an image set to be trained, wherein the image set to be trained comprises a source field data set and a target field data set, the source field data set comprises at least one source field data, the target field data set comprises at least one target field data, the source field data is used for indicating labeled data, and the target field data is used for indicating unlabeled data;

7. The method of claim 6, further comprising:

8. The method of claim 7, wherein training the second network model with the first loss function, the set of source domain data, and the set of target domain data to obtain a first gradient of the second network model comprises:

9. The method of claim 7, further comprising:

10. The method according to any one of claims 7-9, further comprising:

determining a plurality of anchor boxes in the source domain data set;

11. An apparatus for detecting an image, comprising:

the device comprises an acquisition unit, a detection unit and a processing unit, wherein the acquisition unit is used for acquiring an image to be detected, the image to be detected is target field data, and the target field data is used for indicating unmarked data;

12. An apparatus for training a network model, comprising:

13. A medical testing device, comprising: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

the processor is used for executing the program in the memory and comprises the following steps:

determining a detection result according to the distinguishing characteristics, wherein the detection result is used for indicating the position of the region of interest in the image to be detected and corresponding labeling data;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

14. A server, comprising: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

training a second network model by adopting the region of interest, the source field data set and the target field data set, wherein the trained second network model is used for indicating marking data corresponding to the region of interest in the target field data;

15. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of image detection of any of the preceding claims 1 to 5, or the method of training a network model of any of the preceding claims 6 to 10.