CN111340051A - Picture processing method and device and storage medium - Google Patents
- Publication number
- CN111340051A CN201811549920.3A CN201811549920A
- Authority
- CN
- China
- Prior art keywords
- picture
- image
- detected
- feature information
- processed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
本发明实施例提供一种图片处理方法、装置及存储介质;该方法包括:对待检测图片进行预处理,获得处理后的图片;从所述处理后的图片中,分别提取图像特征信息和文本特征信息;根据所述图像特征信息和所述文本特征信息,确定所述待检测图片的类型,其中,所述图片类型包括正常图片或异常图片。本发明提供的图片处理方法、装置及存储介质,可以提升图片检测和识别的准确度。
Embodiments of the present invention provide a picture processing method, device, and storage medium. The method includes: preprocessing a picture to be detected to obtain a processed picture; extracting image feature information and text feature information, respectively, from the processed picture; and determining, according to the image feature information and the text feature information, the type of the picture to be detected, where the picture type is either a normal picture or an abnormal picture. The picture processing method, device, and storage medium provided by the present invention can improve the accuracy of picture detection and recognition.
Description
技术领域technical field
本发明实施例涉及图像处理技术领域,尤其涉及一种图片处理方法、装置及存储介质。Embodiments of the present invention relate to the technical field of image processing, and in particular, to a picture processing method, device, and storage medium.
背景技术Background technique
随着互联网的普及,图片因其相对文字具有表达直观、内容丰富等优势,在越来越多的网页及应用中被广泛应用。例如,网购平台为各电商提供了各种商品信息发布机制,商家可以上传多角度、多背景的商品照片,以吸引用户。With the popularity of the Internet, pictures are widely used in more and more web pages and applications because of their advantages of intuitive expression and rich content compared to text. For example, online shopping platforms provide various e-commerce companies with various commodity information release mechanisms, and merchants can upload multi-angle and multi-background product photos to attract users.
很多互联网电商企业为了博取眼球效应,会上传一些不符合规定的图片,因此,如何在大数据环境下对风险图片或异常图片进行处理显得越来越重要,现有技术中,通常通过卷积神经网络(Convolution Neural Network, CNN)技术对图片进行检测与分类,其中,CNN技术主要是通过提取图片中的图像特征信息,根据图片的图像特征信息,判断该图片是正常图片还是异常图片。Many Internet e-commerce companies upload non-compliant pictures in order to attract attention, so processing risky or abnormal pictures in a big-data environment is increasingly important. In the prior art, pictures are usually detected and classified with convolutional neural network (CNN) technology: CNN technology mainly extracts the image feature information in a picture and judges, from that image feature information, whether the picture is a normal picture or an abnormal picture.
然而,由于CNN技术只能提取图片中的图像特征信息,对于图片中包含大量文本信息的图片来说,通过CNN技术对图片进行检测时,图片的误检率较高,造成图片检测的准确性较低。However, since CNN technology can only extract the image feature information in a picture, for pictures containing a large amount of text information the false-detection rate is high when they are checked with CNN technology, resulting in low picture-detection accuracy.
发明内容SUMMARY OF THE INVENTION
本发明实施例提供一种图片处理方法、装置及存储介质,可以提高图片检测和识别的准确性。Embodiments of the present invention provide a picture processing method, device, and storage medium, which can improve the accuracy of picture detection and recognition.
根据本发明实施例的第一方面,提供一种图片处理方法,该方法包括:According to a first aspect of the embodiments of the present invention, there is provided a picture processing method, the method comprising:
对待检测图片进行预处理,获得处理后的图片;Preprocess the image to be detected to obtain the processed image;
从所述处理后的图片中,分别提取图像特征信息和文本特征信息;From the processed picture, extract image feature information and text feature information respectively;
根据所述图像特征信息和所述文本特征信息,确定所述待检测图片的类型,其中,所述图片的类型包括正常图片或异常图片。According to the image feature information and the text feature information, the type of the picture to be detected is determined, wherein the type of the picture includes a normal picture or an abnormal picture.
可选的,所述根据所述图像特征信息和所述文本特征信息,确定所述待检测图片的类型,包括:Optionally, determining the type of the picture to be detected according to the image feature information and the text feature information includes:
对所述图像特征信息和所述文本特征信息进行拼接处理,获得拼接特征信息;Perform splicing processing on the image feature information and the text feature information to obtain splicing feature information;
对所述拼接特征信息进行流形数据的降维处理,获得所述拼接特征信息的嵌入特征;Perform dimensionality reduction processing of manifold data on the splicing feature information to obtain an embedded feature of the splicing feature information;
根据所述拼接特征信息的嵌入特征和预先确定的超球,确定所述待检测图片的类型。The type of the picture to be detected is determined according to the embedded feature of the splicing feature information and the predetermined hypersphere.
可选的,所述根据所述拼接特征信息的嵌入特征和预先确定的超球,确定所述待检测图片的类型,包括:Optionally, determining the type of the picture to be detected according to the embedded feature of the splicing feature information and a predetermined hypersphere, including:
判断所述嵌入特征是否处于所述超球内;determining whether the embedded feature is within the hypersphere;
若所述嵌入特征处于所述超球内,则确定所述待检测图片为异常图片;If the embedded feature is within the hypersphere, determining that the picture to be detected is an abnormal picture;
若所述嵌入特征不处于所述超球内,则确定所述待检测图片为正常图片。If the embedded feature is not within the hypersphere, it is determined that the picture to be detected is a normal picture.
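The decision rule in the three steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name is hypothetical, and plain Euclidean distance to the hypersphere center is assumed (the patent does not specify a metric or kernel).

```python
import math

def classify_embedding(embedding, center, radius):
    """Decide the picture type from the position of the embedded feature
    relative to a precomputed hypersphere (center, radius): 'abnormal'
    if the embedding falls inside the sphere, 'normal' otherwise,
    matching the rule stated in the claims above."""
    dist = math.sqrt(sum((e - c) ** 2 for e, c in zip(embedding, center)))
    return "abnormal" if dist <= radius else "normal"

# hypothetical 2-D embeddings against a unit sphere centered at the origin
print(classify_embedding([0.2, 0.1], [0.0, 0.0], 1.0))  # abnormal (inside)
print(classify_embedding([3.0, 4.0], [0.0, 0.0], 1.0))  # normal (outside)
```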
可选的,所述根据所述拼接特征信息的嵌入特征和预先确定的超球,确定所述待检测图片的类型之前,所述方法还包括:Optionally, before determining the type of the picture to be detected according to the embedded feature of the splicing feature information and a predetermined hypersphere, the method further includes:
获取多个样本图片;Get multiple sample images;
分别提取所述多个样本图片中每个样本图片的映射特征;respectively extracting the mapping feature of each sample picture in the plurality of sample pictures;
对各个所述映射特征进行数据描述,得到所述超球。Perform data description on each of the mapping features to obtain the hypersphere.
可选的,所述对各个所述映射特征进行数据描述,得到所述超球,包括:Optionally, performing data description on each of the mapping features to obtain the hypersphere includes:
通过支持向量数据描述SVDD算法对各个所述映射特征进行数据描述,得到所述超球。The hypersphere is obtained by performing data description on each of the mapping features through the support vector data description SVDD algorithm.
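As a rough sketch of what SVDD training produces, the toy function below is a simplified stand-in, not real SVDD: it takes the centroid of the mapped sample features as the sphere center and the largest sample-to-center distance as the radius, whereas actual SVDD solves a quadratic program that minimizes the radius while allowing slack for outliers. The function and variable names are assumptions for illustration.

```python
import math

def fit_hypersphere(features):
    """Simplified stand-in for SVDD: return a (center, radius) pair
    where center is the centroid of the mapped sample features and
    radius is the largest distance from a sample to that center."""
    n = len(features)
    dim = len(features[0])
    center = [sum(f[d] for f in features) / n for d in range(dim)]
    radius = max(
        math.sqrt(sum((f[d] - center[d]) ** 2 for d in range(dim)))
        for f in features
    )
    return center, radius

# four hypothetical 2-D mapped features at the corners of a square
samples = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center, radius = fit_hypersphere(samples)
print(center)            # [1.0, 1.0]
print(round(radius, 3))  # 1.414 (sqrt(2))
```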
可选的,从所述处理后的图片中,提取图像特征信息,包括:Optionally, extract image feature information from the processed picture, including:
通过多级卷积神经网络CNN,从所述处理后的图片中,提取图像特征信息。Image feature information is extracted from the processed image through a multi-level convolutional neural network CNN.
可选的,所述通过多级卷积神经网络CNN,从所述处理后的图片中,提取图像特征信息,包括:Optionally, the multi-level convolutional neural network CNN is used to extract image feature information from the processed picture, including:
将所述处理后的图片输入至第一级CNN网络中,得到与所述处理后的图片对应的第一映射图像以及所述处理后的图片中目标图像的第一区域框坐标;The processed picture is input into the first-level CNN network, and the first mapping image corresponding to the processed picture and the first region frame coordinates of the target image in the processed picture are obtained;
将所述第一映射图像和所述第一区域框坐标输入至第二级CNN网络中,得到与所述处理后的图片对应的第二映射图像以及所述处理后的图片中目标图像的第二区域框坐标;The first map image and the first region-box coordinates are input into the second-level CNN network to obtain a second map image corresponding to the processed picture and second region-box coordinates of the target image in the processed picture;
将所述第二映射图像和所述第二区域框坐标输入至第三级CNN网络中,得到所述图像特征信息。The second map image and the coordinates of the second region frame are input into the third-level CNN network to obtain the image feature information.
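The three-stage pipeline above can be sketched as a chain of callables. The stage networks themselves are placeholders here (the patent does not fix their architectures); the sketch only shows how each stage's map image and region-box coordinates feed the next stage.

```python
def multistage_extract(picture, stage1, stage2, stage3):
    """Chain the three CNN stages described above: stage 1 produces the
    first map image and region-box coordinates from the processed
    picture, stage 2 refines them, and stage 3 emits the final image
    feature information.  The stages are arbitrary callables with the
    signatures shown; real stages would be trained CNNs."""
    map1, boxes1 = stage1(picture)
    map2, boxes2 = stage2(map1, boxes1)
    features = stage3(map2, boxes2)
    return features

# toy stand-in stages that just tag the data as it flows through
s1 = lambda pic: (pic + "->m1", "b1")
s2 = lambda m, b: (m + "->m2", b + "b2")
s3 = lambda m, b: m + "->feat"
print(multistage_extract("img", s1, s2, s3))  # img->m1->m2->feat
```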
可选的,从所述处理后的图片中,提取文本特征信息,包括:Optionally, extract text feature information from the processed picture, including:
通过多级卷积神经网络CNN和循环神经网络RNN,从所述处理后的图片中,提取所述文本特征信息。The text feature information is extracted from the processed image through a multi-level convolutional neural network CNN and a recurrent neural network RNN.
可选的,所述对待检测图片进行预处理,获得处理后的图片,包括:Optionally, the said to-be-detected picture is preprocessed to obtain a processed picture, including:
对所述待检测图片进行图像金字塔处理,获得所述处理后的图片。Perform image pyramid processing on the picture to be detected to obtain the processed picture.
可选的,所述对所述待检测图片进行图像金字塔处理,获得所述处理后的图片之前,所述方法还包括:Optionally, before the image pyramid processing is performed on the picture to be detected and the processed picture is obtained, the method further includes:
对所述待检测图片进行图像颜色矫正处理,获得矫正后的图片;Perform image color correction processing on the to-be-detected picture to obtain a corrected picture;
所述对所述待检测图片进行图像金字塔处理,获得所述处理后的图片,包括:Performing image pyramid processing on the picture to be detected to obtain the processed picture, including:
对所述矫正后的图片进行图像金字塔处理,获得所述处理后的图片。Perform image pyramid processing on the corrected picture to obtain the processed picture.
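The image pyramid step can be sketched as repeated downsampling. This is a minimal illustration with assumed names: a real pipeline would low-pass filter before subsampling (as in a Gaussian pyramid), but plain 2x2 averaging is enough to show the level structure.

```python
def build_pyramid(image, levels):
    """Build a simple image pyramid by repeated 2x2 average-and-downsample.
    `image` is a list of rows of grayscale values; each pyramid level
    halves both dimensions of the previous one."""
    pyramid = [image]
    for _ in range(levels - 1):
        src = pyramid[-1]
        h, w = len(src) // 2, len(src[0]) // 2
        if h == 0 or w == 0:
            break  # cannot downsample further
        down = [
            [
                (src[2 * y][2 * x] + src[2 * y][2 * x + 1]
                 + src[2 * y + 1][2 * x] + src[2 * y + 1][2 * x + 1]) / 4.0
                for x in range(w)
            ]
            for y in range(h)
        ]
        pyramid.append(down)
    return pyramid

img = [[float(x + y) for x in range(4)] for y in range(4)]
pyr = build_pyramid(img, 3)
print([(len(p), len(p[0])) for p in pyr])  # [(4, 4), (2, 2), (1, 1)]
```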
根据本发明实施例的第二方面,提供一种图片处理装置,该装置包括:According to a second aspect of the embodiments of the present invention, there is provided a picture processing apparatus, the apparatus comprising:
预处理模块,用于对待检测图片进行预处理,获得处理后的图片;The preprocessing module is used to preprocess the image to be detected to obtain the processed image;
第一提取模块,用于从所述处理后的图片中,分别提取图像特征信息和文本特征信息;a first extraction module, used for extracting image feature information and text feature information from the processed picture, respectively;
确定模块,用于根据所述图像特征信息和所述文本特征信息,确定所述待检测图片的类型,其中,所述图片的类型包括正常图片或异常图片。A determination module, configured to determine the type of the picture to be detected according to the image feature information and the text feature information, wherein the type of the picture includes a normal picture or an abnormal picture.
可选的,所述确定模块,包括:Optionally, the determining module includes:
拼接子模块,用于对所述图像特征信息和所述文本特征信息进行拼接处理,获得拼接特征信息;a splicing submodule, configured to perform splicing processing on the image feature information and the text feature information to obtain splicing feature information;
降维处理子模块,用于对所述拼接特征信息进行流形数据的降维处理,获得所述拼接特征信息的嵌入特征;A dimensionality reduction processing submodule, configured to perform dimensionality reduction processing of manifold data on the splicing feature information to obtain the embedded feature of the splicing feature information;
确定子模块,用于根据所述拼接特征信息的嵌入特征和预先确定的超球,确定所述待检测图片的类型。A determination sub-module, configured to determine the type of the picture to be detected according to the embedded feature of the splicing feature information and the predetermined hypersphere.
可选的,所述确定子模块,具体用于:Optionally, the determining submodule is specifically used for:
判断所述嵌入特征是否处于所述超球内;determining whether the embedded feature is within the hypersphere;
若所述嵌入特征处于所述超球内,则确定所述待检测图片为异常图片;If the embedded feature is within the hypersphere, determining that the picture to be detected is an abnormal picture;
若所述嵌入特征不处于所述超球内,则确定所述待检测图片为正常图片。If the embedded feature is not within the hypersphere, it is determined that the picture to be detected is a normal picture.
可选的,所述装置还包括:Optionally, the device further includes:
获取模块,用于获取多个样本图片;The acquisition module is used to acquire multiple sample images;
第二提取模块,用于分别提取所述多个样本图片中每个样本图片的映射特征;The second extraction module is used to extract the mapping feature of each sample picture in the plurality of sample pictures respectively;
数据描述模块,用于对各个所述映射特征进行数据描述,得到所述超球。The data description module is used for performing data description on each of the mapping features to obtain the hypersphere.
可选的,所述数据描述模块,具体用于:Optionally, the data description module is specifically used for:
通过支持向量数据描述SVDD算法对各个所述映射特征进行数据描述,得到所述超球。The hypersphere is obtained by performing data description on each of the mapping features through the support vector data description SVDD algorithm.
可选的,所述第一提取模块,还用于通过多级卷积神经网络CNN,从所述处理后的图片中,提取图像特征信息。Optionally, the first extraction module is further configured to extract image feature information from the processed picture through a multi-level convolutional neural network CNN.
可选的,所述第一提取模块,还用于:Optionally, the first extraction module is also used for:
将所述处理后的图片输入至第一级CNN网络中,得到与所述处理后的图片对应的第一映射图像以及所述处理后的图片中目标图像的第一区域框坐标;The processed picture is input into the first-level CNN network, and the first mapping image corresponding to the processed picture and the first region frame coordinates of the target image in the processed picture are obtained;
将所述第一映射图像和所述第一区域框坐标输入至第二级CNN网络中,得到与所述处理后的图片对应的第二映射图像以及所述处理后的图片中目标图像的第二区域框坐标;The first map image and the first region-box coordinates are input into the second-level CNN network to obtain a second map image corresponding to the processed picture and second region-box coordinates of the target image in the processed picture;
将所述第二映射图像和所述第二区域框坐标输入至第三级CNN网络中,得到所述图像特征信息。The second map image and the coordinates of the second region frame are input into the third-level CNN network to obtain the image feature information.
可选的,第一提取模块,还用于通过多级卷积神经网络CNN和循环神经网络RNN,从所述处理后的图片中,提取所述文本特征信息。Optionally, the first extraction module is further configured to extract the text feature information from the processed picture through a multi-level convolutional neural network CNN and a recurrent neural network RNN.
可选的,所述预处理模块,还用于对所述待检测图片进行图像金字塔处理,获得所述处理后的图片。Optionally, the preprocessing module is further configured to perform image pyramid processing on the picture to be detected to obtain the processed picture.
可选的,所述装置还包括:Optionally, the device further includes:
矫正模块,用于对所述待检测图片进行图像颜色矫正处理,获得矫正后的图片;a correction module, configured to perform image color correction processing on the to-be-detected picture to obtain a corrected picture;
所述预处理模块,还用于:对所述矫正后的图片进行图像金字塔处理,获得所述处理后的图片。The preprocessing module is further configured to: perform image pyramid processing on the corrected picture to obtain the processed picture.
根据本发明实施例的第三方面,提供一种服务器,其特征在于,包括:According to a third aspect of the embodiments of the present invention, a server is provided, characterized in that it includes:
处理器;processor;
存储器;以及memory; and
计算机程序;Computer program;
其中,所述计算机程序被存储在所述存储器中,并且被配置为由所述处理器执行,所述计算机程序包括用于执行本发明实施例的第一方面所述的方法的指令。Wherein, the computer program is stored in the memory and configured to be executed by the processor, the computer program comprising instructions for performing the method according to the first aspect of the embodiments of the present invention.
根据本发明实施例的第四方面,提供一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,所述计算机程序使得服务器执行本发明实施例的第一方面的方法。According to a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores a computer program, and the computer program causes a server to execute the method of the first aspect of the embodiments of the present invention.
本发明实施例提供的图片处理方法、装置及存储介质,通过对待检测图片进行预处理,获得处理后的图片;进而从处理后的图片中,分别提取图像特征信息和文本特征信息;并根据图像特征信息和文本特征信息,确定待检测图片的类型,其中,图片的类型包括正常图片或异常图片;由于通过先对待检测的图片进行预处理,使得预处理后的图片更方便进行特征提取与图片类型的确定,另外,由于通过对预处理后的图片分别进行图像特征信息和文本特征信息的提取,可以提升待检测图片的检测和识别效果,而且提高了图片检测的准确度。In the picture processing method, device, and storage medium provided by the embodiments of the present invention, a picture to be detected is preprocessed to obtain a processed picture; image feature information and text feature information are then extracted, respectively, from the processed picture; and the type of the picture to be detected is determined according to the image feature information and the text feature information, where the picture type is either a normal picture or an abnormal picture. Because the picture to be detected is preprocessed first, the preprocessed picture is more convenient for feature extraction and for determining the picture type; and because both image feature information and text feature information are extracted from the preprocessed picture, the detection and recognition of the picture to be detected are improved and picture-detection accuracy is increased.
附图说明Description of drawings
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作一简单地介绍,显而易见地,下面描述中的附图是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
图1是本发明根据一示例性实施例示出的一种图片处理方法的流程图。FIG. 1 is a flowchart of a picture processing method according to an exemplary embodiment of the present invention.
图2是本发明根据另一示例性实施例示出的一种图片处理方法的流程图。Fig. 2 is a flowchart of a picture processing method according to another exemplary embodiment of the present invention.
图3为现有技术中根据CNN网络对图片中图像识别算法的示意图。FIG. 3 is a schematic diagram of an image recognition algorithm in a picture according to a CNN network in the prior art.
图4为CNN网络的局部连接和共享权重的结构示意图。Figure 4 is a schematic diagram of the structure of the local connections and shared weights of the CNN network.
图5是本发明根据又一示例性实施例示出的一种图片处理方法的流程图。Fig. 5 is a flowchart of a picture processing method according to another exemplary embodiment of the present invention.
图6是本发明根据一示例性实施例示出的一种图片处理装置的框图。Fig. 6 is a block diagram of a picture processing apparatus according to an exemplary embodiment of the present invention.
图7是本发明根据另一示例性实施例示出的一种图片处理装置的框图。Fig. 7 is a block diagram of a picture processing apparatus according to another exemplary embodiment of the present invention.
图8是本发明根据又一示例性实施例示出的一种图片处理装置的框图。FIG. 8 is a block diagram of a picture processing apparatus according to another exemplary embodiment of the present invention.
图9是本发明根据再一示例性实施例示出的一种图片处理装置的框图。Fig. 9 is a block diagram of a picture processing apparatus according to yet another exemplary embodiment of the present invention.
图10是本发明实施例提供的一种服务器的结构示意图。FIG. 10 is a schematic structural diagram of a server according to an embodiment of the present invention.
具体实施方式Detailed ways
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。In order to make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
本发明实施例提供的图片处理方法,可以适用于对图片的类型进行检测的场景中,尤其可以应用于电商企业对图片进行筛选的应用场景中。现有技术中,互联网电商企业通常对大数据环境下的各式各样的图片进行筛选时,只是通过图片的相似度来进行检测与识别,即使通过卷积神经网络技术对图片进行识别,可以极大地提升图片检测和识别效率,但是由于卷积神经网络技术仅仅对图片中的特定的图像信息进行识别与检测,当面对图像变种和衍生图像,通过卷积神经网络技术无法有效识别,可能会将正常图片误杀掉。另外,对于图片中包含大量文本信息的图片来说,通过CNN技术对图片进行检测时,图片的误检率较高,造成图片检测的准确性较低。The picture processing method provided by the embodiments of the present invention is applicable to scenarios in which the type of a picture is detected, and in particular to scenarios in which e-commerce enterprises screen pictures. In the prior art, when Internet e-commerce companies screen the wide variety of pictures in a big-data environment, detection and recognition are usually based only on picture similarity. Even though recognizing pictures with convolutional neural network technology can greatly improve detection and recognition efficiency, that technology only recognizes and detects specific image information in a picture; faced with image variants and derived images, it cannot recognize them effectively and may mistakenly reject normal pictures. In addition, for pictures containing a large amount of text information, the false-detection rate of CNN-based detection is high, resulting in low picture-detection accuracy.
本发明考虑到上述的技术问题,本发明提供一种图片的处理方法,通过对待检测图片进行预处理,并从预处理后的图片中提取图像特征信息与文本特征信息,然后根据提取的图像特征信息与文本特征信息,确定待检测图片是正常图片还是异常图片。由于通过对预处理后的图片分别进行图像特征信息和文本特征信息的提取,从而根据图像特征信息和文本特征信息共同确定待检测图片的类型,由此可以提升待检测图片检测和识别的效果,而且提高了图片检测的准确度。In view of the above technical problems, the present invention provides a picture processing method: a picture to be detected is preprocessed, image feature information and text feature information are extracted from the preprocessed picture, and whether the picture to be detected is a normal or an abnormal picture is then determined from the extracted image feature information and text feature information. Because image feature information and text feature information are extracted separately from the preprocessed picture, and the type of the picture is determined from both together, the detection and recognition of the picture to be detected are improved and picture-detection accuracy is increased.
下面以具体的实施例对本发明的技术方案进行详细说明。下面这几个具体的实施例可以相互结合,对于相同或相似的概念或过程可能在某些实施例不再赘述。The technical solutions of the present invention will be described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
图1为本发明根据一示例性实施例示出的一种图片处理方法的流程图,FIG. 1 is a flowchart of a picture processing method according to an exemplary embodiment of the present invention,
图2为本发明根据另一示例性实施例示出的一种图片处理方法的流程图,该方法可以由任意执行图片处理方法的装置来执行,该装置可以通过软件和/ 或硬件实现。本实施例中,该装置可以集成在服务器中。如图1所示,本发明实施例提供的图片处理方法包括如下步骤:FIG. 2 is a flowchart of a picture processing method according to another exemplary embodiment of the present invention. The method can be executed by any apparatus for executing the picture processing method, and the apparatus can be implemented by software and/or hardware. In this embodiment, the device may be integrated in the server. As shown in FIG. 1, the image processing method provided by the embodiment of the present invention includes the following steps:
步骤101,对待检测图片进行预处理,获得处理后的图片。Step 101: Preprocess the image to be detected to obtain a processed image.
在本步骤中,通过对待检测图片进行预处理,使得预处理后的图片能简化后续的识别与确定类型的操作。In this step, the picture to be detected is preprocessed so that the preprocessed picture simplifies the subsequent recognition and type-determination operations.
其中,对待检测图片进行预处理,如图2所示,可选的,可以是通过图像金字塔处理对待检测图片进行预处理,通过将待检测图片根据图像的分辨率以金字塔形状逐级的排列,这样使得待检测图片可以在多尺度下对图片的图像特征和文本特征进行识别和检测。本发明对于图像金字塔的层级和尺度不做任何限制,本领域技术人员可以根据需要进行设定。The preprocessing of the picture to be detected, as shown in FIG. 2, may optionally be image pyramid processing: copies of the picture are arranged level by level in a pyramid according to image resolution, so that the image features and text features of the picture can be recognized and detected at multiple scales. The present invention places no restriction on the levels and scales of the image pyramid, which those skilled in the art can set as required.
在本步骤中,通过对待检测图片进行图像金字塔的预处理,将待检测图片的信息分为图像特征信息和文本特征信息,并将图像特征信息和文本特征信息分别分为不同多尺度,这样可以使得预处理后的图片更有利于后续的检测识别。In this step, image pyramid preprocessing separates the information of the picture to be detected into image feature information and text feature information, each represented at multiple scales, which makes the preprocessed picture more amenable to subsequent detection and recognition.
可选的,对待检测图片进行图像金字塔处理之前,还可以对待检测图片进行图像颜色矫正处理,获得矫正后的图片;这样,服务器将会对矫正后的图片进行图像金字塔处理,获得处理后的图片。Optionally, before the image pyramid processing, image color correction may also be performed on the picture to be detected to obtain a corrected picture; the server then performs image pyramid processing on the corrected picture to obtain the processed picture.
例如,当待检测图片为海报图片时,由于海报图片中的文本特征信息较多,且为了在海报中突出不同的信息,可能会将不同的字使用不同的颜色来达到醒目的效果,这时海报的文本中可能会有多种颜色,此时,通过对海报图片进行图像颜色矫正处理,可以将海报中的多种颜色压缩为较单一的颜色,抛弃海报中较为复杂的颜色或色调,这样可以使得文本中的颜色较统一。进而将文本颜色矫正处理后的图片进行图像金字塔处理,在对待检测图片进行图像金字塔处理时,由于矫正后的图片颜色比较单一,因此图像金字塔处理时的操作更简便。For example, when the picture to be detected is a poster, the poster contains a large amount of text feature information, and in order to highlight different pieces of information, different characters may be given different colors for emphasis, so the text of the poster may contain many colors. In this case, image color correction compresses the many colors of the poster into a more uniform color and discards the more complex colors or tones, making the colors in the text more consistent. The color-corrected picture is then subjected to image pyramid processing; because the corrected picture has relatively uniform color, the image pyramid processing is simpler.
通过对待检测图片进行图像颜色矫正处理,剔除冗余的颜色信息,可以使得待检测图片的颜色较单一,进而使得在对矫正处理后的图片进行图像金字塔处理时,由于图片的颜色单一,使得图片处理的过程更简便,进而提高图片检测的速率。By performing image color correction on the picture to be detected and eliminating redundant color information, the colors of the picture become more uniform; when the corrected picture is then subjected to image pyramid processing, its uniform color makes the processing simpler and therefore speeds up picture detection.
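The color correction described above can be sketched as palette quantization. The patent does not specify a correction algorithm; snapping each RGB channel to a coarse grid is one plausible reading, and the names and the step size below are assumptions for illustration.

```python
def quantize_colors(pixels, step=64):
    """Crude color-correction sketch: snap each RGB channel to the
    center of its `step`-wide bucket, collapsing the many hues of,
    e.g., a poster's decorated text into a few representative colors."""
    def snap(v):
        return min(255, (v // step) * step + step // 2)
    return [tuple(snap(c) for c in px) for px in pixels]

# three hypothetical poster pixels: two near-red shades and one green
poster = [(250, 10, 12), (247, 14, 9), (30, 200, 31)]
palette = quantize_colors(poster)
print(palette)  # the two near-red pixels collapse to the same color
```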
步骤102,从处理后的图片中,分别提取图像特征信息和文本特征信息。Step 102: Extract image feature information and text feature information from the processed picture, respectively.
在本步骤中,通过对待检测图片进行预处理后,将从预处理后的图片中分别提取图像特征信息和文本特征信息,其中,图像特征信息可以为待检测图片中的图像的特征信息,例如,图像特征信息可以为人物图像,动物图像或景色图像等,具体的可以根据需要预先设定多种图像特征信息,当进行提取图像特征信息时,可以根据预设定的图像特征信息进行检测和识别,进而进行图像特征信息提取。文本特征信息可以为文字或表格等信息,具体的可以根据需要预先设定多种文本特征信息,当进行提取文本特征信息时,可以根据预设定的文本特征信息进行检测和识别,进而进行文本特征信息提取。In this step, after the picture to be detected is preprocessed, image feature information and text feature information are extracted separately from the preprocessed picture. The image feature information may be feature information of images in the picture, for example person images, animal images, or scenery images; various kinds of image feature information can be preset as required, and extraction proceeds by detecting and recognizing the preset kinds. The text feature information may be text, tables, or similar information; various kinds of text feature information can likewise be preset as required, and extraction proceeds by detecting and recognizing the preset kinds.
另外,从预处理后的图片中提取图像特征信息时,可以通过多级CNN提取。In addition, the image feature information can be extracted from the preprocessed picture through a multi-level CNN.
下面先介绍一下CNN网络,以及CNN网络如何进行图像识别和特征提取。Let's first introduce the CNN network and how the CNN network performs image recognition and feature extraction.
CNN网络本质上是一种前向(feed-forward)神经网络。图3为根据CNN 网络对图片中图像识别算法的示意图,如图3所示,CNN网络主要由卷积层 (Convolutional layers),降采样层(Pooling layers)以及全连接层(Fully connected layers)三部分结构组成。卷积神经网络的结构特征主要表现在两个方面:首先,CNN网络的神经元之间是局部连接的,而非全连接;其次,CNN网络的部分神经元之间共享权重。卷积神经网络的局部连接属性和权重共享权重属性可有效对图像空间特征进行提取,并且通过降低权重参数数量有效降低网络模型复杂度。A CNN is essentially a feed-forward neural network. FIG. 3 is a schematic diagram of an image recognition algorithm based on a CNN. As shown in FIG. 3, a CNN mainly consists of three parts: convolutional layers, downsampling (pooling) layers, and fully connected layers. The structural characteristics of a convolutional neural network are mainly manifested in two aspects: first, the neurons of a CNN are locally connected rather than fully connected; second, some neurons of the network share weights. The local-connection and weight-sharing properties allow spatial image features to be extracted effectively and, by reducing the number of weight parameters, effectively reduce the complexity of the network model.
利用卷积神经网络中邻近层的神经元之间的局部连接属性,CNN网络可有效对二维图片中的局部图像特征信息进行提取。图4为CNN网络的局部连接和共享权重的结构示意图,如图4所示,在CNN网络中,部分神经元之间的连接被复制到整个当前层中,换而言之,神经元之间共享连接的权重(weights)和偏差(biases)。利用CNN网络的局部连接和共享权重的特殊结构,使得CNN网络在图片处理中具有较好的泛化能力和性能优势。Using the local connections between neurons in adjacent layers, a CNN can effectively extract local image feature information from a two-dimensional picture. FIG. 4 is a schematic diagram of the local-connection and weight-sharing structure of a CNN. As shown in FIG. 4, connections between some neurons are replicated across the whole current layer; in other words, the neurons share connection weights and biases. This special structure of local connections and shared weights gives a CNN good generalization ability and performance advantages in picture processing.
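The two structural properties above can be made concrete with a minimal 2-D convolution: each output neuron connects only to a local patch of the input (local connectivity), yet every position reuses the same kernel and bias (shared weights). The function below is an illustrative sketch, not the patent's implementation.

```python
def convolve2d(image, kernel, bias=0):
    """Slide one shared `kernel` (plus `bias`) over every valid position
    of `image`: local connectivity with weight sharing, as described
    above.  Both inputs are lists of rows of numbers."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(
                image[y + p][x + q] * kernel[p][q]
                for p in range(kh) for q in range(kw)
            ) + bias
            for x in range(ow)
        ]
        for y in range(oh)
    ]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
edge = [[1, -1]]  # one shared 1x2 kernel reused at every position
print(convolve2d(img, edge))  # [[-1, -1], [-1, -1], [-1, -1]]
```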
通常情况下,卷积神经网络包含一个卷积层和一个降采样层,而深度卷积神经网络(Deep Convolution Neural Network,DCNN)由若干个卷积层和降采样层堆叠而成,这样可以实现深层网络结构。以传统CNN网络为例,每一个卷积层Cl,对前一层的输入数据或降采样层Sl-1所输出的特征数据通过与一个可学习的卷积核通过卷积后加权求和得到,由此可知,卷积层Cl的一个特征图(Feature map)可通过下式求得:Typically, a convolutional neural network contains one convolutional layer and one downsampling layer, while a deep convolutional neural network (DCNN) stacks several convolutional and downsampling layers to realize a deep network structure. Taking a traditional CNN as an example, each convolutional layer $C_l$ convolves the input data of the previous layer, or the feature data output by the downsampling layer $S_{l-1}$, with a learnable convolution kernel and takes a weighted sum. A feature map of the convolutional layer $C_l$ can therefore be obtained by the following formula:
其中,表示卷积层Cl的第j个特征图中位置(x,y)处的神经元的特征值; m表示与Sl-1层中当前卷积成第j个特征图相连接的特征图;表示与当前神经元与Sl-1层第m个特征图卷积核在位置(p,q)上的权重值;Pl和Ql分别表示二维卷积核的在两个方向上的尺寸;blj表示卷积层Cl的第j个特征的偏置量。g(·) 表示激活函数,一般来说可使用sigmoid或tanh函数,分别可表示为:in, Represents the feature value of the neuron at the position (x, y) in the jth feature map of the convolutional layer C l ; m represents the feature map connected to the jth feature map currently convolved in the S l-1 layer ; Represents the weight value of the current neuron and the m-th feature map convolution kernel of the S l-1 layer at the position (p, q); P l and Q l respectively represent the two-dimensional convolution kernel in two directions. size; b lj represents the bias of the jth feature of the convolutional layer C l . g( ) represents the activation function. Generally speaking, the sigmoid or tanh function can be used, which can be expressed as:
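As an illustration of the feature-map computation above, the following is a minimal pure-Python sketch of computing one output value of a convolutional feature map; the input map, kernel, and bias below are made-up toy values, not the networks of this embodiment:

```python
import math

def sigmoid(t):
    # g(x) = 1 / (1 + e^-x), one common choice of activation function
    return 1.0 / (1.0 + math.exp(-t))

def conv_feature_value(prev_maps, kernels, bias, x, y):
    """Value at (x, y) of one feature map of layer C_l.

    prev_maps: list of 2-D input maps from layer S_{l-1} (the sum over m)
    kernels:   one P_l x Q_l kernel per connected input map
    """
    total = bias
    for fmap, kernel in zip(prev_maps, kernels):   # sum over m
        for p, row in enumerate(kernel):           # sum over p
            for q, w in enumerate(row):            # sum over q
                total += w * fmap[x + p][y + q]
    return sigmoid(total)

# Toy example: one 3x3 input map, one 2x2 kernel, zero bias.
prev = [[[1.0, 0.0, 1.0],
         [0.0, 1.0, 0.0],
         [1.0, 0.0, 1.0]]]
kern = [[[0.5, 0.5],
         [0.5, 0.5]]]
v = conv_feature_value(prev, kern, 0.0, 0, 0)  # weighted sum inside g(.) is 1.0
```

Real networks vectorize this triple loop, but the arithmetic is exactly the weighted sum plus bias passed through g(·) given in the formula.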
Each feature map output by a convolutional layer is thus the weighted sum of the feature maps of the preceding layer convolved with different convolution kernels.
The downsampling layer is another important component of the convolutional neural network. By downsampling the feature maps produced by the convolutional layer, scale invariance of the feature information is achieved. Each downsampling layer corresponds to one convolutional layer, and the neurons in the downsampling layer downsample the convolutional layer through a downsampling function. The most common downsampling strategy in CNNs is max pooling, which can generally be expressed as

$$a_j=\max_{n\times n}\left(a_i\,u(n,1)\right)$$

where u(n,1) is a window function applied to the convolutional layer, and a_j denotes the maximum value over the neighborhood.
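The max-pooling step can be sketched as follows; this is a generic illustration with a made-up 4x4 input and a non-overlapping 2x2 window, not the specific window function u(n,1) of the embodiment:

```python
def max_pool_2x2(fmap):
    """Downsample a 2-D feature map by taking the max over each 2x2 window."""
    pooled = []
    for r in range(0, len(fmap), 2):
        row = []
        for c in range(0, len(fmap[0]), 2):
            window = [fmap[r][c], fmap[r][c + 1],
                      fmap[r + 1][c], fmap[r + 1][c + 1]]
            row.append(max(window))  # a_j: maximum over the neighborhood
        pooled.append(row)
    return pooled

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
# Each 2x2 block collapses to its maximum value.
print(max_pool_2x2(fmap))  # [[4, 2], [2, 8]]
```

Halving each spatial dimension this way is what gives the pooled features their tolerance to small shifts and scale changes.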
The CNN contains a fully connected layer, and logistic regression (Logistic Regression, LR) is generally used as the classifier for the final feature-information output. LR is a classifier based on probability statistics; it uses probability scores to measure the correlation between the categorical variable and the input variables and outputs this correlation as the predicted value. In general deep CNNs, Softmax regression is used as the LR classifier of the final output layer to implement multi-class classification. The predicted values output by the Softmax regression algorithm sum to 1, so the output can be regarded as a set of conditional probabilities. For an input feature V, the probability that it belongs to class i is:

$$P(Y=i\mid V;W,b)=s_i(WV+b)=\frac{e^{W_iV+b_i}}{\sum_{j}e^{W_jV+b_j}}$$

where W and b denote the weight and bias parameters of the LR layer, W_i and b_i denote the weight and bias parameters corresponding to class i, Y is the class result output by the LR classifier, and s denotes the Softmax function. The output dimension of the LR layer equals the number of classes the LR classifier needs to recognize. With the LR layer as the top layer of the deep convolutional neural network, the input feature V is the output feature vector of the last convolutional layer of the deep CNN.
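A minimal sketch of this Softmax output layer follows; the weights, biases, and the 3-dimensional input feature are made-up values, standing in for the learned LR-layer parameters:

```python
import math

def softmax_lr(V, W, b):
    """P(Y = i | V; W, b): class probabilities from a linear layer + Softmax."""
    logits = [sum(wi * v for wi, v in zip(Wi, V)) + bi for Wi, bi in zip(W, b)]
    mx = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - mx) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

V = [1.0, 2.0, 0.5]          # output feature vector of the last conv layer
W = [[0.2, -0.1, 0.4],       # one weight row per class
     [0.0, 0.3, -0.2]]
b = [0.1, -0.1]
probs = softmax_lr(V, W, b)  # probabilities over the classes, summing to 1
```

The output dimension (here 2) equals the number of classes the classifier must recognize, matching the description above.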
The type of the picture is then determined according to the image feature information obtained by the fully connected layer of the above CNN.
In this step, building on the prior art, the CNN is divided into multi-level tasks to extract the image feature information from the picture. The following describes in detail how the multi-level CNN extracts image feature information from a picture that has been processed by the image pyramid.
Specifically, with continued reference to Figure 2, the multi-level CNN comprises three levels of networks, namely a first-level network, a second-level network, and a third-level network, so as to extract the image feature information step by step. First, the processed picture is input into the first-level CNN to obtain a first mapped image corresponding to the processed picture and the first region-box coordinates of the target image in the processed picture. Specifically, the first-level CNN may be a P network (Proposal Network), a fully convolutional neural network (Fully Convolutional Neural Network, FCNN) comprising 5 convolutional layers and 4 pooling layers, plus one fully connected layer. The convolution kernel size should be set according to the size of the image features containing the sensitive information to be detected, which reduces the influence of picture distortion or stretching on image feature extraction. The first-level CNN performs a coarse global extraction of the image feature information in the picture and outputs the first mapped image, which is the desired image extracted by the first-level CNN from the image feature information in the picture.
According to the first mapped image, the first-level CNN generates first region boxes of the target image under different aspect ratios, the first mapped image lying within the first region box, and simultaneously outputs the coordinates of the first region box of the target image. These coordinates may be the coordinates (a, b) and (c, d) of two diagonal corners of the rectangular first region box. The first mapped image and the coordinates of the first region box of the target image serve as the input data of the next-level CNN.
Next, the first mapped image and the first region-box coordinates are input into the second-level CNN to obtain a second mapped image corresponding to the processed picture and the second region-box coordinates of the target image in the processed picture. Specifically, the second-level CNN is an R network (Refine Network), an FCNN comprising 4 convolutional layers and 3 pooling layers. The first mapped image and the first region-box coordinates are input into this second-level CNN, which further extracts image feature information from the input data, that is, identifies the first mapped image more accurately. Through the precise identification of the R network, the second mapped image is obtained, and the second region-box coordinates of the target image in the picture are obtained at the same time; these coordinates may be the coordinates (e, f) and (g, h) of two diagonal corners of the rectangular second region box. The second mapped image and the coordinates of the second region box of the target image serve as the input data of the next-level CNN. The area of the rectangular second region box output by the R network is smaller than the area of the rectangular first region box output by the P network. The feature extraction performed by the R network not only makes the image feature information more accurate than that extracted by the P network, but also reduces the error rate of the image feature information output by the P network.
The second mapped image and the second region-box coordinates are then input into the third-level CNN to obtain the image feature information. Specifically, the third-level CNN may be an O network (Output Network), comprising 4 convolutional layers and 3 pooling layers. The second mapped image and the second region-box coordinates are input into this third-level CNN, which further extracts image feature information from the input data, that is, identifies the second mapped image more accurately, and thereby determines whether the picture to be detected contains the desired image feature information. The O network carries out the multi-level tasks jointly, detecting the target image within the second region box and regressing the region box of the rectangular target image to yield precise diagonal coordinates (x, y) and (m, n) of the rectangular region box. The feature extraction performed by the O network not only yields the recognition result of the picture to be detected and the specific coordinate position of the image feature information, but also reduces the error rate of the image feature information output by the R network.
Through the extraction of image feature information by the multi-level CNN, 6-dimensional feature information of the picture detection is finally output. The 6-dimensional feature information includes the recognition result and the precise position coordinates of the image feature; the recognition result may be expressed as (1, 0) or (0, 1), indicating respectively that, according to the image feature information, the picture to be detected is recognized as a normal picture or an abnormal picture.
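The three-stage flow described above can be sketched as a simple pipeline; the stage functions here are placeholders standing in for the learned P, R, and O networks, and all box values are made-up:

```python
def p_stage(picture):
    # Proposal stage: coarse mapped image plus a loose candidate box (a, b), (c, d)
    return {"map": picture, "box": (0, 0, 100, 100)}

def r_stage(first):
    # Refine stage: a tighter box (e, f), (g, h) with smaller area than the P box
    a, b, c, d = first["box"]
    return {"map": first["map"], "box": (a + 10, b + 10, c - 10, d - 10)}

def o_stage(second):
    # Output stage: recognition result plus the final box corners (x, y), (m, n)
    x, y, m, n = second["box"]
    label = (1, 0)                 # (1, 0) = normal picture, (0, 1) = abnormal
    return label + (x, y, m, n)    # the 6-dimensional feature information

features = o_stage(r_stage(p_stage("processed-picture")))
print(features)  # (1, 0, 10, 10, 90, 90)
```

Each stage consumes the mapped image and box of the previous stage, mirroring how the first mapped image and first region-box coordinates feed the R network, and its output feeds the O network.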
In this step, on the basis of the image-feature extraction of the prior art, the CNN is divided into multi-level networks so that the image feature information in the picture is extracted level by level, which makes the extracted image feature information more accurate and can greatly increase the speed of image feature extraction.
In addition, the text feature information may be extracted from the processed picture through the multi-level convolutional neural network (CNN) together with a recurrent neural network (Recurrent Neural Network, RNN). Specifically, sensitive text information is extracted through the multi-level CNN and the RNN. The CNN adopts the DenseNet network model structure to detect text features, and the RNN recognizes the text information through a two-layer bidirectional RNN. The result of text feature detection and recognition is then output; this result is the high-dimensional feature information of various sensitive words, where a sensitive word is pre-stored text information. For example, text such as "fake orders", "planting vegetables", "integrity", "brush", or "positive review" can be defined as sensitive words.
In this step, two neural networks are involved: a multi-level CNN and a DenseNet+RNN network. The multi-level CNN is used to extract image feature information, and the DenseNet+RNN network is used to extract text feature information. The neural networks are trained using a labeled data set and the back-propagation (BP) algorithm. The BP algorithm is the core basic algorithm for training all neural networks; training requires defining an error measure through a suitable loss function (cost function). In this technical solution, to achieve fast regression of the network parameters, the Adadelta optimizer is used together with a training strategy that gradually decreases the learning rate as the number of training iterations increases. For the large data volumes typical of deep learning, a mini-batch gradient descent strategy is generally adopted to train the CNN. For an input mini-batch, the loss function can be defined as

$$C=\frac{1}{m}\sum_{i=1}^{m}L(x_i,z_i)$$

where L(·,·) is the per-sample error, m is the mini-batch size, and x_i and z_i denote the output label and the class label of the i-th sample in the batch, respectively.
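A minimal sketch of averaging a per-sample error over a mini-batch follows; the squared-error measure is assumed here purely for illustration, and the outputs and labels are made-up values:

```python
def minibatch_loss(outputs, labels):
    """Average a per-sample error over a mini-batch of size m.

    outputs: x_i, the network outputs for the batch
    labels:  z_i, the class labels for the batch
    (squared error is an assumed per-sample measure, for illustration)
    """
    m = len(outputs)
    return sum((x - z) ** 2 for x, z in zip(outputs, labels)) / m

outputs = [0.9, 0.2, 0.7]
labels  = [1.0, 0.0, 1.0]
loss = minibatch_loss(outputs, labels)  # roughly 0.0467
```

In training, this scalar is what back-propagation differentiates with respect to the network weights before the optimizer applies its update.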
In this step, the image feature information is extracted from the processed picture through the multi-level CNN, yielding the recognition result of the image in the picture to be detected and one piece of 6-dimensional image feature information; and the text feature information is extracted from the processed picture through the multi-level CNN and the RNN, yielding the high-dimensional feature information of the text. Recognizing and detecting the image and the text in the picture at the same time greatly alleviates the prior-art problem that CNN algorithms cannot effectively recognize variants and derivatives of the images in a picture. Moreover, the simultaneous extraction and recognition of text feature information and image feature information effectively compensates for the inability of prior-art CNN algorithms to detect and recognize derivative images, providing a more comprehensive basis for determining the picture type and making the determined picture type more accurate.
Step 103: determine the type of the picture to be detected according to the image feature information and the text feature information, where the type of the picture is either a normal picture or an abnormal picture.
In this step, the type of the picture to be detected is determined from the image feature information and the text feature information obtained above. The type of the picture is either a normal picture or an abnormal picture: a normal picture is a desired picture, and an abnormal picture is defined relative to a normal picture. Of course, different users define normal pictures differently; that is, picture G may be a normal picture for user A but an abnormal picture for user B. Therefore, the image features and text features of a picture need to be set according to the user's needs, and the present invention places no limitation on this.
In the picture processing method provided by this embodiment of the present invention, a picture to be detected is preprocessed to obtain a processed picture; image feature information and text feature information are then extracted separately from the processed picture; and the type of the picture to be detected is determined according to the image feature information and the text feature information, where the type of the picture is either a normal picture or an abnormal picture. Because the picture to be detected is preprocessed first, the preprocessed picture is more convenient for feature extraction and picture-type determination. Furthermore, because image feature information and text feature information are extracted separately from the preprocessed picture, the detection and recognition effect for the picture to be detected is improved, and the accuracy of picture detection is increased.
Figure 5 is a flowchart of a picture processing method according to yet another exemplary embodiment of the present invention. On the basis of Figure 1, the process of determining the type of the picture to be detected according to the image feature information and the text feature information is described in detail. As shown in Figure 5, the method of this embodiment may include:
Step 501: preprocess the picture to be detected to obtain a processed picture.
Step 502: extract image feature information and text feature information separately from the processed picture.
Steps 501 and 502 are similar to steps 101 and 102, respectively, and are not repeated here.
Step 503: concatenate the image feature information and the text feature information to obtain concatenated feature information.
In this step, the extracted image feature information and text feature information are concatenated to obtain the concatenated feature information. Specifically, the 6-dimensional feature information of the image may be concatenated with the high-dimensional feature information of the text, as shown in Figure 2. For example, the concatenated feature information may be (1, 0, x, y, m, n) + (1, 0, 0, 1, ...), where the 0s and 1s in the second part represent the detection and recognition results of the corresponding sensitive words (e.g., "fake orders", "planting vegetables", "integrity", "brush", "positive review", ...).
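Concatenating the two feature vectors is straightforward; the values below mirror the example in the text, and the box corners and sensitive-word flags are made-up:

```python
# 6-dimensional image features: recognition result (1, 0) plus the box corners.
image_features = [1, 0, 15, 20, 120, 140]

# Text features: one 0/1 detection flag per pre-stored sensitive word.
text_features = [1, 0, 0, 1]

# The concatenated feature vector fed to the dimensionality-reduction step.
spliced = image_features + text_features
print(spliced)  # [1, 0, 15, 20, 120, 140, 1, 0, 0, 1]
```

The resulting vector is what the manifold-embedding step of step 504 reduces to a lower-dimensional embedded feature.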
Step 504: perform manifold-data dimensionality reduction on the concatenated feature information to obtain the embedded features of the concatenated feature information.
In this step, after the concatenated feature information is obtained, manifold-data dimensionality reduction is performed on it to obtain its embedded features, where the dimensionality reduction is carried out by a manifold learning algorithm. Specifically, the manifold embedding (Manifold Embedding, ME) algorithm constructs local sample patches from each sample point together with its nearest neighbors of the same class and of different classes, and applies a clustering idea on the local patches to achieve dimensionality reduction of the data. A transfer learning method is also introduced to preserve the structural characteristics of the data. The optimal transformation matrix W* of the ME algorithm can be expressed as:
where X is the matrix formed by the training sample set, and P is the principal component analysis transformation matrix. The second term is a transfer regularization term used to preserve the main structural features of the original data after feature extraction and dimensionality reduction, so that the discriminative information learned from the training samples can be effectively transferred to the test data.
In the dimensionality reduction on local patches, the ME algorithm expects each sample point to be clustered as closely as possible with samples of the same class in the low-dimensional space and separated as far as possible from samples of other classes, and it completes the local discrimination on each data patch with linear operations. A locally optimized expression matrix is obtained according to the dimensionality-reduction criterion. On the basis of the local optimization, a global alignment method is used: it is assumed that, for every local patch, its low-dimensional representation is a local selection from the global low-dimensional coordinates. Accordingly, a selection matrix is defined, and each low-dimensional local patch is represented by the global low-dimensional coordinates and the selection matrix. By superimposing all the low-dimensional local patches so represented, the global alignment is obtained. The globally optimized expression matrix is computed iteratively, and the projection transformation matrix is computed by standard eigenvalue decomposition or generalized eigenvalue decomposition. Dimensionality reduction guided by sample class information is thereby achieved to realize feature extraction.
In this step, the manifold learning algorithm and the transfer learning method are used to perform information interaction and dimensionality reduction on the image feature information and the text feature information, mining the correlation between them, and the effective information in this correlation is used to recognize the picture. In addition, performing manifold-data dimensionality reduction on the concatenated information makes the embedded features of the concatenated feature information more accurate.
Step 505: determine the type of the picture to be detected according to the embedded features of the concatenated feature information and a predetermined hypersphere.
Optionally, before the type of the picture to be detected is determined according to the embedded features of the concatenated feature information and the predetermined hypersphere, the hypersphere may be determined as follows: obtain multiple sample pictures; extract the mapped features of each of the multiple sample pictures; and perform data description on the mapped features to obtain the hypersphere.
Specifically, multiple pictures are first obtained, and the mapped features of each of the multiple sample pictures are extracted. A sample picture is a normal picture or an abnormal picture, that is, it includes the desired image feature information and text feature information; the mapped features of a picture may include the image feature information and text feature information of an abnormal picture. Data description is then performed on the mapped features of each picture; specifically, this may be done with the support vector data description (SVDD) algorithm, and the hypersphere is obtained from the result of the data description.
The above hypersphere is a closed decision boundary describing the data, defined according to support vector data description (Support Vector Data Description, SVDD). Specifically, the hypersphere is defined by a center a and a radius R > 0; it should contain the sample data while keeping the volume of the hypersphere as small as possible. The loss function is therefore defined as:

$$F(R,a)=R^{2}+C\sum_{i}\xi_{i}$$
where ξ_i > 0 are slack variables. Further, similarly to the support vector machine, Lagrange multipliers α_i > 0 are introduced to transform the above optimization problem equivalently into:

$$\max_{\alpha}\;\sum_{i}\alpha_{i}(x_{i}\cdot x_{i})-\sum_{i,j}\alpha_{i}\alpha_{j}(x_{i}\cdot x_{j})\quad\text{s.t.}\;\sum_{i}\alpha_{i}=1,\;0\le\alpha_{i}\le C$$

where (·) denotes the inner-product operation. The Lagrange multipliers α can thus be solved: the hypersphere center is determined as a = Σ_i α_i x_i, and the hypersphere radius R can be determined from any support vector whose Lagrange multiplier satisfies 0 < α_i < C together with the hypersphere center a. To judge whether any datum belongs to the current data class, it is only necessary to judge whether the datum is contained in the current hypersphere, that is, whether the distance between the datum and the hypersphere center is less than or equal to the hypersphere radius.
For support vector data description with negative samples, the subscripts i and j denote abnormal-picture data points and l and m denote normal-picture data points, and the loss function is defined as:

$$F(R,a)=R^{2}+C_{1}\sum_{i}\xi_{i}+C_{2}\sum_{l}\xi_{l}$$
where ξ_i > 0 and ξ_l > 0 are the slack variables of the abnormal-picture data points and the normal-picture data points, respectively. Likewise, introducing Lagrange multipliers gives the equivalent optimization problem:
The hypersphere center can then be expressed as a = Σ_i α_i x_i − Σ_l α_l x_l, and K(·,·) denotes the kernel function; the radial basis function (RBF) kernel is used in the technical solution of the present invention. The kernel method is introduced into SVDD to map the data into a possibly higher-dimensional feature space by replacing the inner-product operation with a kernel function. With the introduction of the kernel function, the support vector data description algorithm can describe data more flexibly and effectively.
In this step, describing the abnormal-picture feature information with the SVDD data description method provides a picture-data classifier. Compared with a traditional support vector machine (Support Vector Machine, SVM) classifier or the linear classifier described by a fully connected network, the SVDD data description uses a closed hypersphere in the feature space to describe and classify the picture data, which greatly improves the classification accuracy and robustness of the algorithm and alleviates the problem of normal pictures being mistakenly rejected.
After the hypersphere is determined, for a picture to be detected, determining the type of the picture according to the mapped features and the hypersphere determined by pre-training may include: judging whether the mapped features lie within the hypersphere; if the mapped features lie within the hypersphere, determining that the picture to be detected is an abnormal picture; and if the mapped features do not lie within the hypersphere, determining that the picture to be detected is a normal picture.
Specifically, whether the mapped features of the picture lie within the predetermined hypersphere is judged. For example, if the mapped feature is C and C does not lie within the predetermined hypersphere, the picture to be detected is determined to be a normal picture; if C lies within the hypersphere, the picture to be detected is determined to be an abnormal picture. Because the mapped features of the picture to be detected are judged against a predetermined hypersphere, the time spent determining the type of the picture to be detected can be reduced; that is, the speed of picture-type determination is increased.
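The decision rule above can be sketched as follows; the center, radius, and feature points are made-up values, whereas in the embodiment they come from the trained SVDD model:

```python
import math

def is_abnormal(feature, center, radius):
    """A picture is abnormal iff its embedded feature lies inside the hypersphere,
    i.e. its Euclidean distance to the center a is <= the radius R."""
    dist = math.sqrt(sum((f - c) ** 2 for f, c in zip(feature, center)))
    return dist <= radius

center = [0.0, 0.0, 0.0]   # hypersphere center a (learned during training)
radius = 1.0               # hypersphere radius R (learned during training)

print(is_abnormal([0.2, 0.3, 0.1], center, radius))  # True  -> abnormal picture
print(is_abnormal([2.0, 2.0, 2.0], center, radius))  # False -> normal picture
```

With a kernelized SVDD the distance would be evaluated through K(·,·) rather than the raw Euclidean distance, but the containment test is the same.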
In the picture processing method provided by this embodiment, the image feature information and the text feature information are concatenated to obtain concatenated feature information; manifold-data dimensionality reduction is then performed on the concatenated feature information to obtain its embedded features; and the type of the picture to be detected is determined according to the embedded features and the predetermined hypersphere. Because manifold-data dimensionality reduction is performed on the concatenated feature information, dimensionality reduction guided by sample class information can be achieved to realize feature extraction; and because the type of the picture to be detected is judged against a predetermined hypersphere, the classification accuracy and robustness of the algorithm are greatly improved, the problem of mistakenly rejecting normal pictures is alleviated, and the time spent determining the type of the picture to be detected is reduced, that is, the speed of picture-type determination is increased.
Figure 6 is a block diagram of a picture processing apparatus according to an exemplary embodiment of the present invention. As shown in Figure 6, the picture processing apparatus may include a preprocessing module 11, a first extraction module 12, and a determination module 13.
The preprocessing module 11 is configured to preprocess the picture to be detected to obtain a processed picture.
The first extraction module 12 is configured to extract image feature information and text feature information separately from the processed picture.
The determination module 13 is configured to determine the type of the picture to be detected according to the image feature information and the text feature information, where the type of the picture is either a normal picture or an abnormal picture.
In the picture processing apparatus provided by this embodiment of the present invention, the preprocessing module 11 preprocesses the picture to be detected to obtain a processed picture; the first extraction module 12 extracts image feature information and text feature information separately from the processed picture; and the determination module 13 determines the type of the picture to be detected according to the image feature information and the text feature information, where the type of the picture is either a normal picture or an abnormal picture. Because the picture to be detected is preprocessed first, the preprocessed picture is more convenient for feature extraction and picture-type determination; and because image feature information and text feature information are extracted separately from the preprocessed picture, the detection and recognition effect for the picture to be detected is improved and the accuracy of picture detection is increased.
Figure 7 is a block diagram of a picture processing apparatus according to another exemplary embodiment of the present invention. On the basis of Figure 6, the determination module 13 includes a concatenation submodule 131, a dimensionality-reduction submodule 132, and a determination submodule 133.
The concatenation submodule 131 is configured to concatenate the image feature information and the text feature information to obtain concatenated feature information.
The dimensionality-reduction submodule 132 is configured to perform manifold-data dimensionality reduction on the concatenated feature information to obtain the embedded features of the concatenated feature information.
The determination submodule 133 is configured to determine the type of the picture to be detected according to the embedded features of the concatenated feature information and the predetermined hypersphere.
Optionally, the determination submodule 133 is specifically configured to:
determine whether the embedded feature lies inside the hypersphere;
if the embedded feature lies inside the hypersphere, determine that the picture to be detected is an abnormal picture;
if the embedded feature does not lie inside the hypersphere, determine that the picture to be detected is a normal picture.
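The decision rule above amounts to a distance test against a hypersphere with a known center and radius. The following minimal sketch illustrates that test; the names `center`, `radius` and `embedding`, and the example values, are illustrative and not taken from the patent:

```python
import numpy as np

def classify_picture(embedding: np.ndarray, center: np.ndarray, radius: float) -> str:
    """Label a picture by testing whether its embedded feature lies inside
    the hypersphere: inside -> abnormal, outside -> normal, mirroring the
    decision rule of the determination submodule."""
    distance = np.linalg.norm(embedding - center)
    return "abnormal" if distance <= radius else "normal"

center = np.zeros(4)   # hypothetical hypersphere center
radius = 1.0           # hypothetical hypersphere radius
print(classify_picture(np.array([0.2, 0.1, 0.0, 0.3]), center, radius))  # abnormal
print(classify_picture(np.array([2.0, 2.0, 2.0, 2.0]), center, radius))  # normal
```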
In the picture processing apparatus provided in this embodiment, the concatenation submodule 131 concatenates the image feature information and the text feature information to obtain concatenated feature information; the dimensionality reduction submodule 132 performs manifold dimensionality reduction on the concatenated feature information to obtain the embedded feature of the concatenated feature information; and the determination submodule 133 determines the type of the picture to be detected according to the embedded feature and the predetermined hypersphere. Because manifold dimensionality reduction is performed on the concatenated feature information, dimensionality reduction guided by sample category information can be achieved for feature extraction; judging the type of the picture to be detected against the predetermined hypersphere greatly improves the classification accuracy and robustness of the algorithm, alleviates the problem of normal pictures being mistakenly blocked, and reduces the time spent on determining the type of the picture to be detected, i.e., increases the speed of picture type determination.
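The concatenate-then-reduce flow can be sketched as follows. A fixed linear projection stands in for the manifold dimensionality reduction (the patent does not specify the reduction algorithm), and all feature sizes are hypothetical:

```python
import numpy as np

def concat_and_reduce(image_feat: np.ndarray, text_feat: np.ndarray,
                      projection: np.ndarray) -> np.ndarray:
    """Concatenate image and text feature vectors, then project to a
    low-dimensional embedding. The linear `projection` is only a stand-in
    for the manifold dimensionality reduction described in the embodiment."""
    concat = np.concatenate([image_feat, text_feat])  # concatenated feature information
    return projection @ concat                        # embedded feature

rng = np.random.default_rng(0)
image_feat = rng.standard_normal(128)        # hypothetical image feature size
text_feat = rng.standard_normal(64)          # hypothetical text feature size
projection = rng.standard_normal((8, 192))   # hypothetical 192 -> 8 reduction
embedding = concat_and_reduce(image_feat, text_feat, projection)
print(embedding.shape)  # (8,)
```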
Optionally, the picture processing apparatus further includes: an acquisition module 14, a second extraction module 15 and a data description module 16, as shown in FIG. 8.
The acquisition module 14 is configured to acquire a plurality of sample pictures.
The second extraction module 15 is configured to extract the mapped feature of each of the plurality of sample pictures.
The data description module 16 is configured to perform data description on each mapped feature to obtain the hypersphere.
Optionally, the data description module 16 is further configured to:
perform data description on each mapped feature through the support vector data description (SVDD) algorithm to obtain the hypersphere.
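SVDD finds a small hypersphere enclosing most of the training data by solving a quadratic program. As a rough, dependency-free stand-in for that optimization, one can take the mean of the mapped features as the center and a high quantile of the distances as the radius; this sketches the idea of data description only and is not the SVDD algorithm itself:

```python
import numpy as np

def fit_hypersphere(mapped_features: np.ndarray, quantile: float = 0.95):
    """Crude data description: center = mean of the sample mappings,
    radius = the given quantile of distances to the center.
    A real implementation would solve the SVDD quadratic program instead."""
    center = mapped_features.mean(axis=0)
    distances = np.linalg.norm(mapped_features - center, axis=1)
    radius = float(np.quantile(distances, quantile))
    return center, radius

rng = np.random.default_rng(1)
samples = rng.standard_normal((200, 8))   # hypothetical mapped features of sample pictures
center, radius = fit_hypersphere(samples)
inside = np.linalg.norm(samples - center, axis=1) <= radius
print(round(float(inside.mean()), 2))     # roughly 0.95 of samples enclosed
```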
Optionally, the first extraction module 12 is configured to extract the image feature information from the processed picture through a multi-stage convolutional neural network (CNN).
Optionally, the first extraction module 12 is further configured to:
input the processed picture into the first-stage CNN to obtain a first mapped image corresponding to the processed picture and first bounding-box coordinates of the target image in the processed picture;
input the first mapped image and the first bounding-box coordinates into the second-stage CNN to obtain a second mapped image corresponding to the processed picture and second bounding-box coordinates of the target image in the processed picture;
input the second mapped image and the second bounding-box coordinates into the third-stage CNN to obtain the image feature information.
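The three-stage cascade above can be sketched as a pipeline in which each stage consumes the previous stage's map and box and emits a refined pair. The stage functions below are placeholders that only illustrate the data flow, not actual CNNs; all shapes are hypothetical:

```python
import numpy as np

# Placeholder stages: each stands in for one CNN of the cascade and only
# illustrates the data flow (map + box in, refined map + box out).
def stage1(picture):
    fmap = picture[::2, ::2]                          # first mapped image
    box = (0, 0, picture.shape[1], picture.shape[0])  # first bounding-box coordinates
    return fmap, box

def stage2(fmap, box):
    fmap2 = fmap[::2, ::2]                            # second mapped image
    x0, y0, x1, y1 = box
    box2 = (x0, y0, x1 // 2, y1 // 2)                 # second bounding-box coordinates
    return fmap2, box2

def stage3(fmap2, box2):
    return fmap2.ravel()[:16]                         # image feature information

picture = np.arange(64 * 64, dtype=float).reshape(64, 64)
fmap, box = stage1(picture)       # stage 1: map + first box
fmap2, box2 = stage2(fmap, box)   # stage 2: refined map + second box
features = stage3(fmap2, box2)    # stage 3: final image features
print(features.shape)             # (16,)
```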
Optionally, the first extraction module 12 is further configured to extract the text feature information from the processed picture through a multi-stage convolutional neural network (CNN) and a recurrent neural network (RNN).
Optionally, the preprocessing module 11 is further configured to perform image pyramid processing on the picture to be detected to obtain the processed picture.
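Image pyramid preprocessing produces progressively downsampled copies of the input so that targets can be found at multiple scales. A minimal sketch using 2x2 average pooling (the patent does not fix the downsampling method or number of levels):

```python
import numpy as np

def image_pyramid(picture: np.ndarray, levels: int = 3):
    """Build an image pyramid by repeated 2x2 average-pooling downsampling;
    a minimal sketch of the image pyramid preprocessing step."""
    pyramid = [picture]
    for _ in range(levels - 1):
        p = pyramid[-1]
        h, w = p.shape[0] // 2 * 2, p.shape[1] // 2 * 2   # crop to even size
        p = p[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(p)
    return pyramid

picture = np.arange(16 * 16, dtype=float).reshape(16, 16)
pyramid = image_pyramid(picture)
print([level.shape for level in pyramid])  # [(16, 16), (8, 8), (4, 4)]
```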
Optionally, the picture processing apparatus further includes a correction module 17, as shown in FIG. 9.
The correction module 17 is configured to perform image color correction on the picture to be detected to obtain a corrected picture.
The preprocessing module 11 is further configured to perform image pyramid processing on the corrected picture to obtain the processed picture.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and will not be elaborated here.
FIG. 10 is a schematic structural diagram of a server according to an embodiment of the present invention. The server shown in FIG. 10 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 10, the server may include a transmitter 60, a processor 61, a memory 62 and at least one communication bus 63. The communication bus 63 is used to implement communication connections between the components. The memory 62 may include a high-speed RAM, and may also include a non-volatile memory (NVM) such as at least one disk memory; various programs may be stored in the memory 62 to complete various processing functions and implement the method steps of this embodiment. In addition, the server may further include a receiver 64. In this embodiment, the receiver 64 may be an input interface having communication and information-receiving functions, and the transmitter 60 may be an output interface having communication and information-sending functions. Optionally, the transmitter 60 and the receiver 64 may be integrated into one communication interface, or may be two independent communication interfaces.
In addition, a computer program is stored in the memory 62 and is configured to be executed by the processor 61; the computer program includes instructions for performing the methods of the embodiments shown in FIG. 1 and FIG. 5.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program, where the computer program causes a server to execute the picture processing methods provided by the embodiments shown in FIG. 1 and FIG. 5. The above readable storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as a static random-access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk or an optical disc.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (22)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811549920.3A CN111340051A (en) | 2018-12-18 | 2018-12-18 | Picture processing method and device and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN111340051A true CN111340051A (en) | 2020-06-26 |
Family
ID=71186767
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811549920.3A Pending CN111340051A (en) | 2018-12-18 | 2018-12-18 | Picture processing method and device and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111340051A (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112491854A (en) * | 2020-11-19 | 2021-03-12 | 郑州迪维勒普科技有限公司 | Multi-azimuth security intrusion detection method and system based on FCNN |
| CN112491854B (en) * | 2020-11-19 | 2022-12-09 | 郑州迪维勒普科技有限公司 | Multi-azimuth security intrusion detection method and system based on FCNN |
| CN112458961A (en) * | 2020-12-11 | 2021-03-09 | 郑州轻工业大学 | Self-generating and self-whistling intelligent reflector device for mountain area curve |
| CN112458945A (en) * | 2020-12-11 | 2021-03-09 | 郑州轻工业大学 | Safety control method for vehicle passing through sharp curve in mountain area |
| CN112785659A (en) * | 2021-01-28 | 2021-05-11 | 特赞(上海)信息科技有限公司 | Enterprise case material picture detection method, device, equipment and storage medium |
| CN114064899A (en) * | 2021-11-22 | 2022-02-18 | 北京锐安科技有限公司 | Text detection method and device, electronic equipment and storage medium |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102004764A (en) * | 2010-11-04 | 2011-04-06 | 中国科学院计算机网络信息中心 | Internet bad information detection method and system |
| CN106407971A (en) * | 2016-09-14 | 2017-02-15 | 北京小米移动软件有限公司 | Text recognition method and device |
| CN107529068A (en) * | 2016-06-21 | 2017-12-29 | 北京新岸线网络技术有限公司 | Video content discrimination method and system |
| CN107563431A (en) * | 2017-08-28 | 2018-01-09 | 西南交通大学 | A kind of image abnormity detection method of combination CNN transfer learnings and SVDD |
| CN107967456A (en) * | 2017-11-27 | 2018-04-27 | 电子科技大学 | A kind of multiple neural network cascade identification face method based on face key point |
| CN108319888A (en) * | 2017-01-17 | 2018-07-24 | 阿里巴巴集团控股有限公司 | The recognition methods of video type and device, terminal |
| CN108830196A (en) * | 2018-05-31 | 2018-11-16 | 上海贵和软件技术有限公司 | Pedestrian detection method based on feature pyramid network |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200626 |