CN112801132A

CN112801132A - Image processing method and device

Info

Publication number: CN112801132A
Application number: CN202011580763.XA
Authority: CN
Inventors: 韩森尧; 喻庐军; 李驰; 刘岩
Original assignee: Taikang Insurance Group Co Ltd
Current assignee: Taikang Tongji Wuhan Hospital
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2021-05-14
Anticipated expiration: 2040-12-28
Also published as: CN112801132B

Abstract

The invention discloses an image processing method and an image processing device, wherein a specific implementation mode of the method comprises the steps of obtaining an image to be processed, and adjusting the image to be processed into a target image to be processed based on a preset preprocessing model; inputting the target image to be processed into a trained image quality evaluation neural network, extracting multi-scale network features to fuse medium and low-level network features, and outputting text block region data and a text block region fuzzy and clear classification value; analyzing the text block region data, acquiring the confidence coefficient of the text block region, calling a preset evaluation engine, calculating a fuzzy clear classification value of the image to be processed based on the fuzzy clear classification value of the text block region, and generating and outputting evaluation data of the image to be processed according to the fuzzy clear interval grade. Therefore, the embodiment of the invention can be applied to an image retrieval platform, and can realize automatic screening and filtering of fuzzy character influences, so that the result of subsequent image retrieval meets the definition condition.

Description

Image processing method and device

Technical Field

The present invention relates to the field of computer technologies, and in particular, to an image processing method and apparatus.

Background

At present, a plurality of scenes including medical and health images relate to processing of images with characters, and screening and filtering of unusable fuzzy images are needed in advance. The image quality evaluation is difficult to do conventionally, and the difficulty mainly lies in that the image quality is in a continuous state, a boundary between clear image and fuzzy image is difficult to define, the evaluation on the image quality is very subjective, and different people have different quality judgment opinions on the same image and have no uniform standard. The existing image evaluation methods for the document class are few, the method generally judges based on the whole image or the cut image in a blocking mode, the final result is not satisfactory, and the accuracy and the recall rate reach the actual available effect or have a certain distance.

That is to say, in the existing business, fuzzy document images are manually screened, and when the magnitude of the images is larger and larger, the manual screening efficiency is too low, which greatly affects the automation of the subsequent process of image retrieval and also affects the speed and precision of the subsequent results.

Disclosure of Invention

In view of this, embodiments of the present invention provide an image processing method and apparatus, which can be applied to an image retrieval platform to automatically filter out fuzzy text effects, so that the results of subsequent image retrieval satisfy the definition condition.

In order to achieve the above object, according to an aspect of the embodiments of the present invention, an image processing method is provided, including obtaining an image to be processed, and adjusting the image to be processed to a target image to be processed based on a preset preprocessing model; inputting the target image to be processed into a trained image quality evaluation neural network, extracting multi-scale network features to fuse medium and low-level network features, and outputting text block region data and a text block region fuzzy and clear classification level; analyzing the text block region data, acquiring the confidence coefficient of the text block region, calling a preset evaluation engine, calculating a fuzzy clear classification value of the image to be processed based on the fuzzy clear classification value of the text block region, and generating and outputting evaluation data of the image to be processed according to the fuzzy clear interval grade.

Optionally, adjusting the image to be processed to be a target image to be processed based on a preset preprocessing model includes:

and performing white edge filling on the image to be processed to adjust the image to be processed into a target shape, and then scaling the adjusted image to be processed to a preset size to obtain the target image to be processed.

Optionally, before inputting the target image to be processed into the trained image quality assessment neural network, the method includes:

constructing an image quality evaluation neural network, and adopting a multitask loss function comprising classification loss and regression loss; wherein, the classification loss in the multitask loss function is endowed with larger weight, so that the training of the image quality evaluation neural network is more biased to reduce the classification loss.

Optionally, inputting the target image to be processed into a trained image quality assessment neural network, including:

sequentially carrying out feature extraction on the target image to be processed through five convolution pooling layers;

performing upsampling operation on the features obtained after the fifth layer of convolution pooling layer, and further superposing the results of 1x1 convolution on the features obtained after the fourth layer of convolution pooling layer and the results of the upsampling operation to obtain a first feature map;

performing upsampling operation on the first feature map, and further superposing a result of 1x1 convolution on the feature obtained after the feature passes through a third layer of convolution pooling layer and the result of the upsampling operation to obtain a second feature map;

and performing upsampling operation on the second feature map to obtain a target feature map, so that the target feature map generates text block region data and a text block region fuzzy and clear classification value through two preset two fully-connected layers.

Optionally, the five-layer convolution pooling layer comprises:

the first convolution pooling layer employs 64 convolution kernels of 3 × 3 and 1 pooling layer of maxporoling; the second convolution pooling layer uses 128 convolution kernels of 3 × 3 and 1 pooling layer of maxporoling; the third convolution pooling layer adopts 2 layers of 256 convolution kernels with a size of 3 × 3, and then uses 1 layer of 256 convolution layers with a size of 1 × 1 and 1 pooling layer of maxporoling; the fourth convolution pooling layer adopts 2 layers of 512 convolution kernels of 3 × 3, and then uses 1 layer of 512 convolution layers of 1 × 1 and 1 pooling layer of maxporoling; the fifth convolution pooling layer uses 2 layers of 512 convolution kernels of 3 × 3, and then 1 layer of 512 convolution layers of 1 × 1 and 1 pooling layer of maxpooling.

Optionally, the method further comprises:

after the target feature map is operated through a preset two-layer full connection layer, outputting text block region data, wherein the text block region data comprises an upper left corner coordinate, a lower right corner coordinate and a confidence coefficient corresponding to the text block region;

and after the target characteristic diagram is operated through another preset two-layer full-connection layer, outputting a fuzzy value and a clear value corresponding to the text block region, and further processing the fuzzy value and the clear value through softmax to obtain a numerical value between 0 and 1 as a fuzzy and clear classification value of the text block region.

Optionally, a preset evaluation engine is called, the fuzzy clear classification value of the image to be processed is calculated based on the fuzzy clear classification value of the text block region, so as to generate evaluation data of the image to be processed according to the fuzzy clear interval level, including:

sequencing all text block regions from large to small according to the confidence score, and further calculating the coincidence degree of the text block regions from the text block region with the maximum confidence score according to an intersection-parallel comparison algorithm, so as to compare through a preset coincidence degree value threshold value and filter out the text block regions smaller than the coincidence degree value threshold value;

and obtaining a fuzzy clear classification value of the image to be processed by utilizing a weighted voting method for the remaining text block regions, and matching the fuzzy clear classification value of the image to be processed with the fuzzy clear interval grade to obtain a fuzzy clear classification grade as evaluation data of the image to be processed.

In addition, the invention also provides an image processing device, which comprises an acquisition module, a preprocessing module and a processing module, wherein the acquisition module is used for acquiring the image to be processed and adjusting the image to be processed into a target image to be processed based on a preset preprocessing model; the processing module is used for inputting the target image to be processed into a trained image quality evaluation neural network, extracting multi-scale network features, fusing medium and low-level network features, and outputting text block region data and a text block region fuzzy and clear classification level; and the generation module is used for analyzing the data of the text block region, acquiring the confidence coefficient of the text block region, calling a preset evaluation engine, calculating a fuzzy clear classification value of the image to be processed based on the fuzzy clear classification value of the text block region, and generating and outputting the evaluation data of the image to be processed according to the fuzzy clear interval grade.

One embodiment of the above invention has the following advantages or benefits: the method carries out fuzzy and clear judgment on text positioning based on the non-precise large text block area, can replace manual screening in the image retrieval process, can realize the fuzzy evaluation result of the document image within 0.1 second, has recall rate and accuracy rate of more than 95 percent, greatly improves the efficiency of image retrieval, and reduces the related labor cost. And the three links of marking, network structure and loss function are optimized and changed, so that the detection precision is sacrificed (the detection task is simplified), but the fuzzy classification judgment precision of the text block area and the overall speed of the method are improved. Meanwhile, through a final grading strategy, images with clear parts of fuzzy parts are screened out. In addition, the invention can be applied to an image retrieval platform to obtain the evaluation data of the image, so that the result of the subsequent image retrieval can meet the most basic definition condition.

Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

fig. 1 is a schematic diagram of a main flow of an image processing method according to a first embodiment of the present invention;

FIG. 2 is a schematic diagram of an image quality assessment neural network according to an embodiment of the present invention;

FIG. 3 is an example of image recognition according to an embodiment of the present invention;

fig. 4 is a schematic diagram of a main flow of an image processing method according to a second embodiment of the present invention;

fig. 5 is a schematic diagram of main blocks of an image processing apparatus according to an embodiment of the present invention;

FIG. 6 is an exemplary device architecture diagram in which embodiments of the present invention may be employed;

fig. 7 is a schematic structural diagram of a computer apparatus of a terminal device or a server suitable for implementing an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a schematic diagram of a main flow of an image processing method according to a first embodiment of the present invention, the image processing method including:

step S101, acquiring an image to be processed, and adjusting the image to be processed into a target image to be processed based on a preset preprocessing model.

In some embodiments, the adjusting the image to be processed to be the target image to be processed based on a preset preprocessing model specifically includes: and performing white edge filling on the image to be processed to adjust the image to be processed into a target shape, and then scaling the adjusted image to be processed to a preset size to obtain the target image to be processed.

And S102, inputting the target image to be processed into a trained image quality evaluation neural network, extracting multi-scale network features to integrate medium and low-level network features, and outputting text block region data and a text block region fuzzy and clear classification level.

In some embodiments, before inputting the target to-be-processed image into the trained image quality assessment neural network, the method includes constructing the image quality assessment neural network, and adopting a multitask loss function including classification loss and regression loss; wherein, the classification loss in the multitask loss function is endowed with larger weight, so that the training of the image quality evaluation neural network is more biased to reduce the classification loss.

As an embodiment, inputting the target image to be processed into a trained image quality evaluation neural network, and as shown in fig. 2, sequentially performing feature extraction on the target image to be processed through five convolution pooling layers; performing upsampling operation on the features obtained after the fifth layer of convolution pooling layer, and further superposing the results of 1x1 convolution on the features obtained after the fourth layer of convolution pooling layer and the results of the upsampling operation to obtain a first feature map; performing upsampling operation on the first feature map, and further superposing a result of 1x1 convolution on the feature obtained after the feature passes through a third layer of convolution pooling layer and the result of the upsampling operation to obtain a second feature map; and performing upsampling operation on the second feature map to obtain a target feature map, so that the target feature map generates text block region data and a text block region fuzzy and clear classification value through two preset two fully-connected layers.

Preferably, the five-layer convolution pooling layer comprises: the first convolution pooling layer employs 64 convolution kernels of 3 × 3 and 1 pooling layer of maxporoling; the second convolution pooling layer uses 128 convolution kernels of 3 × 3 and 1 pooling layer of maxporoling; the third convolution pooling layer adopts 2 layers of 256 convolution kernels with a size of 3 × 3, and then uses 1 layer of 256 convolution layers with a size of 1 × 1 and 1 pooling layer of maxporoling; the fourth convolution pooling layer adopts 2 layers of 512 convolution kernels of 3 × 3, and then uses 1 layer of 512 convolution layers of 1 × 1 and 1 pooling layer of maxporoling; the fifth convolution pooling layer uses 2 layers of 512 convolution kernels of 3 × 3, and then 1 layer of 512 convolution layers of 1 × 1 and 1 pooling layer of maxpooling.

It is further worth explaining that after the target feature map is operated through a preset two-layer fully-connected layer, text block region data is output, and the text block region data comprises an upper left corner coordinate, a lower right corner coordinate and a confidence coefficient corresponding to the text block region; and after the target characteristic diagram is operated through another preset two-layer full-connection layer, outputting a fuzzy value and a clear value corresponding to the text block region, and further processing the fuzzy value and the clear value through softmax to obtain a numerical value between 0 and 1 as a fuzzy and clear classification value of the text block region.

As another embodiment, before inputting the target to-be-processed image into the trained image quality assessment neural network, the image quality assessment neural network needs to be trained, and the specific implementation process includes:

m images of a text are acquired, the images are white-filled to a target shape (e.g., a square), and then the images are scaled to a fixed size (e.g., 768x 768). The method comprises the steps of carrying out text block labeling on m images, wherein as long as the contents of text blocks are approximately framed, the contents of large text blocks are not required to be accurately labeled (as shown in figure 3), small text target areas are abandoned, fuzzy judgment can be dealt with as long as the contents of the large text blocks are not too far away from the spectrum, the precision of the fuzzy judgment is improved although the precision of target detection is slightly sacrificed, and the parameters are small and much faster when the large text blocks are detected (instead of text lines or small text blocks). Labels for text blocks are labeled as clear or fuzzy.

And constructing a convolutional neural network, performing convolution and fusion operation on the image to obtain features of different scales, regressing an inaccurate text block region from the fused features, further continuously classifying the text block region, and outputting classifications 0 and 1 of the text block (0 represents a clear text and 1 represents a fuzzy text). Wherein, the loss function adopts a multitask loss function, including classification loss and regression loss, specifically: loss function ═ classification loss function + regression loss function, i.e.

L＝λ_sL_s+λ_gL_g

Preferably, the classification loss L_sUsing the classical classification loss function cross entropy function, regression loss L_gThe IOU loss is lost using the classical regression loss function. Sorting losses in the loss function, i.e. λ_sAnd appropriately giving more weight to make the convolutional neural network training more biased to reduce the classification loss. Wherein λ is_sAnd λ_gThe weights are preset weights of the classification loss function and the regression loss function respectively.

The constructed convolutional neural network inputs are the rgb three channel image and the fuzzy classification of the image. And extracting features (ResNet50) with different scales through a convolutional neural network, and performing feature fusion and output. The specific structure comprises 5 layers of feature extraction:

a first layer: 1 convolutional layer and 1 pooling layer, using 64 3 × 3 convolutional kernels and 1 max pooling layer.

A second layer: 2 convolutional layers and 1 pooling layer, using 128 3 x 3 convolutional kernels and 1 max pooling layer.

And a third layer: 3 convolutional layers and 1 pooling layer, 2 layers of 256 convolutional kernels of 3 × 3 are used first, and 1 layer of 256 convolutional layers of 1 × 1 and 1 pooling layer of maxporoling are used.

A fourth layer: 3 convolutional layers and 1 pooling layer, 2 layers of 512 convolutional cores of 3 × 3 are adopted, and then 1 layer of 512 convolutional layers of 1 × 1 and 1 pooling layer of maxporoling are used.

And a fifth layer: 3 convolutional layers and 1 pooling layer, 2 layers of 512 convolutional cores of 3 × 3 are adopted, and then 1 layer of 512 convolutional layers of 1 × 1 and 1 pooling layer of maxporoling are used.

The specific structure of feature fusion comprises: the output of the fifth layer of the operation is firstly subjected to an upsampling operation, the size of the upsampling operation is restored to the result of the previous step of the convolution pooling operation, and simultaneously the output of the fourth layer is subjected to convolution by 1x1 and then is superposed with the result of the upsampling operation to obtain a first characteristic diagram. And performing upsampling operation on the first feature map, and simultaneously performing convolution on the third layer by 1x1 and then overlapping the result of the upsampling operation to obtain a second feature map. The second feature map is directly up-sampled, is not overlapped with the previous convolution pooling result, only the features of a lower layer (more paying attention to target position information) in the network are extracted by adopting the features, the features of a higher layer (more paying attention to semantic information) are not combined, namely, the selected text block area target is larger, and the method can be used only by positioning a rough character area without being too accurate, so that the network parameters are reduced, the network speed is increased, the purpose of roughly positioning the character area is also combined, and inaccurate detection is realized.

The specific structure of the network output comprises the fuzzy clear classification value of the regression text block area and regression text block area data. The text block region data is obtained through the following processes: performing two-layer full-connection operation on the second feature map after upsampling, generating 8 candidate frames to be selected on each pixel point, wherein the candidate frames have different sizes, marking the values of all text block regions of a training sample as 1 and the values of non-text block regions as 0 during model training, then regressing the candidate frames of the regions with the value of 1 by using a regression mode, and merging the candidate frames into a final text block region by using an NMS (non-maximum suppression) algorithm to obtain text block region data: the upper left and lower right coordinates of the text block region. And during training, the text block regions are labeled with fuzzy and clear classification, and then the corresponding classification is regressed in each text block region by using a regression method.

It is worth noting that the confidence score is output at the same time as the text block region data is output. Preferably, all text block regions are sorted from large to small according to their respective confidence score, the text block region with the largest confidence score is used to calculate the coincidence degree of the text block regions according to an intersection ratio (IOU) algorithm, screening is performed (for example, the coincidence degree value is set to 0.75), and then the final image fuzzy and clear judgment is obtained for the remaining text block regions by using a weighted voting method, that is, the numerical value after weighted voting matches the fuzzy and clear classification level.

In addition, another two-layer full-connection operation is carried out on the second feature map after the upsampling to obtain a fuzzy clear classification value of a regression text block area, and the fuzzy clear classification value is further processed by softmax (normalized exponential function) to obtain a value between 0 and 1, the value after voting weighting is 3 grades (shown in the following table), 0-0.3 is a fuzzy image, 0.3-0.7 is a partially fuzzy partially clear image, and 0.7-1 is a clear image. When determining the degree of blur of an actual medical text image, there are many images that are partially clear and partially blurred due to various reasons such as the shooting angle and compression during transmission. Therefore, classification by the predicted value realizes that the image which is a kind of image which is a part of the fuzzy part is selected clearly, but the image is larger in practical application.

Watch (A)

Class value interval class	Quality conclusions
		0～0.3	Blurring
0.3～0.7	Partially blurred and partially clear
		0.7～1	Clear and clear

Step S103, analyzing the text block region data, acquiring the confidence of the text block region, calling a preset evaluation engine, calculating a fuzzy clear classification value of the image to be processed based on the text block region fuzzy clear classification value, and generating and outputting evaluation data of the image to be processed according to the fuzzy clear interval grade.

In some embodiments, invoking a preset evaluation engine, calculating a blur and sharpness classification value of the image to be processed based on a text block region blur and sharpness classification value, so as to generate evaluation data of the image to be processed according to a blur and sharpness interval level, including: sequencing all text block regions from large to small according to the confidence score, and further calculating the coincidence degree of the text block regions from the text block region with the maximum confidence score according to an intersection-parallel comparison algorithm, so as to compare through a preset coincidence degree value threshold value and filter out the text block regions smaller than the coincidence degree value threshold value; and obtaining a fuzzy clear classification value of the image to be processed by utilizing a weighted voting method for the remaining text block regions, and matching the fuzzy clear classification value of the image to be processed with the fuzzy clear interval grade to obtain a fuzzy clear classification grade as evaluation data of the image to be processed.

The weighted voting algorithm is that according to the remaining text block areas, each text block area has a confidence score and also has an output value of fuzzy and clear classification, the confidence score of each text block area is divided by the sum of the confidence scores of all the remaining text block areas to obtain the weight of each text block area, then the output value of the fuzzy and clear classification of each text block area is multiplied, the sum is obtained to obtain the final value which is used as the fuzzy and clear classification value of the image, and then judgment is carried out according to the classification value interval grade to obtain the final quality evaluation conclusion.

It should be noted that the evaluation data may include the blur and sharpness classification level of each text block region in the image and the blur and sharpness classification level of the entire image, so that the present invention may evaluate the blur and sharpness of each text block region and the entire image, and realize the omnidirectional dynamic evaluation, that is, the blur and sharpness evaluation of the text block region and the entire image is performed on different images based on the blur and sharpness classification level of the text block region of the image. In summary, it can be seen that the fuzzy and clear classification of the text block region is judged through the text block region which is not accurately detected, and the judgment result of the text block region above the threshold is selected for voting according to the confidence threshold, so as to obtain the clear and fuzzy conclusion of the final image. Meanwhile, the method abandons the existing text line detection idea, but adopts the detection idea of large text block areas, and achieves better fuzzy and clear judgment precision and higher speed as long as approximately proper text block areas are screened out (as shown in figure 3). In addition, the invention can evaluate and process a large amount of medical health images in an image retrieval platform, screen and reject some fuzzy unusable images, and can also use some half fuzzy images through manual identification, thereby greatly improving the application efficiency of the images. That is, the present invention can distinguish blurred and clear images, and can also distinguish some images as special images with partially blurred and partially clear images, which are special but appear in large numbers in the practical application process, and are very necessary to be screened out separately, and cannot be mixed with blurred or clear images, which causes image recognition deviation.

Fig. 4 is a schematic diagram of a main flow of an image processing method according to a second embodiment of the present invention. The image processing method comprises the following steps:

step S401, acquiring an image to be processed.

And S402, based on a preset preprocessing model, performing white edge filling on the image to be processed to adjust the image to be processed into a target shape, and further scaling the adjusted image to be processed to a preset size to obtain a target image to be processed.

And S403, sequentially performing feature extraction on the target image to be processed through five layers of convolution pooling layers.

Step S404, performing an upsampling operation on the features obtained after the fifth layer of convolution pooling layer, and further superimposing the result of the feature obtained after the fourth layer of convolution pooling layer after being convolved by 1 × 1 with the result of the upsampling operation to obtain a first feature map.

Step S405, perform an upsampling operation on the first feature map, and further superimpose a result obtained by convolving the features obtained after passing through the third layer of convolution pooling layer by 1 × 1 with a result of the upsampling operation to obtain a second feature map.

Step S406, performing an up-sampling operation on the second feature map to obtain a target feature map, so that the target feature map respectively generates text block region data and a text block region fuzzy and clear classification value through two preset two fully-connected layers.

In an embodiment, after the target feature map is operated by a preset two-layer fully-connected layer, text block region data including five nodes is output, where four nodes are an upper-left corner coordinate and a lower-right corner coordinate, and one node is a confidence corresponding to a text block region. And after the target characteristic diagram is operated through another preset two-layer full-connection layer, outputting a fuzzy value and a clear value corresponding to the text block region, and further processing the fuzzy value and the clear value through softmax to obtain a numerical value between 0 and 1 as a fuzzy and clear classification value of the text block region.

Step S407, a preset evaluation engine is called, all text block regions are sorted from large to small according to the confidence score in the text block region data, and then the coincidence degree of the text block regions is calculated according to an intersection comparison algorithm from the text block region with the maximum confidence score, so that comparison is performed through a preset coincidence degree value threshold, and the text block regions smaller than the coincidence degree value threshold are filtered.

Step S408, obtaining a fuzzy and clear classification value of the image to be processed by utilizing a weighted voting method for the remaining text block regions, matching the fuzzy and clear classification value of the image to be processed with the fuzzy and clear interval grade to obtain a fuzzy and clear classification grade, and outputting the fuzzy and clear classification grade as the evaluation data of the image to be processed.

In summary, the present invention determines whether the text is blurred based on the text block region that is not accurately detected, and further determines whether the entire document image is blurred, so as to achieve the purpose of focusing the blur determination of the document image on the text content. Marking (large text block region marking and detection with coarse granularity), network structure (discarding high-level semantic information features in feature extraction, focusing on middle-level and low-level network features, and increasing 1x1 convolution to improve abstract expression capability of features) and loss function (improving weight of classification loss in the loss function), and optimizing and changing the three links to achieve the aim of improving fuzzy classification judgment precision of text block regions while sacrificing detection precision. Therefore, the invention greatly improves the speed of the whole network, meets the requirement of on-line real-time property and reduces the difficulty of labeling under the condition of simplifying the whole detection task. In addition, the invention can obtain clear or fuzzy conclusion of the image, and can also screen out the image with clear part of the fuzzy part through the final grading strategy.

It is worth to say that the invention can be applied to an image retrieval platform, can be used as one of the functional modules, and can also provide service separately to the outside, supporting interface call.

In an application embodiment of the invention, in an image retrieval platform, for processing of a large number of medical health images, all uploaded images need to be subjected to certain preprocessing, and some blurred and unusable images, such as some health archive images or physical examination health images, are screened out.

Fig. 5 is a schematic diagram of main blocks of an image processing apparatus according to an embodiment of the present invention, and as shown in fig. 5, the image processing apparatus 500 includes an acquisition module 501, a processing module 502, and a generation module 503. The obtaining module 501 obtains an image to be processed, and adjusts the image to be processed into a target image to be processed based on a preset preprocessing model; the processing module 502 inputs the target image to be processed into a trained image quality evaluation neural network, extracts multi-scale network features to fuse medium and low-level network features, and outputs text block region data and a text block region fuzzy and clear classification level; the generating module 503 analyzes the text block region data, obtains the confidence of the text block region, further invokes a preset evaluation engine, calculates a fuzzy clear classification value of the image to be processed based on the fuzzy clear classification value of the text block region, and generates and outputs evaluation data of the image to be processed according to the fuzzy clear interval level.

In some embodiments, the obtaining module 501 adjusts the image to be processed into the target image to be processed based on a preset preprocessing model, including performing white edge filling on the image to be processed to adjust the image to be processed into a target shape, and then scaling the adjusted image to be processed to a preset size to obtain the target image to be processed.

In some embodiments, before the processing module 502 inputs the target to-be-processed image into the trained image quality assessment neural network, the method includes:

In some embodiments, the processing module 502 inputs the target to-be-processed image into a trained image quality assessment neural network, including:

In some embodiments, the five-layer convolution pooling layer comprises:

In some embodiments, the processing module 502 further comprises:

In some embodiments, the generating module 503 invokes a preset evaluation engine, and calculates a blur and sharpness classification value of the image to be processed based on the blur and sharpness classification value of the text block region, so as to generate evaluation data of the image to be processed according to a blur and sharpness interval level, including

Sequencing all text block regions from large to small according to the confidence score, and further calculating the coincidence degree of the text block regions from the text block region with the maximum confidence score according to an intersection-parallel comparison algorithm, so as to compare through a preset coincidence degree value threshold value and filter out the text block regions smaller than the coincidence degree value threshold value; and obtaining a fuzzy clear classification value of the image to be processed by utilizing a weighted voting method for the remaining text block regions, and matching the fuzzy clear classification value of the image to be processed with the fuzzy clear interval grade to obtain a fuzzy clear classification grade as evaluation data of the image to be processed.

It should be noted that the image processing method and the image processing apparatus according to the present invention have corresponding relation in the specific implementation contents, and therefore, the description of the repeated contents is omitted.

Fig. 6 shows an exemplary device architecture 600 to which the image processing method or the image processing device of the embodiments of the present invention can be applied.

As shown in fig. 6, the apparatus architecture 600 may include

terminal devices

601, 602, 603, a network 604, and a server 605. The network 604 serves to provide a medium for communication links between the

terminal devices

601, 602, 603 and the server 605. Network 604 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.

A user may use the

terminal devices

601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages or the like. The

terminal devices

601, 602, 603 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).

The

terminal devices

601, 602, 603 may be various electronic devices having image processing screens and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 605 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the

terminal devices

601, 602, 603. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.

It should be noted that the image processing method provided by the embodiment of the present invention is generally executed by the server 605, and accordingly, the computing device is generally disposed in the server 605.

It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to fig. 7, shown is a schematic diagram of a computer apparatus 700 suitable for use in implementing a terminal device of an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 7, the computer apparatus 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM703, various programs and data necessary for the operation of the computer device 700 are also stored. The CPU701, the ROM702, and the RAM703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including components such as a Cathode Ray Tube (CRT), a liquid crystal image processor (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program performs the above-described functions defined in the apparatus of the present invention when executed by the Central Processing Unit (CPU) 701.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor device, apparatus, or a combination of any of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution apparatus, device, or apparatus. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution apparatus, device, or apparatus. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based apparatus that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes an acquisition module, a processing module, and a generation module. Wherein the names of the modules do not in some cases constitute a limitation of the module itself.

As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs, and when the one or more programs are executed by the device, the device comprises a processing module for acquiring an image to be processed, and adjusting the image to be processed into a target image to be processed based on a preset preprocessing model; inputting the target image to be processed into a trained image quality evaluation neural network, extracting multi-scale network features to fuse medium and low-level network features, and outputting text block region data and a text block region fuzzy and clear classification level; analyzing the text block region data, acquiring the confidence coefficient of the text block region, calling a preset evaluation engine, calculating a fuzzy clear classification value of the image to be processed based on the fuzzy clear classification value of the text block region, and generating and outputting evaluation data of the image to be processed according to the fuzzy clear interval grade.

According to the technical scheme of the embodiment of the invention, the embodiment of the invention can be applied to an image retrieval platform, and can automatically screen and filter out fuzzy character influences, so that the result of the subsequent image retrieval meets the definition condition.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An image processing method, comprising:

acquiring an image to be processed, and adjusting the image to be processed into a target image to be processed based on a preset preprocessing model;

inputting the target image to be processed into a trained image quality evaluation neural network, extracting multi-scale network features to fuse medium and low-level network features, and outputting text block region data and a text block region fuzzy and clear classification value;

analyzing the text block region data, acquiring the confidence coefficient of the text block region, calling a preset evaluation engine, calculating a fuzzy clear classification value of the image to be processed based on the fuzzy clear classification value of the text block region, and generating and outputting evaluation data of the image to be processed according to the fuzzy clear interval grade.

2. The method according to claim 1, wherein adjusting the image to be processed to a target image to be processed based on a preset preprocessing model comprises:

3. The method of claim 1, wherein before inputting the target to-be-processed image into the trained image quality assessment neural network, comprising:

4. The method of claim 1, wherein inputting the target image to be processed into a trained image quality assessment neural network, extracting multi-scale network features to fuse low-level and medium-level network features, and outputting text block region data and a text block region fuzzy and clear classification value comprises:

5. The method of claim 4, wherein the five-layer convolution pooling layer comprises:

6. The method of claim 4, further comprising:

7. The method according to any one of claims 1 to 6, wherein a preset evaluation engine is invoked, a fuzzy-clear classification value of the image to be processed is calculated based on a text block region fuzzy-clear classification value, so as to generate evaluation data of the image to be processed according to a fuzzy-clear interval level, and the method comprises the following steps:

8. An image processing apparatus characterized by comprising:

the system comprises an acquisition module, a pre-processing module and a processing module, wherein the acquisition module is used for acquiring an image to be processed and adjusting the image to be processed into a target image to be processed based on a preset pre-processing model;

the processing module is used for inputting the target image to be processed into a trained image quality evaluation neural network, extracting multi-scale network features, fusing medium and low-level network features, and outputting text block region data and a text block region fuzzy and clear classification level;

and the generation module is used for analyzing the data of the text block region, acquiring the confidence coefficient of the text block region, calling a preset evaluation engine, calculating a fuzzy clear classification value of the image to be processed based on the fuzzy clear classification value of the text block region, and generating and outputting the evaluation data of the image to be processed according to the fuzzy clear interval grade.

9. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.