Background
With rising network speeds and a growing number of users, the internet has become a global link for economic and cultural exchange. At the same time, it creates opportunities for lawbreakers and has given rise to the spread of objectionable content such as pornographic images. Identifying and analyzing objectionable (e.g., pornographic) images has become a pressing problem in the development of the internet.
How to analyze, identify and filter objectionable images is now a widely studied research topic. Traditional keyword-based filtering suffers from low accuracy, ambiguity and similar defects, and is gradually being replaced by analysis based on image content. Content-based analysis extracts features that represent the content of the input image, such as color, shape, texture and structure, and then matches these features against the corresponding features of a reference image in order to identify or filter the input image.
In content-based color image matching analysis, a color histogram (Anil K. Jain and Aditya Vailaya, 1996, Pattern Recognition 29, 1233) is used to extract the proportion of pixels corresponding to each color in an image, so that the color parameters of the image can be analyzed statistically. The color histogram has the advantages of scale invariance and rotation invariance.
Existing color image matching analysis techniques based on color histograms fall into the following two categories:
(1) Techniques for image matching analysis using histograms only (Aigrain P, Zhang H, Petkovic D. Content-based representation and retrieval of visual media: a state-of-the-art review. Multimedia Tools and Applications, 1996, 3 (1): 179-182). The technique first extracts the color histogram of the color image and then, according to a given rule, makes the matching judgment directly on the part of the histogram corresponding to specific colors. The technique is insensitive to the geometric distribution of colors, so its matching precision is low.
(2) Techniques for image matching analysis that combine histograms with pixel positions (Cinque L, Ciocca G, Levialdi S, et al. Color-based image retrieval using spatial-chromatic histograms. Image and Vision Computing, 2001, 19). The technique combines the color histogram with the positions of pixels of specific colors, extracting both the color statistics and the position features of the pixels to improve the accuracy of the matching judgment. However, the pixel position information it extracts is simple and cannot express the spatial and logical relationships between pixels, which severely limits the effectiveness of the analysis.
In short, existing color image matching analysis techniques give unsatisfactory results: image matching precision is low and the misjudgment rate is high when they are applied to image filtering, which limits their practical usefulness.
Disclosure of Invention
The object of the present invention is to provide a new color image analysis method based on color content and distribution that overcomes the shortcomings of the prior art. Compared with the prior art, the method offers high efficiency, a low misjudgment rate and high accuracy when applied to identifying and analyzing objectionable images.
The invention combines fuzzy correlation based on a color histogram with morphological matrix analysis, and performs matching analysis between an input image and the reference images in a database (a set of images with certain characteristics) to judge whether the reference image database contains an image that matches the input image. The method comprises the following steps:
extracting the color histograms of the input image and a reference image, and judging whether the two images match according to the matching degree of the two histograms; then extracting the morphological features of a specific area of the input image to obtain its morphological matrix, and judging the matching degree of the input image by comparing its morphological matrix with the morphological matrix of the reference image.
1) Extract the color histogram of the image using a color quantization method, and determine the color-matched color peak pairs according to a fuzzy relation membership function and a matching threshold $\alpha_1$.
2) Determine the height-matched pairs among the color-matched peak pairs according to a height matching formula and a matching threshold $\alpha_2$, sum the results to obtain the histogram matching coefficient, and judge whether the images match according to a threshold $\alpha_3$.
3) Where desired, weight particular pairs of matching color peaks and calculate the matching degree of the images from the weighted sum, where the weight $u_i$ represents the degree of emphasis placed on each pair of matching color peaks, and judge whether the images match according to the threshold $\alpha_3$.
4) Take the morphological matrix as a characteristic parameter, extract and compare the morphological information of the images, and determine whether the morphological features of the images match according to a matching formula and a matching threshold $r'$.
The positive effects of the invention are:
1) High efficiency: the method relies on color histograms, weighting, fuzzy correlation, the morphological matrix and mathematical statistics. It involves no complex digital image processing operations such as boundary analysis or geometric calculation, which keeps the mathematical complexity of image analysis low and markedly improves the efficiency of image matching analysis.
2) Low misjudgment rate: because the invention combines several image recognition techniques with the color histogram, it extracts and analyzes more of the image's information and performs matching analysis from several different angles. The accuracy of the matching judgment between the input image and the reference images in the sample database is therefore greatly improved.
3) High accuracy: the weighting algorithm allows a specific color to be deliberately emphasized in the analysis; the fuzzy correlation comparison of histograms gives the method a degree of intelligence in color matching, adapts well to color discrimination in images, and improves the practical comparison effect; the morphological matrix method comprehensively extracts and analyzes the morphological distribution information of the image and judges whether the morphological features of the input image match those of the reference image, achieving a better identification effect.
4) The technology is particularly suitable for identifying and filtering color images.
The color image matching analysis method provided by the invention can filter color images arriving from the network according to their matching degree with the reference images in an objectionable image library (defined according to color and morphological features). The hardware required by a filtering system for objectionable network images is: a hub or router, a network server, communication lines, and objectionable-image filtering software running on the server. The filtering software analyzes the data arriving from the Internet on the server, filters out data containing objectionable image information, and forwards the remaining data to the terminals; the related websites and accessing terminals are written to the network service log for later query.
Detailed Description
The salient and substantial features and improvements of the present invention can be seen in the examples that follow. They do not limit the invention in any way.
As shown in the figure:
1. Matching of image color information
The input image S is compared with the image T in the reference image database, and the matching relationship thereof is determined.
1) Extracting color histograms
Replace the original RGB colors of the images S and T with 16 index colors, and build the color histograms $H_S$ and $H_T$ of the images S and T (see FIG. 1).
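The quantization step above can be sketched as follows. This is a minimal illustration, not the patented procedure: the source does not say which 16 index colors are used, so the palette below (the classic 16 VGA colors), the nearest-color rule, and the function names `nearest_index` and `color_histogram` are all assumptions.

```python
from collections import Counter

# Hypothetical 16-color palette (classic VGA colors). The source does not
# specify which 16 index colors are used; this choice is an assumption.
PALETTE = [
    (0, 0, 0), (128, 0, 0), (0, 128, 0), (128, 128, 0),
    (0, 0, 128), (128, 0, 128), (0, 128, 128), (192, 192, 192),
    (128, 128, 128), (255, 0, 0), (0, 255, 0), (255, 255, 0),
    (0, 0, 255), (255, 0, 255), (0, 255, 255), (255, 255, 255),
]


def nearest_index(pixel):
    """Index of the palette color closest to pixel (squared RGB distance)."""
    return min(
        range(len(PALETTE)),
        key=lambda k: sum((p - q) ** 2 for p, q in zip(pixel, PALETTE[k])),
    )


def color_histogram(pixels):
    """Fraction of pixels assigned to each of the 16 index colors."""
    pixels = list(pixels)
    total = len(pixels)
    counts = Counter(nearest_index(p) for p in pixels)
    if total == 0:
        return [0.0] * len(PALETTE)
    return [counts.get(k, 0) / total for k in range(len(PALETTE))]
```

Because each bin holds a fraction of the pixel count rather than an absolute count, the histogram does not change when the image is scaled or rotated, which is the invariance property mentioned in the background.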
2) Color matching of histogram color peaks
Using formula (1), the fuzzy relation membership function for color, calculate one by one the color matching coefficients between all color peaks in the histograms $H_S$ and $H_T$. The two vectors in formula (1) respectively denote a pair of color vectors to be compared, one from histogram $H_S$ and one from histogram $H_T$.
Defuzzifying according to the $\alpha$-level (alpha-cut) relation gives a crisp matching relation: when the membership value is greater than a given value $\alpha_1$, the matching coefficient takes the value 1 and the color vector of the input image is considered to match the known color vector; otherwise, the two do not match.
According to the matching analysis result, extract all color-matched color peak pairs in the histograms $H_S$ and $H_T$ and record them.
3) Height matching of color matching peaks
For the color-matched color peaks in the two histograms $H_S$ and $H_T$, calculate the height matching coefficient according to formula (4), where $h_i$ and $h_i'$ denote the heights of the two color-matched peaks. The fuzzy matching relationship between the heights of the two peaks is then determined as follows: when the height matching coefficient is greater than a given value $\alpha_2$, it takes the value 1 and the color peak pair is considered matched; otherwise, the pair is not matched.
4) Matching of color histograms
Sum the number of matched color peak pairs obtained above to obtain the fuzzy matching coefficient $R_h$ of the two color histograms, where m is the total number of color peaks in the color histogram.
The fuzzy matching relationship of the two color histograms is determined as follows:
when $R_h$ is greater than a given value $\alpha_3$, $R_h$ takes the value 1 and the pair of color histograms is considered matched; otherwise, it is not matched.
5) Weighted analysis of specific colors
To emphasize the match between particular colors of the images S and T, the calculation of equation (6) can be modified by multiplying the value of the color peak corresponding to the particular color by a set weight $u_i$, which increases the weight of that color peak in the color histogram matching calculation. The weighted result determines the matching degree between the color images after color weighting.
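A minimal sketch of the weighting idea follows. The weighted formula is not reproduced in this text, so the normalization by the sum of the weights (which keeps the result between 0 and 1 so that it can still be compared with $\alpha_3$) is an assumption, as is the function name `weighted_match_degree`.

```python
def weighted_match_degree(pair_matches, weights):
    """Weighted matching degree of two histograms.

    pair_matches: 0/1 results for each color peak pair (1 = matched).
    weights: the emphasis u_i placed on each pair.
    Assumption: the weighted sum is divided by the sum of the weights.
    """
    if len(pair_matches) != len(weights):
        raise ValueError("one weight is required per color peak pair")
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(u * r for u, r in zip(weights, pair_matches)) / total
```

With equal weights the result reduces to the unweighted fraction of matched pairs. Raising the weight of one matched color raises the overall degree, which is how a specific color can be emphasized.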
2. Analysis of image morphological features
If the input image S and the image T in the reference image database cannot be matched according to the color features, further matching analysis needs to be performed on morphological features of the image S and the reference image T:
1. Color-segment the image S, and extract the area of the image having the color to be analyzed; denote this area F.
2. Divide the region F into a grid at regular intervals, as shown in FIG. 2. For each grid cell, calculate the ratio of the number of pixels of F contained in the cell to the total number of pixels of F, obtaining one ratio $W_{ij}$ per cell. Taking the $W_{ij}$ as elements gives the matrix $W = \{W_{ij}\}$, where i and j are determined by the grid division. W is called the morphological matrix and records the shape and distribution information of the image S.
3. Based on the morphological matrix W of the image S, perform the matching calculation against the data in the sample image morphological feature database.
Let W' be a morphological matrix in the sample image morphological feature database; the degree of match $R'$ between W and W' is then calculated.
If the matching threshold is $r'$, then when $R' \le r'$ the two morphological matrices W and W' are considered matched; that is, the morphological information of the image S is present among the samples in the database, and the image S is considered a matching image.
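The construction of the morphological matrix can be sketched as below. The grid ratios follow the description. The comparison formula is not reproduced in this text, so the sum of absolute differences used in `matrix_distance` is an assumption, chosen because the description treats a smaller value as a match. The function names and the 0/1 mask representation of the region F are also illustrative.

```python
def morphology_matrix(mask, rows, cols):
    """Build the rows x cols morphological matrix W of region F.

    mask: 2D list of 0/1 values, 1 marking a pixel that belongs to F.
    Each element W[i][j] is the number of F pixels in grid cell (i, j)
    divided by the total number of F pixels.
    """
    height, width = len(mask), len(mask[0])
    counts = [[0] * cols for _ in range(rows)]
    total = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                # Map the pixel to its grid cell by integer scaling.
                counts[y * rows // height][x * cols // width] += 1
                total += 1
    if total == 0:
        return [[0.0] * cols for _ in range(rows)]
    return [[c / total for c in row] for row in counts]


def matrix_distance(w1, w2):
    """Assumed matching degree R': sum of absolute element differences."""
    return sum(
        abs(a - b) for row1, row2 in zip(w1, w2) for a, b in zip(row1, row2)
    )


def morphology_matches(w, w_ref, threshold):
    """True when the assumed distance does not exceed the threshold r'."""
    return matrix_distance(w, w_ref) <= threshold
```

Since the elements of W are ratios that sum to 1, two regions with the same distribution but different pixel counts produce the same matrix.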
Fig. 3 is a detailed flow chart of the present invention. The method comprises the following specific steps:
1. Input the image, denoted A.
2. Extract the histogram information of the image A to obtain the histogram H(A).
3. Refer to the histogram dataset H(D) of the reference image database.
4. Calculate the matching degree of the histograms H(A) and H(D) using the weighting method, that is, judge whether the image A matches each reference image T in the image database.
5. Refer to the morphological matrix dataset S(D) of the reference image database.
6. Determine whether the matching degree is greater than a preset value. If not, go to step 7; if so, go to step 10.
7. If the result of step 6 is no, extract the morphological matrix S(A) of the image A and calculate the matching degree of S(A) and S(D).
8. Determine whether the matching degree calculated in step 7 is greater than a preset value. If not, go to step 9; if so, go to step 10.
9. Output the image analysis result: the image A is a mismatched image.
10. Output the image analysis result: the image A is a matching image.
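The decision logic of steps 6 to 10 can be sketched as a small function. It follows the step list literally, treating a matching degree greater than the preset value as a match in both tests. The function name, the string results and the use of a callable for the morphological stage are illustrative assumptions.

```python
def classify_image(color_degree, color_threshold,
                   morphology_degree, morphology_threshold):
    """Return "match" or "mismatch" following steps 6 to 10.

    morphology_degree is a zero-argument callable, so the morphological
    matrix is only extracted and compared when the color test fails.
    """
    if color_degree > color_threshold:                 # step 6
        return "match"                                 # step 10
    if morphology_degree() > morphology_threshold:     # steps 7 and 8
        return "match"                                 # step 10
    return "mismatch"                                  # step 9
```

Passing the second stage as a callable reflects the order of the flow: the cheaper histogram comparison runs first, and the morphological analysis runs only for images that the color test does not already accept.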
Fig. 4 is a system configuration diagram. The system comprises a hub or router, a network server, communication lines, and objectionable-image filtering software running on the server. The filtering software analyzes the data arriving from the Internet on the server, filters out the matched data containing objectionable image information, and forwards the remaining data to the terminals; the related websites and accessing terminals are written to the network service log for later query. The elements of FIG. 4 are as follows:
1. The Internet, accessed by the network server 3 (access may be via ADSL or broadband).
2. The objectionable-image filtering software, which analyzes data flowing into the server from the Internet using techniques such as fuzzy correlation, weighting and the morphological matrix, blocks data containing suspicious image content, records who accessed the images and where the images came from, and writes these records to a database for query. The software runs on the network server 3.
3. The network server (which may be an IBM xSeries 206 or a Sun Fire V60x server), which accesses the Internet (twisted pair or optical fiber may be used for the connection).
4. A network switch (for example a D-Link DES-3226 or a 3Com SuperStack 3 Switch 4400).
5. Network terminals connected to the network switch (by twisted pair, coaxial cable, wireless connection or the like), for example desktop and notebook computers such as the IBM ThinkCentre M or the Dimension™ 8300.