CN112685591A - Accurate picture retrieval method for user interest area and feedback guidance - Google Patents
- Publication number: CN112685591A
- Application number: CN202011641876.6A
- Authority: CN (China)
- Prior art keywords: picture, particle, feature, user, retrieval
- Legal status: Pending (assumed; not a legal conclusion)
Landscapes
- Image Analysis (AREA)
Abstract
The invention takes two breakthrough points: how to obtain the user's region of interest, and how to introduce user feedback information to guide retrieval. It provides an accurate picture retrieval method based on the user's region of interest and feedback guidance, improves the feedback learning process by introducing a particle swarm optimization algorithm into an SVM (support vector machine) based user feedback algorithm, and establishes an S-P user-feedback guidance algorithm fusing the SVM and PSO (particle swarm optimization). The method specifically comprises: firstly, improving the salient-region extraction method of the visual saliency model and providing image retrieval based on a human eye attention model, including the overall architecture of the human eye attention calculation model, the BoF (bag-of-features) feature vector construction process, and the retrieval algorithm flow based on the human eye attention model and BoF features; secondly, optimizing three aspects (salient region extraction, SVM training parameters, and the feature selection process) based on the characteristics of the SVM and the particle swarm optimization algorithm, and providing a picture retrieval method based on the human eye attention model and S-P user feedback.
Description
Technical Field
The invention relates to an accurate picture retrieval method, in particular to an accurate picture retrieval method guided by the user's region of interest and feedback, and belongs to the technical field of accurate picture retrieval.
Background
With the ever-increasing storage capacity of databases, content-based picture retrieval emerged to overcome the subjectivity of the manual annotation used in traditional text-based retrieval; this technology retrieves pictures that meet user requirements from a picture library according to the picture content itself. In recent years, with the continuous development of the information industry and of related supporting technologies, big data has been widely applied in various industries; pictures and videos are unstructured data forms and occupy a very important position in big data collections. People live in a world full of sensors, and the amount of information collected, stored and analyzed keeps growing. In particular, the number of ways of acquiring image and video information is increasing and image databases keep expanding in scale. Such a data explosion sets higher standards for data computing capacity, and the demands of various application fields for information processing capability also grow continuously. Users therefore want to find images that meet their requirements quickly and accurately from huge image databases, which places higher requirements on the effective storage, organization and querying of multimedia resources.
Meanwhile, technologies such as machine learning and picture understanding continue to develop, providing technical support for content-based picture retrieval and promoting its rapid development. The technology is widely applied in the Internet industry. From the viewpoint of user requirements, content-based picture retrieval enables users to retrieve similar pictures accurately from a visual perspective; this retrieval mode is closer to users' perceptual cognition and makes efficient, large-scale retrieval of online pictures convenient.
In the big data era, the ways of obtaining pictures keep multiplying and the capability required to process picture information keeps rising. Content-based image retrieval takes picture content as its basic input and, based on a certain number of training samples and a machine learning algorithm, quickly retrieves the required picture information from a massive database; it therefore has broad scientific research and commercial application value.
However, the content-based picture retrieval methods of the prior art cannot yet reach a degree that fully satisfies users. Given the development of statistical learning methods and of the theory and technology associated with big data, content-based picture retrieval still has great room for performance improvement. The current problems and the places needing further improvement mainly appear in the following aspects:
Firstly, picture retrieval precision is low: almost no prior-art picture retrieval system reaches a satisfactory degree of accuracy, and when querying the retrieved pictures, users still need to spend a great deal of time screening hundreds of returned pictures;
Secondly, the degree of interaction with the user is insufficient: in many application fields the user needs to participate actively and give feedback. Because pictures belonging to different categories have different characteristics, different algorithms need to be designed to operate on different picture characteristics in a targeted way; meanwhile, users also want more freedom in feature selection, extraction and similarity matching algorithms;
Thirdly, the association between semantic features and bottom-layer features is insufficient: a great deal of prior-art work has been done on the bottom-layer features of pictures, but problems remain. The prior art does analyze user feedback information consciously, yet great limitations still exist and the problems are not solved fundamentally. In addition, manual participation can improve the retrieval effect, but it increases the time consumed by retrieval and is not conducive to real-time retrieval.
Fourth, with the rapid expansion of Internet information, Internet retrieval has entered a more vivid and significantly broader picture retrieval stage. The development of picture retrieval methods has stimulated the Internet industry to continuously release related products, and the development needs of the industry continuously put forward new requirements for related technologies.
Fifth, methods based on local features have drawn much attention in the prior art: such a retrieval method divides an image into a certain number of local feature points and then extracts the corresponding features. Picture region division is a basic step of region-based picture retrieval, but division is very difficult without prior knowledge of the picture's segmentation, and it becomes even more difficult when the scene contains a large number of target objects to be divided or no clear target. Prior-art methods for extracting salient points have advantages, but salient-point detection and calculation are inefficient and consume a large amount of time; moreover, a detection operator cannot fully describe the complex content of a picture, and the region of most interest of a certain type of picture cannot be fully reproduced by a point set. Most prior-art methods are learning theories that assume a large number of samples; because a large number of learning samples is difficult to obtain in practical applications, the effect is not ideal, and the computational complexity of some algorithms is high.
Sixth, a support vector machine model is often used in the prior art for user feedback, but the average accuracy of an SVM-based feedback system cannot achieve the expected effect. The main reason is that when the number of positive feedback samples is small, the SVM classifier is unstable: the optimal hyperplane of the SVM is very sensitive to a small number of samples, and during feedback the user usually marks only a few pictures and cannot guarantee that all samples are sufficiently and accurately marked. When samples are insufficient and marks are inaccurate, the SVM cannot achieve a good effect. In addition, when the number of positive samples is smaller than the number of negative samples, the calculated hyperplane is biased, and in this case negative samples are easily fed back as positive samples. The number of training samples during feedback learning may also be lower than the dimension of the feature vector, which easily causes the small-sample problem;
seventh, no retrieval method in the prior art can achieve satisfactory effects on all pictures, and based on this basic knowledge, the present invention analyzes from the perspective of user requirements, and the main problems existing in the picture retrieval process in the prior art include two points: firstly, the region of interest of a user cannot be positioned sufficiently and accurately, and the extracted picture target has deviation with the region concerned by the user; secondly, the target picture which really meets the requirements of the user cannot be accurately searched. Based on the two points, the invention takes the region of interest of the user and the guiding retrieval process of the user feedback information as breakthrough points, and provides the picture retrieval method based on the human eye attention model and the user feedback.
Disclosure of Invention
Aiming at the defects of the prior art, the accurate picture retrieval method guided by the user's region of interest and feedback mainly comprises the following contents. Firstly, salient region extraction is one of the key innovations of the method; correctly extracting the target interest region in a picture is the precondition for effectively selecting salient local features. Secondly, local visual image features are extracted: the picture retrieval method based on local visual feature extraction studies bag-of-features (BoF) model features in depth, and compares them with picture retrieval based on bottom-layer features (color, texture, shape) to reach a conclusion. Thirdly, a user feedback mechanism with system self-adaptation and user self-adjustment is also a key innovation of the invention: a PSO-based user feedback algorithm and a feedback algorithm fusing SVM and PSO are applied. After the user marks the results of the preliminary retrieval, the particle swarm optimization algorithm is used to optimize the processes of positive/negative sample selection, SVM training parameters and feature selection; the optimized result is used to train an SVM classifier, the pictures in the picture library are classified, and the retrieval results are judged and output according to the probabilities of the two classes of the SVM discriminant function.
In order to achieve the technical effects, the technical scheme adopted by the invention is as follows:
the picture accurate retrieval method based on the human eye attention model and the user feedback comprises the picture retrieval method based on the human eye attention model and the picture retrieval method based on the user feedback, wherein the visual saliency model is improved and the whole framework of the human eye attention calculation model is provided firstly based on the visual saliency model, and the image area which is interested by the user is extracted; secondly, analyzing an image feature extraction method, and selecting BoF features; on the basis of extracting a target interest area, extracting image characteristics and initially retrieving, introducing a particle swarm optimization algorithm to improve a feedback learning process based on a user feedback algorithm of an SVM (support vector machine), and establishing an S-P (user-feedback-guidance) algorithm fusing the SVM and a PSO (particle swarm optimization);
The picture retrieval based on the human eye attention model comprises generating a saliency map, extracting a salient region, improving the salient-region extraction method, and extracting feature vectors. Generating the saliency map comprises extracting primary visual features, generating feature maps, combining the feature maps, generating the saliency map, and locating and transferring the attention focus; extracting feature vectors comprises extracting local image features, generating a visual dictionary, and encoding the feature descriptions. Based on the human eye attention model built on the visual saliency model, the key steps of saliency map generation and salient region extraction are improved; based on the overall framework of the human eye attention calculation model, a BoF feature construction process is provided, including local image feature extraction, visual dictionary generation, and feature quantization, coding and integration, to obtain a picture retrieval algorithm flow based on salient region extraction;
the image retrieval method based on the user feedback comprises user feedback formed by fusing an SVM (support vector machine) and a particle group optimization algorithm and an image retrieval method based on an eye attention model and S-P (server-peer) user feedback, wherein the user feedback formed by fusing the SVM and the particle group optimization algorithm comprises the combination of the particle group optimization algorithm and the SVM, the selection of optimization characteristics of the particle group optimization algorithm, the optimization of SVM parameters based on the particle group optimization algorithm and the selection of positive and negative samples optimized by the particle group optimization algorithm, the user feedback is introduced in the image retrieval process based on the eye attention model, and the feedback process is divided into the following three steps:
Step one, primary retrieval: the user first submits an example picture; the system converts the submitted example into a feature vector expression, compares the feature relevance between the example and the pictures in the picture library, sorts the pictures from high to low according to their relevance to the query picture, and returns the first N results to the user;
Step two, user feedback: the user marks the positive pictures that meet the requirements in the retrieval results and excludes negative pictures, then feeds the result back to the system; the system modifies the algorithm parameters or retrieval rules according to the user's feedback information and performs a new round of retrieval;
Step three, repeated feedback: feedback is performed multiple times, and the query ends once the user's requirements are met;
A particle swarm optimization algorithm is introduced into the user feedback process to optimize feedback learning, and an S-P user-feedback guidance algorithm fusing the SVM and PSO is established.
The accurate picture retrieval method guided by the user's region of interest and feedback further comprises extracting primary visual features: for an input color picture, three primary visual features (brightness, orientation and color) are extracted through linear filtering. Brightness is described by a brightness feature channel; the color features comprise the two features red-green (RG) and blue-yellow (BY); the orientation features use Gabor filtering and comprise features in four directions; the primary visual features thus have seven sub-features in total;
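As an illustrative sketch of these primary feature channels (the kernel size, sigma and wavelength below are assumptions, not values specified by the patent), the brightness channel and an oriented Gabor kernel for the four directions can be written in plain Python:

```python
import math

def gabor_kernel(theta_deg, size=7, sigma=2.0, wavelength=4.0):
    """Real part of a Gabor filter oriented at theta degrees (0/45/90/135)."""
    half = size // 2
    theta = math.radians(theta_deg)
    k = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            row.append(math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
                       * math.cos(2 * math.pi * xr / wavelength))
        k.append(row)
    return k

def luminance(r, g, b):
    """Brightness channel: the mean of the R, G and B components."""
    h, w = len(r), len(r[0])
    return [[(r[y][x] + g[y][x] + b[y][x]) / 3.0 for x in range(w)]
            for y in range(h)]
```

The four orientation channels are obtained by convolving the brightness channel with `gabor_kernel(t)` for t in 0, 45, 90, 135.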
For the orientation features, the brightness feature is extracted first, and a Gabor filter is applied in the four directions 0°, 45°, 90° and 135°. For the brightness feature, the average value of the R, G and B components of the color picture is used. For the color feature dimension, difference maps between red and green and between blue and yellow are calculated to express the different contrast effects, and the formula is as follows:
According to the biological mechanism of visual color perception, the invention introduces direction and contrast information into the color feature channels. Biologically, color processing is completed along the ventral pathway retina - lateral geniculate nucleus (LGN) - V1 - V2 - PIT - IT. The RG, BY and brightness pathways are formed by double-opponent neurons in area V1; RG and BY each carry color-opponency information, the two color pathways exhibit contrast sensitivity, and the constructed mathematical model of the receptive field is as follows:
T(x,y,λ) = d_H·H(λ)·t_H(x,y) + d_N·N(λ)·t_N(x,y) + d_C·C(λ)·t_C(x,y)
where (x,y) is the coordinate position in the image; H(λ), N(λ) and C(λ) correspond to the R, G and B channels of the picture respectively; t_H, t_N and t_C represent the spatial acuity distribution of each input, and also the shape of its receptive field, approximated by a difference-of-Gaussians (DOG) function; d_H, d_N and d_C are coefficients;
For the red-green channel, the positive component of the DOG filter is first used as a convolution template to convolve the R channel of the picture, the negative component is used as a convolution template to convolve the G channel, and the two results are then combined to obtain the RG channel response.
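The RG double-opponent response above can be sketched as follows; the kernel size and the centre/surround sigmas are illustrative assumptions, and the two convolution results are combined here by simple addition:

```python
import math

def dog_kernel(size=7, sigma_c=1.0, sigma_s=2.0):
    """Difference-of-Gaussians kernel: positive centre minus negative surround."""
    half = size // 2
    k = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            c = math.exp(-(x*x + y*y) / (2 * sigma_c**2)) / (2 * math.pi * sigma_c**2)
            s = math.exp(-(x*x + y*y) / (2 * sigma_s**2)) / (2 * math.pi * sigma_s**2)
            row.append(c - s)
        k.append(row)
    return k

def convolve(img, kernel):
    """Plain-loop 2-D convolution with zero padding at the borders."""
    h, w = len(img), len(img[0])
    kh = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-kh, kh + 1):
                for dx in range(-kh, kh + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * kernel[dy + kh][dx + kh]
            out[y][x] = acc
    return out

def rg_response(r_chan, g_chan):
    """R convolved with the positive DOG lobe plus G convolved with the
    negative lobe, giving the RG opponent-channel response."""
    k = dog_kernel()
    pos = [[max(v, 0.0) for v in row] for row in k]  # centre (positive) lobe
    neg = [[min(v, 0.0) for v in row] for row in k]  # surround (negative) lobe
    r_conv = convolve(r_chan, pos)
    g_conv = convolve(g_chan, neg)
    h, w = len(r_chan), len(r_chan[0])
    return [[r_conv[y][x] + g_conv[y][x] for x in range(w)] for y in range(h)]
```

A red patch on a green background yields a high RG response at the patch and a suppressed (negative) response in the surround, which is the contrast sensitivity the passage describes.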
The accurate picture retrieval method guided by the user's region of interest and feedback further comprises generating feature maps: Gaussian smoothing is performed on the image of each feature sub-channel, followed by downsampling with a step of 2 to obtain the image at the second scale; the operation is repeated on this basis to generate the image at the next scale. The images of different sizes form a Gaussian pyramid, and in the visual saliency model each feature sub-channel yields images at 9 scales, i.e., a 9-layer pyramid;
Contrast information between a local center and the peripheral background in the picture is then obtained: a small-scale picture easily highlights background information while a large-scale picture easily highlights local information, so subtracting the two yields the contrast between the central target and the background. The specific algorithm in the model is as follows: the picture scale c represents local center information, c ∈ {3,4,5}; the scale s represents background information, s = c + δ, where δ ∈ {3,4}, giving 6 scale pairs in total. The small-scale picture is linearly interpolated to the same size as its partner in the scale pair, and a point-by-point subtraction detects regions of strong center-surround contrast. With seven channels, a total of 42 center-surround difference maps, i.e., feature maps, are finally obtained.
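The pyramid and center-surround operations can be sketched in miniature; this simplified version uses a 3-level pyramid with scales 0 and 2 instead of the model's 9 levels and c ∈ {3,4,5}, and nearest-neighbour rather than linear interpolation:

```python
def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (one pyramid step)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*y][2*x] + img[2*y][2*x+1] +
              img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
             for x in range(w)] for y in range(h)]

def build_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr

def upsample_to(img, h, w):
    """Nearest-neighbour interpolation of a coarse level back to (h, w)."""
    sh, sw = len(img), len(img[0])
    return [[img[min(y * sh // h, sh - 1)][min(x * sw // w, sw - 1)]
             for x in range(w)] for y in range(h)]

def center_surround(pyr, c, s):
    """|centre scale - surround scale|: the center-surround feature map."""
    centre = pyr[c]
    h, w = len(centre), len(centre[0])
    surround = upsample_to(pyr[s], h, w)
    return [[abs(centre[y][x] - surround[y][x]) for x in range(w)]
            for y in range(h)]
```

A small bright patch produces a strong center-surround response, while uniform regions cancel out, matching the center-versus-background contrast described above.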
The accurate picture retrieval method guided by the user's region of interest and feedback further comprises combining the feature maps and generating the saliency map: in the merging stage, the various independent features are integrated, and the extracted feature maps are integrated and compete according to a certain strategy to generate the final saliency map, providing a basis for attention-focus selection and migration. The feature maps are merged using a local iteration method: each feature map is first normalized and then iteratively convolved with a difference-of-Gaussians function; after the normalized iterative operation, the feature maps at all scales are superimposed and then weighted across the different features to obtain the saliency map;
The feature maps are combined by the local iteration method and the generated saliency map is analyzed; global saliency information is then added, i.e., global saliency information is added to the saliency map generated by the visual saliency model, mainly through the following steps:
Firstly, converting the color space: the picture (with M = width × height pixels) is converted from RGB into the LAB color space, which is close to the perceptually uniform color space of human vision;
Secondly, calculating the average pixel value of each of the L, A and B channels of the LAB color space, denoted AvgL, AvgA and AvgB respectively;
thirdly, denoising the LAB channels respectively;
fourthly, calculating the significance value of each point according to the following formula:
Salmap_i = (L_i - AvgL)^2 + (A_i - AvgA)^2 + (B_i - AvgB)^2
Salmap represents the difference of each pixel value relative to the whole picture, i.e., a global saliency measure; it is then combined with the saliency map generated by the visual saliency model.
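The global saliency formula above translates directly into code; this sketch assumes the LAB conversion and denoising have already been done and operates on per-channel pixel arrays:

```python
def global_saliency(L, A, B):
    """Per-pixel squared distance from the mean LAB colour:
    Salmap_i = (L_i - AvgL)^2 + (A_i - AvgA)^2 + (B_i - AvgB)^2."""
    h, w = len(L), len(L[0])
    n = h * w
    avg_l = sum(sum(row) for row in L) / n
    avg_a = sum(sum(row) for row in A) / n
    avg_b = sum(sum(row) for row in B) / n
    return [[(L[y][x] - avg_l) ** 2 +
             (A[y][x] - avg_a) ** 2 +
             (B[y][x] - avg_b) ** 2
             for x in range(w)] for y in range(h)]
```

Pixels far from the image's average colour receive large Salmap values, which is exactly the global measure that gets fused with the model's saliency map.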
The accurate picture retrieval method guided by the user's region of interest and feedback further comprises extracting the salient region: the salient region is extracted according to the acquired attention focus, the contour extraction of the attention-focus region is improved based on the salient-region detection model, and extraction mainly comprises the following steps:
Step 1, determining the most salient of the three features color, brightness and direction: the pixel value corresponding to the focus position is obtained in each of the three types of saliency maps, and the feature with the largest pixel value is the salient feature;
Step 2, determining the most salient feature map: the pixel value corresponding to the focus position is obtained in each feature map, and the feature map with the largest pixel value is the most salient feature map;
step 3, carrying out binarization on the obtained feature map, searching a connected domain, and extracting the connected domain containing the focus;
Step 4, plotting the outline of the salient region.
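Steps 3 and 4 above amount to thresholding the winning feature map and keeping the connected component that contains the focus; this sketch assumes a 4-neighbourhood and a hand-chosen binarization threshold:

```python
from collections import deque

def salient_region(feature_map, focus, threshold):
    """Binarise the most salient feature map and return (as a set of (y, x)
    cells) the 4-connected component that contains the attention focus."""
    h, w = len(feature_map), len(feature_map[0])
    binary = [[feature_map[y][x] >= threshold for x in range(w)]
              for y in range(h)]
    fy, fx = focus
    if not binary[fy][fx]:
        return set()  # the focus itself is below threshold
    region, seen = set(), {(fy, fx)}
    queue = deque([(fy, fx)])
    while queue:  # breadth-first flood fill from the focus
        y, x = queue.popleft()
        region.add((y, x))
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen \
                    and binary[ny][nx]:
                seen.add((ny, nx))
                queue.append((ny, nx))
    return region
```

The returned cell set is the connected domain of step 3; its boundary cells give the outline plotted in step 4.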
The accurate picture retrieval method guided by the user's region of interest and feedback further improves the salient-region extraction method: when the most salient region is extracted, the influence of the saliency of the focus's neighborhood points is considered. The invention proposes calculating the weight values of all salient regions and sorting them from large to small; the region with the largest weight is the most salient region. For the salient region determined by each focus, the number of pixels in the region whose saliency value reaches a certain critical value is first counted as the region area zone_i; regions whose area is too small are eliminated directly. The ratio of the current region's zone_i to the sum of all region areas then determines the region weight, after which the most salient region is extracted:
The human eye attention calculation model is thus used to generate the saliency map and extract the most salient region.
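The region-weighting rule just described can be sketched as follows; the `min_area` cutoff for eliminating too-small regions is a hypothetical parameter, since the patent does not give a concrete value:

```python
def most_salient_region(regions, saliency, critical, min_area=1):
    """zone_i counts the pixels in region i whose saliency reaches the
    critical value; too-small regions are dropped, each survivor gets
    weight zone_i / sum(zone_j), and the largest weight wins."""
    kept = []
    for region in regions:
        zone = sum(1 for (y, x) in region if saliency[y][x] >= critical)
        if zone < min_area:
            continue  # area too small: eliminated directly
        kept.append((zone, region))
    if not kept:
        return None
    total = sum(z for z, _ in kept)
    # Sorting by weight zone_i/total and taking the top is the same as
    # taking the region with the largest zone_i.
    return max(kept, key=lambda zr: zr[0] / total)[1]
```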
The accurate picture retrieval method guided by the user's region of interest and feedback further introduces user feedback into the picture retrieval process based on the human eye attention model; the feedback process is divided into the following three steps:
Step one, primary retrieval: the user first submits an example picture; the system converts the submitted example into a feature vector expression, compares the feature relevance between the example and the pictures in the picture library, sorts the pictures from high to low according to their relevance to the query picture, and returns the first N results to the user;
Step two, user feedback: the user marks the positive pictures that meet the requirements in the retrieval results and excludes negative pictures, then feeds the result back to the system; the system modifies the algorithm parameters or retrieval rules according to the user's feedback information and performs a new round of retrieval;
Step three, repeated feedback: feedback is performed multiple times, and the query ends once the user's requirements are met;
the method combines an SVM (support vector machine) and a particle swarm optimization algorithm, is based on the rapid convergence of the particle swarm optimization algorithm and is suitable for small-quantity feedback sample optimization, when a user evaluates a primary retrieval result, a particle swarm feedback parameter is initialized, the Fitness of particles is calculated, the optimal position of an individual is updated, an SVM classifier is trained by using the updated result, pictures in a picture library are classified, the distance between the pictures and a classification surface is calculated, and the retrieval result is output in a sequencing manner;
In the specific feedback process, the particle swarm optimization algorithm uses the user's feedback information to improve feature selection, the SVM classification algorithm and salient-region extraction in the secondary retrieval of pictures. Particle swarm optimization treats the optimization problem as a population of individuals that evolve by sharing information and interacting with the environment: a certain number of individuals move through the search space, and each particle x_i represents a potential optimal solution characterized by three values: velocity U_i, current position X_i and fitness value Fitness_i. The current position X_i describes a point of the particle in the search space; the best position the particle has experienced during iteration is its individual extreme value Best_q, and the best position experienced by the whole population is Best_total. During iteration, a particle updates its position through these two extreme values: Best_q measures the particle's self-cognition ability, and Best_total measures its global cognition ability. After the two optima are found, the particle updates its velocity and position, mainly through the following steps:
step 1), selecting a critical value and iteration times;
Step 2), initializing X_i and U_i, and the population size n;
step 3), calculating the fitness of each particle;
Step 4), for each particle, comparing its fitness value with that of its individual best Best_q; if it is better, taking the current position as the particle's best Best_q;
step 5), for each particle, comparing the Fitness of the particle with the Fitness value of the global optimal position Best _ total of the population, and if the Fitness value is better, taking the current position as the global optimal position;
step 6), updating the particles;
step 7), looping from step 3) until the number of iterations is reached or the error condition is met, then exiting the loop and outputting the optimal particle position;
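The seven steps above can be sketched as a minimal particle-swarm loop. The inertia and acceleration weights (w, c1, c2) and the search bounds are illustrative assumptions, and `fitness` stands in for whichever fitness function a given stage of the method uses (lower is taken to be better):

```python
import random

def pso(fitness, dim, n_particles=20, iters=50, bounds=(-1.0, 1.0)):
    """Minimal particle swarm optimization sketch (minimization)."""
    lo, hi = bounds
    # Step 2: initialize positions X_i and velocities U_i for n particles
    X = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    U = [[0.0] * dim for _ in range(n_particles)]
    best_q = [x[:] for x in X]                     # individual best positions Best_q
    best_q_fit = [fitness(x) for x in best_q]      # step 3: fitness of each particle
    g = min(range(n_particles), key=lambda i: best_q_fit[i])
    best_total = best_q[g][:]                      # global best position Best_total
    best_total_fit = best_q_fit[g]
    w, c1, c2 = 0.7, 1.5, 1.5                      # assumed inertia / acceleration weights
    for _ in range(iters):                         # step 7: loop until iteration limit
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Step 6: velocity update driven by Best_q (self) and Best_total (global)
                U[i][d] = (w * U[i][d]
                           + c1 * r1 * (best_q[i][d] - X[i][d])
                           + c2 * r2 * (best_total[d] - X[i][d]))
                X[i][d] = min(hi, max(lo, X[i][d] + U[i][d]))
            f = fitness(X[i])
            if f < best_q_fit[i]:                  # step 4: update individual best
                best_q_fit[i], best_q[i] = f, X[i][:]
            if f < best_total_fit:                 # step 5: update global best
                best_total_fit, best_total = f, X[i][:]
    return best_total, best_total_fit
```

Minimizing a simple sphere function with this sketch converges toward the origin within a few dozen iterations.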
when the particle swarm optimization algorithm is applied in the field of picture retrieval, the spatial position representation of the particles is considered first. A picture is represented by an extracted feature vector; if image features are extracted for the whole picture library, a vector space is obtained. Each particle is represented by a feature vector, and the particle's spatial position coordinates are given by the dimensions of the corresponding vector. The particle swarm optimization algorithm searches for an optimized solution in this specific space: the process is to search the feature-vector space for the positive-example picture corresponding to the optimal feature vector, and by combining the SVM algorithm with the particle swarm optimization algorithm, the picture retrieval process is optimized on the basis of user feedback.
The method for accurately retrieving pictures guided by the user interest area and feedback further comprises the following steps: the feature optimization process is guided by the particle evolution direction. First, user feedback is performed on the preliminary retrieval result: supposing the user performs a first round of feedback on the result and marks M positive-example pictures, local features of the M pictures are extracted, the average value over each feature dimension is calculated, and this average is taken as the initial individual optimal feature vector, calculated according to the following formula:
Best_q(g) = (1/M) · Σ_{i=1..M} x_i(g)

wherein M is the number of positive-example pictures, g is the feature dimension index, and M_positive denotes the quantity of positive-example pictures. According to the fitness function of particle evolution, when the evolution meets the termination condition the global optimal position Best_total, namely the optimal feature vector, is obtained; then the degree of association between every positive-example picture feature vector and this vector is calculated in each dimension. If all the vectors are associated in a certain dimension, a larger weight is given to that dimension, otherwise a smaller weight is given, so that the weights of the query picture and the library pictures in each feature dimension are adjusted and an optimal feature data set suitable for the next-step SVM classification is found.
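A minimal sketch of the initialization and weighting described above, assuming the per-dimension average of the positive-example feature vectors serves as the initial individual best, and a simple tolerance test decides whether all positive examples "associate" in a dimension (the tolerance and the two weight values are hypothetical):

```python
def initial_best_vector(pos_feats):
    """Average the M positive-example feature vectors per dimension g."""
    m = len(pos_feats)
    g = len(pos_feats[0])
    return [sum(f[d] for f in pos_feats) / m for d in range(g)]

def dimension_weights(pos_feats, best_total, tol=0.2):
    """Give a larger weight to dimensions where all positive examples agree with
    the global best vector, a smaller weight otherwise (assumed threshold rule)."""
    weights = []
    for d in range(len(best_total)):
        agree = all(abs(f[d] - best_total[d]) <= tol for f in pos_feats)
        weights.append(2.0 if agree else 0.5)
    return weights
```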
The method for accurately retrieving pictures with user interest areas and feedback guidance further comprises SVM parameter optimization based on the particle swarm optimization algorithm: the particle swarm optimization algorithm is used to optimize parameter selection in the SVM training process. For the optimal feature subset found by the particle swarm optimization algorithm, the feature data set is divided into a training set and a test set; the support vector machine is trained with the training set to obtain a model file, and the model is applied to the test feature data set to obtain a predicted classification result. The fitness function is the key to measuring the performance of the particle swarm optimization algorithm: the spatial position of a particle is represented by a group of SVM parameters, including the kernel function parameter, the error control coefficient and the penalty factor, and the fitness of the particle represents the quality of the training result under that group of parameters;
the SVM algorithm flow based on the particle swarm optimization algorithm is as follows:
step one, extracting feature data of the training set and the test set, and counting the number and proportion of positive samples in the actual test set to evaluate the prediction result;
step two, initializing a particle swarm optimization algorithm: initializing parameters of a particle swarm optimization algorithm, and initializing the speed and the position of each particle, wherein the position of one particle is represented by a set of parameters;
step three, setting the Best_q of each particle to its current position;
step four, setting the global optimal position Best _ total as the current position with the lowest Fitness in the particles;
step five, performing SVM training on the training set, calculating the fitness of each particle, and updating Best_q and Best_total: if a particle's current fitness is better than its Best_q, updating Best_q, and if it is better than Best_total, updating Best_total;
step six, continuously updating the positions and velocities of the particles until the maximum number of iterations or the error termination condition is reached.
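The particle encoding this flow relies on can be sketched as follows. The log-scale coordinates and the parameter names (`gamma`, `tol`, `C`) are assumptions mapping onto the kernel parameter, error control coefficient and penalty factor named in the text, and `train_and_score` is a hypothetical caller-supplied routine standing in for "train the SVM on the training set and score it on the test set":

```python
def decode_particle(position):
    """Map a particle's coordinates onto the SVM parameter set named in the text:
    kernel parameter (gamma), error control coefficient (tol) and penalty
    factor (C). Log10-scale coordinates are an assumption."""
    log_gamma, log_tol, log_c = position
    return {"gamma": 10 ** log_gamma, "tol": 10 ** log_tol, "C": 10 ** log_c}

def particle_fitness(position, train_and_score):
    """Fitness of a particle = test-set error of the SVM trained with the
    decoded parameters; train_and_score returns accuracy in [0, 1]."""
    params = decode_particle(position)
    return 1.0 - train_and_score(params)  # lower fitness = better accuracy
```

A swarm would then minimize `particle_fitness` over the three coordinates, exactly as in the generic PSO loop.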
The accurate picture retrieval method guided by the user interest area and feedback further fuses the extracted salient region with user feedback; the algorithm flow of picture retrieval based on the human eye attention model and S-P user feedback is as follows:
step 1, extracting a significant region: a user selects a picture database, extracts the most significant area and stores the area for each picture;
step 2, image feature extraction: extracting BoF characteristics from each image saliency area in the image library, and establishing a characteristic database;
step 3, preliminary retrieval: extracting the most salient region and the BoF feature vector of the picture to be retrieved, performing similarity measurement against the feature database, sorting by similarity, and outputting the top M sorted pictures as the preliminary retrieval result;
step 4, user feedback: the user marks the preliminary retrieval result, indicating which results are relevant and which are not, and feeds the m relevant pictures back to the system;
step 5, initializing feedback parameters;
step 6, particle swarm optimization: using the information of the m fed-back pictures to optimize positive/negative sample selection and image feature extraction, establishing the feature data set, and performing SVM training based on the particle swarm optimization algorithm;
and 7, searching again: classifying and sequencing the picture data sets by using an SVM classifier, and outputting a retrieval result;
and 8, circulating the steps 1 to 7 until the user is satisfied, and outputting a final retrieval result.
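The eight steps can be wired together as a high-level loop. Every callable argument is a placeholder for the corresponding stage described above, and returning `None` from `user_feedback` is an assumed convention for "the user is satisfied":

```python
def retrieval_session(query, db, extract_region, extract_bof, similarity,
                      user_feedback, pso_optimize, svm_rank, top_m=20, max_rounds=5):
    """High-level S-P feedback retrieval loop (steps 1-8); every callable is a
    stand-in for the stage of the same name in the text."""
    # Steps 1-2: salient region + BoF features for every library picture
    feats = {pid: extract_bof(extract_region(img)) for pid, img in db.items()}
    # Step 3: preliminary retrieval by similarity ranking
    qf = extract_bof(extract_region(query))
    results = sorted(db, key=lambda pid: -similarity(qf, feats[pid]))[:top_m]
    for _ in range(max_rounds):                    # step 8: loop until satisfied
        marked = user_feedback(results)            # step 4: relevant picture ids
        if marked is None:                         # assumed "user satisfied" signal
            break
        model = pso_optimize(marked, feats)        # steps 5-6: PSO-driven training
        results = svm_rank(model, feats, top_m)    # step 7: re-rank with SVM
    return results
```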
Compared with the prior art, the invention has the following contributions and innovation points:
firstly, aiming at the problems of the visual saliency model such as incomplete extraction of the salient region and deviation, the invention improves the key technology, provides an overall framework of the human eye attention calculation model, accurately positions the user's region of interest, and provides a basis for accurately retrieving pictures that meet the user's requirements; the method has little dependence on extra resources and strong feasibility in practical application, and is a simple, efficient and practical accurate retrieval method for user interest areas and feedback-guided pictures;
secondly, the invention provides a user feedback algorithm fusing PSO and SVM, in which different fitness functions are selected to perform feedback optimization on salient-region extraction, feature selection and the SVM algorithm; in experiments the method of the invention is compared with the preliminary retrieval and with retrieval results based on SVM feedback; compared with other complex algorithms the method is simpler, more convenient and easier to operate, and picture retrieval and judgment are more accurate;
thirdly, the picture retrieval method guided by the user interest area and feedback provided by the invention has high retrieval precision, achieves a satisfactory degree of accuracy, and interacts well with the user; many application fields require users to actively participate and give feedback opinions, and since pictures of different classes have different characteristics, different algorithms are designed to operate in a targeted way on different picture features, leaving the user more freedom in feature selection, extraction and similarity matching; the invention maintains a good correlation between semantic features and bottom-layer features, consciously analyzes the user's feedback information, and benefits real-time retrieval; for large-scale picture data, the method can effectively retrieve the required pictures, and applying the user feedback technology yields a more satisfactory effect;
fourthly, the accurate retrieval method of the user interest region and the feedback-guided picture is based on the region of interest extraction of the human eye attention model, adopts the human eye attention model of the visual saliency model as a basis, improves the key steps of generating the saliency map and extracting the saliency region aiming at the problems of incomplete extraction of the saliency region, deviation and the like of the model, provides the overall framework of the human eye attention calculation model, provides the construction process of the BoF characteristics, comprises the steps of extracting the local image characteristics, generating the visual dictionary, quantizing the characteristics, coding the characteristics and collecting the characteristics, and provides the picture retrieval algorithm flow based on the saliency region extraction. The retrieval result comparison based on BoF characteristics and bottom layer characteristics can show that the retrieval accuracy is greatly improved by comparing two schemes of retrieval and direct retrieval for extracting the salient region based on the advantages of local characteristic description, thereby fully embodying the necessity and the advancement of extracting the salient region;
fifth, the present invention provides an accurate picture retrieval method with user interest areas and feedback guidance that addresses the problem that the average accuracy of prior-art SVM-based feedback systems cannot reach the expected effect: when the number of positive feedback samples is small the SVM classifier is unstable, because the optimal hyperplane of the SVM is very sensitive to a small number of samples, and a user typically marks only a few pictures during feedback, so sufficient and accurate marking of all samples cannot be guaranteed and the SVM cannot achieve a good effect with insufficient or inaccurate samples. In addition, when there are fewer positive samples than negative samples the calculated hyperplane is biased, and in this case negative samples are easily ranked among the positives; the number of training samples during feedback learning may also be lower than the dimension of the feature vector, which likewise causes a small-sample problem. To compensate for these defects of the SVM-based feedback algorithm, the invention introduces the particle swarm optimization algorithm into the user feedback process and establishes the S-P user feedback guidance algorithm fusing SVM and PSO; the particle swarm optimization algorithm has no requirement on sample balance and converges quickly, so the whole algorithm flow is optimized and the performance of picture target retrieval is greatly improved.
Drawings
FIG. 1 is a block diagram of a content-based picture retrieval system constructed in accordance with the present invention.
FIG. 2 is a diagram of the image retrieval method architecture based on the human eye attention model.
FIG. 3 is a process diagram of the feature description encoding and assembling of the present invention.
FIG. 4 is a flow chart of an S-P user feedback guidance algorithm for fusing SVM and PSO.
Detailed Description
The following further describes a technical solution of the accurate picture retrieval method for the user interest area and the feedback guidance provided by the present invention with reference to the accompanying drawings, so that those skilled in the art can better understand the present invention and can implement the present invention.
Based on this basic understanding, the invention analyzes from the perspective of user requirements; the main problems in the picture retrieval process comprise two points: first, the user's region of interest cannot be located sufficiently accurately, and the extracted picture target deviates from the region the user cares about; second, the target picture that really meets the user's requirements cannot be accurately retrieved. Based on these two points, the invention takes the user's region of interest and the guidance of the retrieval process by user feedback information as breakthrough points, and provides a picture retrieval method based on the human eye attention model and user feedback.
Firstly, on the basis of the visual saliency model and aiming at its problems, the visual saliency model is improved, the overall architecture of a human eye attention calculation model is proposed, and the image regions interesting to the user are extracted, which effectively improves the retrieval effect; secondly, image feature extraction methods are analyzed, and BoF features are selected to address the problem that bottom-layer features are difficult to keep invariant under changes of illumination, size and angle, the superiority of BoF feature extraction being verified through retrieval comparison between BoF and bottom-layer features; finally, on the basis of target interest area extraction, image feature extraction and preliminary retrieval, a user feedback algorithm based on the SVM is designed, the particle swarm optimization algorithm is introduced to improve the feedback learning process, and an S-P user feedback guidance algorithm integrating SVM and PSO is established. In experiments the method is compared with the preliminary retrieval and with retrieval results based on SVM feedback, showing that it effectively improves the accuracy of picture retrieval.
First, picture accurate retrieval method architecture
Content-based picture retrieval mainly comprises extracting picture feature vectors, evaluating picture similarity, and feedback guidance; the content-based accurate picture retrieval method established by the invention is shown in figure 1. Owing to the diversity of pictures, one picture may include much redundant information or several target objects; for example, the query picture of figure 1 contains more than one target. In this case the user always wants to retrieve the pictures containing the target of greatest interest, so the key point lies in extracting the target interest region within a picture. Therefore, before extracting features, the invention adds the key step of target interest region extraction, obtaining the target interest regions of the picture database and of the query picture respectively, then extracts image features of the target interest region and performs similarity evaluation and retrieval. To further improve the performance of the retrieval system, the user feedback technology is introduced: the user gives feedback on the initial retrieval result, the system adjusts dynamically according to the feedback information, and retrieval is optimized again until the pictures that satisfy the user are retrieved.
Image retrieval based on human eye attention model
The premise of extracting the target interest area is to accurately and objectively analyze the picture data, and analyzing and processing the data by simulating the working principle of a human visual system is a reliable way for solving the problem.
The invention takes a visual saliency model as a basis, aiming at the problems of incomplete extraction saliency areas and deviation of the model, improves the key steps of generating a saliency map and extracting the saliency areas, and provides a picture retrieval method based on a human eye attention model, wherein the model calculation overall architecture is shown as figure 2, and the step of generating the saliency map comprises the steps of primary visual feature extraction, feature map generation and combination, and focus positioning and migration.
(one) generating saliency maps
1. Extracting primary visual features
For an input color picture, three primary visual features of brightness, direction and color are extracted through linear filtering: brightness is described by a brightness feature channel, the color features comprise the two features Red-Green (RG) and Blue-Yellow (BY), and the direction features adopt Gabor filtering in four directions, so the primary visual features have seven sub-features in total.
For the direction features, the brightness feature is extracted first and Gabor filtering is applied in the four directions 0°, 45°, 90° and 135°; the brightness feature is represented by the average of the three RGB components of the color picture; for the color feature dimension, difference maps between red and green and between blue and yellow are calculated to express the different contrast effects, with the formulas:

RG = R − G,  BY = B − Y
to enrich the color feature channel response, the invention introduces direction and contrast information into the color feature channels according to the mechanism of biological visual color perception. From a biological perspective, the human and mammalian visual systems process visual content hierarchically, and color processing proceeds along the ventral pathway retina–LGN–V1–V2–PIT–IT, in which the RG, BY and luminance pathways are formed by double-opponent neurons in the V1 region; RG and BY each carry color-opponency information, so the two color pathways exhibit contrast sensitivity. To represent these two color channels, the invention constructs a mathematical model of the receptive field with the function:
T(x, y, λ) = d_H·H(λ)·t_H(x, y) + d_N·N(λ)·t_N(x, y) + d_C·C(λ)·t_C(x, y)
wherein (x, y) is the coordinate position in the image; H(λ), N(λ), C(λ) correspond to the three RGB channels of the picture; t_H, t_N, t_C represent the spatial sensitivity distribution of each input and also the shape of its receptive field, approximated by a difference-of-Gaussians (DOG) function; and d_H, d_N, d_C are coefficients.
To calculate the RG and BY feature channel responses, for the red and green channels the positive component of the DOG filter is first used as a convolution template to convolve the R channel of the picture, the negative component is used as a convolution template to convolve the G channel, and the RG channel response is then calculated.
2. Generating a feature map
Gaussian smoothing is performed on each feature sub-channel image, followed by down-sampling with step 2 to obtain the second-scale image; this operation is repeated to generate each next scale, and the images of different sizes form a Gaussian pyramid. In the visual saliency model each feature sub-channel yields images at 9 scales, i.e. a 9-layer pyramid; this embodiment takes the brightness channel image as an example to generate the Gaussian pyramid images of different sizes.
In order for the finally extracted feature maps to better highlight the picture's target interest region, the contrast between a local center and the surrounding background is computed: a small-scale picture tends to highlight background information, a large-scale picture tends to highlight local information, and subtracting the two yields the contrast between the central target and the background. The specific algorithm in the model is as follows: the scale s represents local center information, s ∈ {3, 4, 5}; the scale q represents background information, q = s + δ with δ ∈ {3, 4}, giving 6 scale pairs in total. The smaller-scale picture in each pair is linearly interpolated to the same size as the other, and point-by-point subtraction detects regions with strong center-surround contrast; with seven channels this finally yields 42 center-surround difference maps, namely the feature maps.
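A toy sketch of the center-surround computation on nested lists. Nearest-neighbour interpolation is used for the up-sampling (the text specifies linear interpolation, so this is a simplification), and the pyramid is assumed to be a list of 2-D levels from fine to coarse:

```python
def downsample(img):
    """Halve each dimension by taking every second pixel (step 2, as in the text)."""
    return [row[::2] for row in img[::2]]

def upsample_to(img, rows, cols):
    """Nearest-neighbour interpolation back to (rows, cols)."""
    r0, c0 = len(img), len(img[0])
    return [[img[min(i * r0 // rows, r0 - 1)][min(j * c0 // cols, c0 - 1)]
             for j in range(cols)] for i in range(rows)]

def center_surround(pyramid, s, delta):
    """Feature map = |level s - level (s+delta)|, the coarser level interpolated
    up to the size of level s (center-surround difference)."""
    center = pyramid[s]
    surround = upsample_to(pyramid[s + delta], len(center), len(center[0]))
    return [[abs(a - b) for a, b in zip(rc, rs)]
            for rc, rs in zip(center, surround)]
```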
3. Merging feature maps and generating saliency maps
The merging stage of the characteristic diagram integrates various independent characteristics, and the extracted characteristic diagrams are integrated and competed according to a certain strategy, so that a final significance diagram is generated, and a basis is provided for attention focus selection and migration.
The method analyzes and compares the merging effect of the feature maps of various methods, adopts a local iteration method to merge the feature maps, generates a saliency map for analysis, and can reproduce a local area with higher saliency when a visual saliency model generates the saliency map. In contrast, the invention adds global saliency information to generate a saliency map, namely adds global saliency information to the saliency map generated by the visual saliency model, and comprises the following main steps:
firstly, converting the color space: the LAB color space is close to the perceptually uniform color space of human vision, and the picture (M = width × height pixels) is converted from RGB to LAB;
secondly, calculating the pixel average value of the LAB color space L, A, B, and respectively recording the pixel average value as AvgL, AvgA and AvgB;
thirdly, denoising the LAB channels respectively;
fourthly, calculating the significance value of each point according to the following formula:
Salmap_i = (L_i − AvgL)² + (A_i − AvgA)² + (B_i − AvgB)²
the Salmap represents the difference of each pixel value relative to the whole picture, i.e. a global saliency measure, and it is then combined with the saliency map generated by the visual saliency model.
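The Salmap computation of the fourth step follows directly from the formula, assuming the picture has already been converted to LAB and denoised (steps 1-3):

```python
def global_saliency(lab_pixels):
    """Salmap_i = (L_i-AvgL)^2 + (A_i-AvgA)^2 + (B_i-AvgB)^2 over an already
    LAB-converted, denoised picture given as a flat list of (L, A, B) tuples."""
    n = len(lab_pixels)
    avg_l = sum(p[0] for p in lab_pixels) / n
    avg_a = sum(p[1] for p in lab_pixels) / n
    avg_b = sum(p[2] for p in lab_pixels) / n
    return [(l - avg_l) ** 2 + (a - avg_a) ** 2 + (b - avg_b) ** 2
            for l, a, b in lab_pixels]
```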
4. Localization and migration of focus of interest
The saliency map represents the most salient features in a scene; in the visual saliency model a winner-take-all competitive neural network is applied so that, under the guidance of the saliency map, the attention focus is intelligently selected and positioned at the most salient location, and when several equally salient foci appear at the same time, the attention focus moves to the salient location closest to the previous focus.
(II) extracting salient regions
The method comprises the following steps of extracting a salient region according to an acquired attention focus, improving the contour extraction of the attention focus region based on a model for detecting the salient region, and extracting the salient region:
step 1, determining the features with the largest significance in the three features of color, brightness and direction, namely respectively obtaining the pixel values corresponding to the focal positions in the three types of saliency maps, wherein the feature corresponding to the largest pixel value is the saliency feature;
step 2, determining the feature map with the maximum saliency among the feature maps of that feature, namely obtaining the pixel value at the focus position in each feature map respectively; the feature map with the largest pixel value is the most salient feature map;
step 3, carrying out binarization on the obtained feature map, searching a connected domain, and extracting the connected domain containing the focus;
and 4, plotting the outline of the salient region.
(III) improvement of method for extracting significant region
The model has some problems: in terms of method steps, the most salient map is determined directly from the value at the position of the attention focus, without considering the influence of salient points near the focus, so the extracted salient region may deviate. The invention further improves the salient-region extraction method so that the target of interest in the picture is acquired more accurately, redundant information is removed, and content-based image retrieval performance is improved.
When extracting the most salient region, the influence of the saliency of points in the neighborhood of each focus is considered. The invention proposes calculating the weight values of all salient regions and sorting them from large to small; the region with the largest weight is the most salient region. For the salient region determined by each focus, the number of pixels in the region whose saliency reaches a certain critical value is first counted as the region area zone_i; a region whose area is too small is eliminated directly, and the ratio of the current region's zone_i to the sum of all region areas is calculated, thereby determining the region weight, after which the most salient region is extracted.
The human eye attention calculation model thus generates the saliency map and the extracted most salient region.
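The improved region selection can be sketched as follows. The threshold, the minimum-area cutoff and the representation of a region as a list of pixel coordinates are assumptions:

```python
def most_salient_region(saliency, regions, threshold, min_area=4):
    """Rank candidate focus regions: zone_i = number of pixels in the region
    whose saliency reaches the threshold; tiny regions are rejected outright,
    the rest are weighted by zone_i / sum(zone_j) and the largest weight wins."""
    zones = []
    for region in regions:                      # region = list of (row, col)
        zone = sum(1 for r, c in region if saliency[r][c] >= threshold)
        if zone >= min_area:                    # eliminate too-small regions
            zones.append((zone, region))
    total = sum(z for z, _ in zones)
    if not total:
        return None
    weighted = [(z / total, region) for z, region in zones]
    return max(weighted, key=lambda t: t[0])[1]
```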
(IV) extracting feature vectors
After the significant region in the picture is obtained, the next step is to extract a feature vector from the target interest region. The traditional bottom layer characteristics (shape, color and texture) can describe the content of a global picture, but are difficult to keep unchanged when the picture is subjected to illumination, size and angle changes, and the requirements on accurate description and distinction of the local content of the picture cannot be met, so that the method adopts a characteristic vector constructed based on a local characteristic descriptor, and the local characteristic descriptor adopts a BoF characteristic model to represent the characteristics.
The process of constructing the BoF features comprises local image feature extraction, visual dictionary generation, feature quantization and coding and feature collection.
1. Extracting local image features
The first step of BoF is local feature description; the invention adopts the SURF operator as the local feature descriptor for building BoF features. The SURF operator is an improvement on the SIFT algorithm whose main advantages are high computation speed and strong description capability (including repeatability, stability and distinctiveness). Compared with the SIFT algorithm, SURF does not use gradient pictures but reduces computation time through the Hessian matrix and the integral picture. The local image feature extraction strategy adopted by SURF mainly comprises Hessian-matrix-based interest point detection, point description and matching.
(1) Integral picture
The pixel sum within a block region is calculated via the integral picture: once the integral picture of a picture has been computed, the pixel sum over any rectangular region can be obtained with three additions or subtractions on the integral picture, and the calculation time does not depend on the size of the region; this property is particularly important when convolving with a larger filtering template.
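A sketch of the integral picture and the constant-time box sum it enables, which is the property SURF's box filters rely on:

```python
def integral_image(img):
    """ii[r][c] = sum of all pixels above and to the left of (r, c), inclusive;
    padded with a zero row/column so box_sum needs no boundary checks."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = img[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum over the rectangle [r0, r1] x [c0, c1] with three additions or
    subtractions, independent of the rectangle's size."""
    return ii[r1 + 1][c1 + 1] - ii[r0][c1 + 1] - ii[r1 + 1][c0] + ii[r0][c0]
```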
(2) Construction of the Hessian matrix
Feature point detection based on the Hessian matrix is the key of the SURF operator: the matrix determinant generated at each pixel is calculated and its approximate value serves as a discriminant; whether each point is an extreme point is judged by whether the determinant value is greater than 0, thereby extracting the feature points in the picture, and computing the matrix approximation via the integral picture greatly reduces computation time. For any point x = (x, y)^T in the picture, the Hessian matrix L(x, r) at x and scale r is defined as:

L(x, r) = [ H_xx(x, r)  H_xy(x, r) ; H_xy(x, r)  H_yy(x, r) ]
wherein H_xx(x, r) is the convolution of the Gaussian second-order partial derivative in x with the picture J at the point x (H_yy and H_xy are defined analogously). To reduce the amount of computation, a 9×9 box filter is used at scale r = 1.2 (Gaussian kernel with σ = 1.2), yielding approximations A_xx, A_yy, A_xy for H_xx, H_yy, H_xy; the approximate value of the Hessian determinant is then:
det(L_approx) = A_xx·A_yy − (t·A_xy)²
the parameter t balances the determinant expression, preserving the relation between the energy of the Gaussian kernel and that of its box-filter approximation.
(3) Construction of a scale space
In order to achieve scale invariance, the method constructs a pyramid scale space similar to SIFT's; however, unlike SIFT, which repeatedly smooths and down-samples the picture to build the pyramid, SURF keeps the picture size unchanged and varies the size of the box filter instead.
(4) Accurately positioning feature points
In order to accurately position the feature points, the threshold is increased; as the threshold range grows the number of detected feature points decreases, and finally only the few most stable feature points are retained. Suppose the scale picture containing a certain pixel is the fifth pyramid layer: its 8 neighbors in that layer, plus the 9 pixels at corresponding positions in each of the two adjacent scale layers (the fourth and sixth pyramid layers), give 26 surrounding pixels in total; if the pixel's value is greater than those of all 26 points, the point is determined to be a feature point of the area. In this way the feature points are further accurately positioned.
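The 26-neighbour extremum test can be sketched as follows, assuming the pyramid is given as a list of same-sized 2-D layers (a simplification, since real SURF layers differ in scale):

```python
def is_feature_point(pyramid, layer, r, c):
    """A point is kept only if its response exceeds all 26 neighbours: 8 in its
    own scale layer plus 9 in each of the two adjacent layers."""
    v = pyramid[layer][r][c]
    for dl in (-1, 0, 1):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dl == dr == dc == 0:
                    continue                     # skip the point itself
                if pyramid[layer + dl][r + dr][c + dc] >= v:
                    return False
    return True
```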
(5) Generation of feature descriptors
Taking the feature point as the center pixel of an 8×8 window, the gradient magnitude and direction are obtained for each pixel and weighted with a Gaussian function, so that each pixel corresponds to a feature vector; histograms of gradients (HOG) in eight directions are computed in windows with step 4 to form seed points, 4 seed points form one feature point descriptor, each seed point carries 8 vector components, and finally a 64-dimensional feature vector is formed.
2. Visual dictionary generation
For the generated picture feature descriptions, a clustering algorithm groups them into clusters; the cluster centroids are the visual words, analogous to the words of text classification, and the codebook generated by clustering is the visual dictionary. The hard-clustering K-means algorithm is adopted to cluster the extracted features: an iterative update rule is derived from an objective function, Euclidean distance serves as the similarity measure, and the squared-error criterion serves as the clustering objective function; the problem is thus converted into finding the optimal solution of the objective function, seeking the global minimum, with the search proceeding in the direction that decreases the squared error of the objective function.
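A plain K-means sketch of the dictionary construction, with Euclidean distance and the squared-error objective as in the text; initialization by random sampling of the points is an assumption:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means: the k cluster centres returned play the role of the
    visual words of the dictionary."""
    rnd = random.Random(seed)
    centers = [list(p) for p in rnd.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each feature to the nearest centre (squared Euclidean)
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        for i, cl in enumerate(clusters):
            if cl:                               # recompute centre as the mean
                centers[i] = [sum(col) / len(cl) for col in zip(*cl)]
    return centers
```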
3. Feature description coding
Feature description coding assigns a most intuitive visual word, or several adjacent visual words, according to how well the visual word matches a local feature. Assume X = {x1, …, xM} represents the generated M local feature vectors, where xi ∈ T^d and d is the feature dimension; E = {e1, …, eN} represents the set of N visual words, where ej ∈ T^d; U represents the encoding of the features, forming a set, where Ui denotes the coded vector of xi and Uij represents the encoding of xi by the visual word ej. The process of encoding and assembling is shown in fig. 3.
As shown in fig. 3, the function g encodes the feature descriptors and the function f pools the visual vocabulary, making the representation of the picture more robust. During visual-word assignment, each local feature is assigned to its nearest visual word according to the proximity of the visual word vector and the local feature; the encoding corresponding to the assigned word is 1 and all others are 0, expressed by the formula:
A most intuitive visual word, or several adjacent visual words, is thus allocated according to the degree of matching between the visual words and the local features.
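The hard-assignment rule above (code 1 for the nearest visual word, 0 elsewhere) can be sketched as follows; pooling the codes over all local features yields the picture's BoF histogram. The function name is illustrative.

```python
import numpy as np

def encode_hard(features, words):
    """Assign each local feature to its nearest visual word: the code
    vector has 1 at the assigned word's index and 0 elsewhere; summing
    the codes over all features gives the normalized BoF histogram."""
    features = np.asarray(features, dtype=float)
    words = np.asarray(words, dtype=float)
    dist = ((features[:, None, :] - words[None, :, :]) ** 2).sum(-1)
    nearest = dist.argmin(1)
    codes = np.zeros((len(features), len(words)))
    codes[np.arange(len(features)), nearest] = 1.0
    hist = codes.sum(0)
    return codes, hist / hist.sum()
```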
The method is based on region-of-interest extraction with a human eye attention model. Taking the human eye attention model of the visual saliency model as the basis, the key steps of generating the saliency map and extracting the salient region are improved to address problems such as incomplete or deviated extraction of the salient region. The overall framework of the human eye attention calculation model is provided, together with the construction process of the BoF features, which comprises extracting local image features, generating the visual dictionary, and quantizing, encoding and pooling the features, and the picture retrieval algorithm flow based on salient region extraction is given. Comparison of the retrieval results based on BoF features and on bottom-layer features shows that, owing to the advantages of local feature description, retrieval after salient region extraction greatly improves accuracy over direct retrieval, which reflects the necessity and advancement of salient region extraction.
Third, picture retrieval method based on user feedback
Salient region extraction based on the human eye attention model can effectively locate the region of interest of the user; image features are then extracted from this region, and the preliminary retrieval can return pictures that meet the user's requirement. However, the gap between picture features and high-level semantics is obvious, and the preliminary retrieval result may still fail to satisfy the user. In order to further improve picture retrieval performance, the invention introduces user feedback into the picture retrieval process based on the human eye attention model; the feedback process is divided into the following three steps:
Step one, primary retrieval: firstly, the user submits an example picture, the system transfers the example submitted by the user to a feature vector expression, compares the feature relevance between the example and the pictures in the picture library, sorts them from large to small according to their relevance to the query picture, and returns the first N results to the user;
Step two, the user marks positive pictures meeting the requirements according to the retrieval result, excludes negative pictures, and feeds the result back to the system; the system modifies algorithm parameters or retrieval rules according to the user's feedback information and then performs a new round of retrieval;
And step three, the feedback is repeated, and the query is finished once the user requirements are met.
The support vector machine model of the prior art is often used for user feedback, but the average accuracy of an SVM-based feedback system cannot achieve the expected effect. The main reason is that when the number of positive feedback samples is small the SVM classifier is unstable: the optimal hyperplane of the SVM is very sensitive to a small number of samples, and a user typically marks only a few pictures during feedback, with no guarantee that all samples are marked fully and accurately. When samples are insufficient and inaccurately marked, the SVM therefore cannot achieve a good effect.
In order to make up for the defects of the SVM-based feedback algorithm, the particle swarm optimization algorithm is introduced into the user feedback process: it imposes no requirement on sample balance and converges quickly, so the feedback learning process is optimized and an S-P user feedback guidance algorithm fusing the SVM and PSO is established.
(I) User feedback fusing the SVM and the particle swarm optimization algorithm
The method combines the SVM (support vector machine) and the particle swarm optimization algorithm, relying on the rapid convergence of the particle swarm optimization algorithm and its suitability for optimization with a small number of feedback samples. After the user evaluates the primary retrieval result, the particle swarm feedback parameters are initialized, the fitness of the particles is calculated and the individual optimal positions are updated; the updated result is used to train an SVM classifier, the pictures in the picture library are classified, the distance between each picture and the classification surface is calculated, and the retrieval results are sorted and output.
The overall framework of the S-P user feedback guidance algorithm is shown in FIG. 4. In the specific feedback process, the particle swarm optimization algorithm uses the user's feedback information to improve feature selection, the SVM classification algorithm and salient region extraction during secondary retrieval, further optimizing the overall algorithm flow and improving picture target retrieval performance.
1. Combination of particle swarm optimization algorithm with SVM
The particle swarm optimization algorithm treats an optimization problem as a population of a certain number of individuals moving in a search space. Each particle xi represents a potential optimal solution and is described by three values: speed Ui, current position Xi and fitness value Fitness_i. The current position Xi describes a point of the particle in the retrieval space; the best position that the particle itself has experienced in the iterative process is its individual extreme value Best_q, and the best position that the whole population has experienced is Best_total. During iteration the particle updates its position through these two extreme values: Best_q measures the particle's self-cognition ability and Best_total its global cognition ability. After the two optimal values are found, the particle updates its speed and position. The main steps of the particle swarm optimization algorithm are as follows:
step 1), selecting a critical value and iteration times;
step 2), initializing Xi and Ui and the population number n;
step 3), calculating the fitness of each particle;
step 4), for each particle, comparing its fitness value with that of its individual best Best_q, and if the current value is better, taking the current position as the particle's optimal Best_q;
step 5), for each particle, comparing its fitness with the fitness value of the population's global optimal position Best_total, and if the current value is better, taking the current position as the global optimal position;
step 6), updating the particles;
and step 7), looping from step 3) until the number of iterations is reached or the error condition is met, then jumping out of the loop and outputting the optimal position of the particles.
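Steps 1)-7) above can be sketched as a compact minimizer. This is a generic sketch with assumed inertia weight `w` and acceleration constants `c1`, `c2`; the function name, bounds and defaults are illustrative, and the variable names `best_q`/`best_total` follow the text.

```python
import numpy as np

def pso(fitness, dim, n=20, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` with a swarm of n particles in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, (n, dim))       # step 2): initialize positions
    U = np.zeros((n, dim))                     # and velocities
    best_q = X.copy()                          # individual best positions
    fq = np.array([fitness(x) for x in X])     # step 3): fitness of each particle
    g = int(fq.argmin())
    best_total, f_total = best_q[g].copy(), fq[g]   # population best
    for _ in range(iters):                     # step 7): loop until budget spent
        r1 = rng.random((n, dim))
        r2 = rng.random((n, dim))
        # step 6): velocity update from self-cognition and global cognition
        U = w * U + c1 * r1 * (best_q - X) + c2 * r2 * (best_total - X)
        X = X + U
        f = np.array([fitness(x) for x in X])
        improved = f < fq                      # step 4): update Best_q
        best_q[improved] = X[improved]
        fq[improved] = f[improved]
        if fq.min() < f_total:                 # step 5): update Best_total
            g = int(fq.argmin())
            best_total, f_total = best_q[g].copy(), fq[g]
    return best_total, f_total
```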
When the particle swarm optimization algorithm is applied to the field of picture retrieval, the first consideration is the representation of the particles' spatial positions. A picture is represented and extracted as a feature vector; if the picture features of the whole picture library are extracted, a vector space is obtained. Each particle is represented by a feature vector, with each dimension of the vector giving one coordinate of the particle's spatial position. The particle swarm optimization algorithm then retrieves an optimized solution in this specific space, i.e. it searches the feature vector space for the positive example picture corresponding to the optimal feature vector.
2. Particle swarm optimization algorithm optimization feature selection
For feature selection of the pictures in the picture library, the method uses the particle evolution direction to guide the feature optimization process. First, user feedback is performed on the preliminary retrieval result. Suppose the user gives one round of feedback and marks M positive example pictures in total; local features of the M pictures are extracted, and the average over each feature dimension is calculated and taken as the initial individual optimal feature vector, computed according to the following formula:
The current position Xi describes a point of the particle in the retrieval space; the best position experienced by the particle in the iterative process is its individual extreme value Best_q, and the best position experienced by the whole population is Best_total. M is the number of positive example pictures, g is the feature dimension, and M_positive is the quantity of positive example pictures. According to the fitness function of particle evolution, when the particle evolution meets the termination condition the global optimal position Best_total, i.e. the optimal feature vector, is obtained. The association degree between the feature vectors of all positive example pictures and this vector is then calculated in every dimension: if all the vectors are associated in a certain dimension, a larger weight value is given to that dimension, otherwise a smaller weight value is given. In this way the weight values on the feature dimensions of the query picture and of the pictures in the picture library are adjusted, and an optimal feature data set suitable for the next step of SVM classification is found.
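The initialization described above (per-dimension mean of the M positive examples as the initial individual best, with larger weights on dimensions where the positives agree) can be sketched as follows. The inverse-variance weighting is one plausible reading of the association rule, not the patent's exact formula, and the function name is illustrative.

```python
import numpy as np

def initial_best_and_weights(positives):
    """positives: (M, g) array of positive-example feature vectors.
    Returns the initial individual-best vector (per-dimension mean) and
    normalized per-dimension weights: low variance -> high weight."""
    positives = np.asarray(positives, dtype=float)
    best = positives.mean(axis=0)          # initial Best_q candidate
    var = positives.var(axis=0)            # spread of the positives per dimension
    weights = 1.0 / (1.0 + var)            # consistent dimension -> larger weight
    return best, weights / weights.sum()
```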
3. SVM parameter optimization based on particle swarm optimization algorithm
The SVM takes the complexity of the learning model into account while ensuring the accuracy of classification on training samples, which embodies the generalization capability of the model and prevents over-learning. It offers high classification efficiency and accuracy in high-dimensional pattern recognition and in solving nonlinear classification problems on small samples. A linear subspace is first constructed in the multi-dimensional feature space, a kernel function is applied for nonlinear classification, a regularization factor balances interval maximization against training error minimization, and a decision surface realizes the binary classification of the samples.
Although the SVM can handle high-dimension and small-sample problems well, parameter selection must be considered; the choice of kernel function parameter, error control coefficient and penalty factor is particularly important. The error control coefficient controls the difference between the function and the known set, the size of the permitted error, the number of support vectors and the applicability of the algorithm to unknown samples, and reflects the sensitivity of the vector machine model to noise in the input variables: if it is chosen small, the regression estimation precision is high but the number of support vectors increases; if it is chosen large, the number of support vectors is small, the regression estimation precision is reduced and the sparsity of the SVM increases. The kernel function influences how the sample data are mapped into the high-dimensional feature space, and the penalty factor influences the generalization of the learning machine by adjusting the penalty on the empirical error.
Aiming at the problem of SVM parameter influence, the invention uses the particle swarm optimization algorithm to optimize parameter selection in the SVM algorithm. For the optimal feature subset found by the particle swarm optimization algorithm, the feature data set is first divided into a training set and a testing set; the training set is used to train a support vector machine to obtain a model file, and the obtained model is then applied to the testing feature data set to obtain a prediction classification result. The fitness function is the key to measuring the performance of the particle swarm optimization algorithm: the spatial position of a particle is represented by a group of SVM parameters, comprising the kernel function parameter, error control coefficient and penalty factor, and the fitness of the particle represents the quality of the training result under that group of parameters. The mean absolute error is used as the fitness function, and the method ends when the prediction error reaches a given value or the maximum number of iterations is reached.
The SVM algorithm flow based on the particle swarm optimization algorithm is as follows:
Step one, extracting the feature data of the training set and the testing set, and counting the number and proportion of positive samples in the actual testing set to obtain the prediction result;
step two, initializing a particle swarm optimization algorithm: initializing parameters of a particle swarm optimization algorithm, and initializing the speed and the position of each particle, wherein the position of one particle is represented by a set of parameters;
Step three, setting the Best_q of each particle to its current position;
Step four, setting the global optimal position Best_total to the current position with the lowest fitness among the particles;
Step five, performing SVM training on the training set, calculating the fitness of each particle, and updating Best_q and Best_total: if a particle's current fitness is better than its Best_q, Best_q is updated, and if it is better than Best_total, Best_total is updated;
and step six, continuously updating the position and the speed of the particles until the maximum iteration times or the error termination condition is reached.
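One way to realize the particle fitness in the steps above is sketched below: a particle's position encodes (log10 C, log10 gamma), and its fitness is the classification error on the held-out set. A kernel ridge classifier stands in for the SVM here, since a full SVM solver is beyond a sketch; the function names are illustrative and the result can be plugged into any PSO driver.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel matrix between row-vector sets A and B."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def particle_fitness(position, Xtr, ytr, Xte, yte):
    """Fitness of one particle: position = (log10 C, log10 gamma).
    Train a kernel classifier with those parameters and return the
    error rate on the test set (lower is better)."""
    C, gamma = 10.0 ** position[0], 10.0 ** position[1]
    K = rbf_kernel(Xtr, Xtr, gamma)
    # ridge-regularized fit; 1/C plays the penalty-factor role
    alpha = np.linalg.solve(K + np.eye(len(Xtr)) / C, ytr)
    pred = np.sign(rbf_kernel(Xte, Xtr, gamma) @ alpha)
    return float(np.abs(pred - yte).mean() / 2.0)
```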
The method fully overcomes the blindness of SVM parameter selection and provides clearer guidance than the trial-and-error method used in the prior art.
4. Particle swarm optimization algorithm for optimizing selection of positive and negative samples
When the number of positive feedback samples is small the SVM classifier is unstable, especially when the numbers of positive and negative examples are unbalanced or the positive example pictures are inaccurately marked, and the SVM then struggles to achieve its best effect. The invention therefore provides a positive and negative sample selection process in which the particle swarm optimization algorithm optimizes SVM training. The feature vector of each picture in the picture library is used as an initial particle position and its fitness value is calculated, taking into account the influence of the fed-back positive and negative example pictures: the closer a particle is to the positive example picture set, the smaller its fitness value Fitness, and vice versa, so a lower fitness value corresponds to a better particle position. The particles are reordered according to their fitness values, i.e. the relevance between each picture in the picture library and the query picture is sorted, and the positive and negative example pictures are re-determined. The fitness function changes dynamically with the positive and negative example pictures in each iteration, and the process terminates when the preset number of iterations or the preset number of positive samples is reached, indicating that the positive and negative samples for training have been selected.
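The fitness rule just described (smaller when a particle is nearer the marked positive set) can be sketched as follows. The distance difference used here is one plausible realization, not the patent's exact function, and the names are illustrative.

```python
import numpy as np

def sample_fitness(x, positives, negatives):
    """Smaller fitness = better position: distance to the nearest marked
    positive example minus distance to the nearest negative example."""
    dp = min(np.linalg.norm(x - p) for p in positives)
    dn = min(np.linalg.norm(x - n) for n in negatives)
    return dp - dn

def reselect_samples(library, positives, negatives, n_pos):
    """Reorder the picture library by fitness and re-label the n_pos
    lowest-fitness pictures as the new positive training samples."""
    f = np.array([sample_fitness(x, positives, negatives) for x in library])
    order = np.argsort(f)
    return order[:n_pos].tolist(), order[n_pos:].tolist()
```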
(II) image retrieval method based on human eye attention model and S-P user feedback
The invention integrates the extracted significant region with user feedback, and the algorithm flow of the picture retrieval based on the human eye attention model and the S-P user feedback is as follows:
Step 1, salient region extraction: the user selects a picture database, and for each picture the most salient region is extracted and stored;
step 2, image feature extraction: extracting BoF characteristics from each image saliency area in the image library, and establishing a characteristic database;
Step 3, preliminary retrieval: extracting the most salient region and the BoF feature vector of the picture to be retrieved, performing similarity measurement calculation against the feature database, sorting according to similarity, and outputting the first M sorted pictures as the primary retrieval result;
And step 4, user feedback: the user marks the preliminary retrieval result, i.e. indicates which results are relevant and which are not, and feeds the m relevant pictures back to the system;
step 5, initializing feedback parameters;
And step 6, particle swarm optimization: the information of the m fed-back pictures is used to optimize positive and negative sample selection and image feature extraction, and SVM training of the established feature data set is performed based on the particle swarm optimization algorithm;
and 7, searching again: classifying and sequencing the picture data sets by using an SVM classifier, and outputting a retrieval result;
and 8, circulating the steps 1 to 7 until the user is satisfied, and outputting a final retrieval result.
Picture retrieval is a hot problem in the field of computer vision today, and content-based picture retrieval is its key technology; the methods of the prior art have obvious defects, mainly expressed as follows: firstly, they cannot accurately describe the interest of the user, and secondly, they cannot accurately retrieve the target pictures that really meet the user's requirement. Based on these two points, the invention mainly addresses how to acquire the user interest area and how to introduce user feedback information to guide the retrieval process, and provides an accurate retrieval method for the user interest area and feedback-guided pictures. On the basis of the SVM user feedback algorithm, a particle swarm optimization algorithm is introduced to improve the feedback learning process, and an S-P user feedback guidance algorithm fusing the SVM and PSO is established. Specifically: firstly, the salient region extraction method of the visual saliency model is improved and picture retrieval based on the human eye attention model is provided, comprising the overall architecture of the human eye attention calculation model, the BoF feature vector construction process and the retrieval algorithm flow based on the human eye attention model and BoF features; secondly, based on the characteristics of the SVM and the particle swarm optimization algorithm, salient region extraction, SVM training parameters and the feature selection process are optimized in three aspects, and a picture retrieval method based on the human eye attention model and S-P user feedback is provided.
Claims (10)
1. A user interest area and feedback-guided picture accurate retrieval method, characterized in that the retrieval process is guided by acquiring the user interest area and introducing user feedback information, and a picture retrieval method based on a human eye attention model and user feedback is provided; secondly, an image feature extraction method is analyzed and BoF features are selected; on the basis of extracting the target interest area, extracting image features and performing preliminary retrieval, a particle swarm optimization algorithm is introduced into the SVM-based user feedback algorithm to improve the feedback learning process, and an S-P user feedback guidance algorithm fusing the SVM (support vector machine) and PSO (particle swarm optimization) is established;
the picture retrieval based on the human eye attention model comprises generating a saliency map, extracting a salient region, improving the salient region extraction method and extracting feature vectors; generating the saliency map comprises extracting primary visual features, generating feature maps, combining the feature maps, generating the saliency map, and positioning and transferring the attention focus; extracting the feature vectors comprises extracting local image features, generating a visual dictionary and encoding the feature description; the attention region is extracted based on the human eye attention model of the visual saliency model, the key steps of generating the saliency map and extracting the salient region are improved, and a BoF feature construction process based on the overall framework of the human eye attention calculation model is provided, including local image feature extraction, visual dictionary generation, and feature quantization, coding and pooling, to obtain the picture retrieval algorithm flow based on salient region extraction;
the picture retrieval method based on user feedback comprises user feedback fusing an SVM (support vector machine) and the particle swarm optimization algorithm, and a picture retrieval method based on the human eye attention model and S-P user feedback; the user feedback fusing the SVM and the particle swarm optimization algorithm comprises the combination of the particle swarm optimization algorithm with the SVM, particle swarm optimization of feature selection, SVM parameter optimization based on the particle swarm optimization algorithm, and particle swarm optimization of positive and negative sample selection; user feedback is introduced in the picture retrieval process based on the human eye attention model, and the feedback process is divided into the following three steps:
step one, primary retrieval: firstly, the user submits an example picture, the system transfers the example submitted by the user to a feature vector expression, compares the feature relevance between the example and the pictures in the picture library, sorts them from large to small according to their relevance to the query picture, and returns the first N results to the user;
step two, the user marks positive pictures meeting the requirements according to the retrieval result, excludes negative pictures, and feeds the result back to the system; the system modifies algorithm parameters or retrieval rules according to the user's feedback information and then performs a new round of retrieval;
step three, the feedback is repeated, and the query is finished once the user requirements are met;
introducing a particle swarm optimization algorithm into a user feedback process, optimizing the feedback learning process, and establishing an S-P user feedback guidance algorithm fusing an SVM and a PSO.
2. The user interest area and feedback-guided picture accurate retrieval method of claim 1, characterized in that primary visual features are extracted: for an input color picture, three primary visual features of brightness, direction and color are extracted through linear filtering, wherein the brightness is described by a brightness feature channel, the color features comprise the two features red-green (RG) and blue-yellow (BY), and the direction features adopt Gabor filtering and comprise features in four directions, so the primary visual features have seven sub-features in total;
for the direction features, the brightness feature is first extracted and then filtered with a Gabor filter in the four directions of 0 degrees, 45 degrees, 90 degrees and 135 degrees; for the brightness feature, the average value of the RGB three components of the color picture is used; for the color features, difference maps between red and green and between blue and yellow are calculated to express the different contrast effects, according to the formula:
the invention introduces direction and contrast information into the color feature channels according to the biological visual color perception mechanism; biologically, color processing is completed along the ventral pathway retina - lateral geniculate body - V1 - V2 - PIT - IT, wherein the RG, BY and brightness pathways are formed by double-opponent neurons in the V1 area; RG and BY each carry color-opponent information, the two color pathways show contrast sensitivity, and the constructed receptive field mathematical model has the following function:
T(x, y, λ) = dH·H(λ)·tH(x, y) + dN·N(λ)·tN(x, y) + dC·C(λ)·tC(x, y)
wherein (x, y) is the coordinate position in the image; H(λ), N(λ) and C(λ) correspond respectively to the RGB three channels of the picture; tH, tN and tC represent the spatial acuity distribution of each input, i.e. the shape of its receptive field, approximated by a difference-of-Gaussians (DOG) function; and dH, dN and dC are coefficients;
and for the red-green channel, the positive component of the DOG filter is first taken as a convolution template to convolve the R channel of the picture, the negative component is taken as the convolution template to convolve the G channel, and the RG channel response is then obtained.
3. The method for accurately retrieving the user interest area and the feedback-guided picture according to claim 1, wherein a feature map is generated: respectively performing Gaussian smoothing on each characteristic subchannel image, then performing downsampling operation by taking 2 as a step length to obtain a second scale image, continuing to perform the operation on the basis to generate a next scale image, forming a Gaussian pyramid by the images with different sizes, and obtaining 9 scales of images by each characteristic subchannel in the visual saliency model, wherein the images are 9 layers of pyramid images in total;
contrast information between the local center and the peripheral background of the picture is obtained: a small-scale picture tends to highlight background information while a large-scale picture tends to highlight local information, and subtracting the two yields the contrast between the central target and the background; the specific algorithm in the model is as follows: the picture scale s represents local center information, s ∈ {3,4,5}; the picture at scale s' = s + δ, where δ ∈ {3,4}, represents background information, giving 6 scale pairs in total; the small-scale picture is linearly interpolated to the same size as the other picture in its scale pair and a point-to-point subtraction is performed to detect regions with strong center-surround contrast; with seven channels this finally gives a total of 42 center-surround difference maps, i.e. the feature maps.
4. The user interest area and feedback-guided picture accurate retrieval method of claim 1, characterized by combining the feature maps and generating the saliency map: in the merging stage of the feature maps, the various independent features are integrated, and the extracted feature maps are integrated and made to compete according to a certain strategy, so that the final saliency map is generated, providing a basis for attention focus selection and migration; the feature maps are merged by a local iteration method: each feature map is first normalized and then iteratively convolved with a Gaussian difference function, and after the standardized iterative operation the feature maps at the various scales are superposed and then weighted over the different features to obtain the saliency map;
combining feature maps by adopting a local iteration method, generating a saliency map for analysis, adding global saliency information to generate the saliency map, namely adding the global saliency information to the saliency map generated by a visual saliency model, and mainly comprising the following steps of:
firstly, the color space is converted: the picture (M = width × height pixels) is converted from RGB to the LAB color space, which is close to the uniform color space of human vision;
secondly, calculating the pixel average value of the LAB color space L, A, B, and respectively recording the pixel average value as AvgL, AvgA and AvgB;
thirdly, denoising the LAB channels respectively;
fourthly, calculating the significance value of each point according to the following formula:
Salmap_i = (L_i − AvgL)^2 + (A_i − AvgA)^2 + (B_i − AvgB)^2
the Salmap represents the difference of each point's pixel value relative to the whole picture, i.e. a global saliency measure, which is then combined with the saliency map generated by the visual saliency model.
5. The method for accurately retrieving the user interest area and the feedback-guided picture according to claim 1, wherein the salient area is extracted: extracting a salient region according to the acquired attention focus, improving the contour extraction of the attention focus region based on a model for salient region detection, and extracting the salient region mainly comprises the following steps:
step 1, determining the features with the largest significance in the three features of color, brightness and direction, namely respectively obtaining the pixel values corresponding to the focal positions in the three types of saliency maps, wherein the feature corresponding to the largest pixel value is the saliency feature;
step 2, determining the feature map with the maximum saliency among the feature maps of that feature, namely respectively obtaining the pixel value corresponding to the focal position in each feature map; the feature map with the maximum pixel value is the most salient feature map;
step 3, carrying out binarization on the obtained feature map, searching a connected domain, and extracting the connected domain containing the focus;
and 4, plotting the outline of the salient region.
6. The user interest area and feedback-guided picture accurate retrieval method of claim 1, wherein the improvement of the salient region extraction method comprises: when the most salient region is extracted, the influence of the saliency of the focus neighborhood points is considered; the invention proposes calculating the weight values of all salient regions and sorting them from large to small, the region with the largest weight value being the most salient region; for the salient region determined by each focus, the number of pixels whose saliency value reaches a certain critical value is first counted as the region area zone_i; a region whose area is too small is directly eliminated, and otherwise the ratio of the current region's zone_i to the sum of all areas is calculated to determine the region weight value, after which the most salient region is extracted:
a human-eye attention computation model is used to generate the saliency map and extract the most salient region.
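The weighting scheme of this claim can be sketched as follows. The critical saliency value and the minimum-area cutoff are illustrative assumptions; the claim fixes only the idea of area-ratio weights with small regions eliminated.

```python
import numpy as np

def most_salient_region(regions, saliency, critical=0.5, min_area=4):
    """regions: list of boolean masks, one salient region per focus.
    Count each region's pixels whose saliency reaches the critical
    value (zone_i), drop regions that are too small, weight each
    surviving region by zone_i / sum(zone), and return the index of
    the region with the largest weight (None if none survive)."""
    areas = []
    for m in regions:
        zone = int(((saliency >= critical) & m).sum())  # zone_i
        areas.append(zone if zone >= min_area else 0)   # eliminate tiny regions
    total = sum(areas)
    if total == 0:
        return None
    weights = [a / total for a in areas]                # region weight values
    return max(range(len(regions)), key=lambda i: weights[i])
```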
7. The accurate picture retrieval method guided by the user interest area and feedback according to claim 1, wherein user feedback is introduced into the picture retrieval process based on the human-eye attention model, and the feedback process is divided into the following three steps:
step one, primary retrieval: the user first submits an example picture; the system converts the submitted example into a feature vector expression, compares its feature relevance against the pictures in the picture library, sorts them from large to small by relevance to the query picture, and returns the first N results to the user;
step two, feedback: the user marks the positive pictures that meet the requirements in the retrieval result, excludes the negative pictures, and feeds the result back to the system; the system modifies the algorithm parameters or retrieval rules according to the user's feedback information and then performs a new round of retrieval;
step three, repeated feedback: the query is finished once the user's requirements are met;
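The ranking in step one can be sketched as a cosine-similarity comparison over feature vectors. Cosine similarity is an assumed relevance measure here; the claims only require sorting by feature relevance and returning the first N results.

```python
import numpy as np

def primary_retrieval(query_vec, library, top_n=5):
    """Rank library pictures (rows of `library`) by cosine similarity
    to the query feature vector, from large to small, and return the
    indices of the top_n most relevant pictures."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    lib = library / (np.linalg.norm(library, axis=1, keepdims=True) + 1e-12)
    sims = lib @ q                       # relevance of each library picture
    return np.argsort(-sims)[:top_n]     # first N results, most relevant first
```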
the method combines an SVM (support vector machine) with a particle swarm optimization algorithm, relying on the rapid convergence of particle swarm optimization, which suits optimization over a small number of feedback samples: after the user evaluates the primary retrieval result, the particle swarm feedback parameters are initialized, the Fitness of each particle is calculated, the individual optimal positions are updated, an SVM classifier is trained with the updated result, the pictures in the picture library are classified, the distance from each picture to the classification surface is calculated, and the retrieval results are output in sorted order;
in the specific feedback process, the particle swarm optimization algorithm uses the user's feedback information to improve feature selection, the SVM classification algorithm and salient-region extraction during secondary picture retrieval; particle swarm optimization evolves an optimization problem through individuals that share information and interact with the environment: a certain number of individuals move in a retrieval space, and each particle x_i represents a potential optimal solution described by three values: velocity U_i, current position X_i and fitness value Fitness_i; the current position X_i describes a point of the particle in the retrieval space, the best position the particle has experienced during iteration is its individual extreme value Best_q, and the best position experienced by the whole population is Best_total; during iteration, a particle updates its position through these two extreme values, Best_q measuring the particle's self-cognition ability and Best_total its global cognition ability; after the two optima are found, the particle updates its velocity and position; the algorithm mainly comprises the following steps:
step 1), selecting a critical value and iteration times;
step 2), initializing X_i and U_i and the population size n;
step 3), calculating the fitness of each particle;
step 4), for each particle, comparing its fitness value with that of its individual best Best_q; if better, taking the current position as the particle's Best_q;
step 5), for each particle, comparing its Fitness with the fitness value of the population's global optimal position Best_total; if better, taking the current position as the global optimal position;
step 6), updating the particles;
step 7), looping from step 3) until the iteration count is reached or the error condition is met, then exiting the loop and outputting the optimal particle position;
when the particle swarm optimization algorithm is applied in the field of picture retrieval, the spatial position representation of the particles is considered first: a picture is represented by its extracted feature vector, so extracting the image features of the whole picture library yields a vector space; each particle is represented by a feature vector, and each dimension of that vector gives one spatial position coordinate of the particle; the particle swarm optimization algorithm retrieves an optimized solution in this specific space, i.e. the process searches the feature vector space for the positive-example picture corresponding to the optimal feature vector, and the picture retrieval process is optimized based on user feedback by combining the SVM algorithm with the particle swarm optimization algorithm.
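Steps 1) to 7) above can be sketched as a standard particle swarm loop. The inertia weight w and the learning factors c1, c2 are common textbook defaults, an assumption rather than values fixed by the claims, and the search range is illustrative.

```python
import numpy as np

def pso(fitness, dim, n=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over a dim-dimensional space: initialize
    positions X_i and velocities U_i, track the individual extrema
    Best_q and the global extremum Best_total, and update velocity
    and position each iteration."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, (n, dim))        # step 2: initial positions X_i
    U = rng.uniform(-0.1, 0.1, (n, dim))    # step 2: initial velocities U_i
    best_q = X.copy()                       # individual extreme values
    best_q_fit = np.array([fitness(x) for x in X])
    g = int(np.argmin(best_q_fit))
    best_total = best_q[g].copy()           # global extreme value
    for _ in range(iters):                  # step 7: loop until the budget is spent
        fits = np.array([fitness(x) for x in X])       # step 3: fitness
        improved = fits < best_q_fit                   # step 4: update Best_q
        best_q[improved] = X[improved]
        best_q_fit[improved] = fits[improved]
        g = int(np.argmin(best_q_fit))                 # step 5: update Best_total
        best_total = best_q[g].copy()
        r1, r2 = rng.random((2, n, dim))               # step 6: update particles
        U = w * U + c1 * r1 * (best_q - X) + c2 * r2 * (best_total - X)
        X = X + U
    return best_total, float(best_q_fit[g])
```

On a smooth test function such as the 2-D sphere, this loop converges to a near-zero fitness within the default budget.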
8. The method of claim 7, wherein the particle swarm optimization algorithm optimizes feature selection by: guiding the feature optimization process with the particle evolution direction; first the user gives feedback on the preliminary retrieval result; supposing the user performs one round of feedback and marks M_positive positive example pictures, the local features of these pictures are extracted and the average value over each feature dimension is calculated and taken as the initial individual optimal feature vector, computed according to the following formula:
Best_q(g) = (1/M_positive) * sum_{i=1..M_positive} x_i(g), where g is the feature dimension index, x_i(g) is the g-th dimension of the feature vector of the i-th positive example picture, and M_positive is the number of positive example pictures; according to the fitness function of particle evolution, when the particle evolution meets the termination condition, the global optimal position Best_total, i.e. the optimal feature vector, is obtained; then the per-dimension association degree between all positive-example picture feature vectors and this vector is calculated: if all the vectors are associated in a certain dimension, a larger weight value is given to that dimension, otherwise a smaller weight value, so that the weight values on the feature dimensions of the query picture and the pictures in the picture library are adjusted and an optimal feature data set suitable for the next step's SVM classification is found.
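The per-dimension averaging and the "agreeing dimensions get larger weights" rule of this claim can be sketched as follows. The inverse-variance form of the weights is an illustrative assumption; the claim fixes only the intent that consistent dimensions are weighted up.

```python
import numpy as np

def initial_best_q(positive_feats):
    """Initial individual-best vector: the mean over the M_positive
    positive example pictures in each feature dimension g,
    Best_q(g) = (1/M_positive) * sum_i x_i(g)."""
    return np.asarray(positive_feats, dtype=float).mean(axis=0)

def dimension_weights(positive_feats, eps=1e-6):
    """Give a larger weight to dimensions on which the positive
    examples agree (low variance) and a smaller one otherwise;
    inverse variance is an assumed realization of that rule."""
    var = np.asarray(positive_feats, dtype=float).var(axis=0)
    w = 1.0 / (var + eps)
    return w / w.sum()    # normalized per-dimension weight values
```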
9. The accurate picture retrieval method guided by the user interest area and feedback according to claim 1, wherein the SVM parameter optimization based on the particle swarm optimization algorithm comprises: using a particle swarm optimization algorithm to optimize parameter selection in the SVM algorithm; for the optimal feature subset found by the particle swarm optimization algorithm, the feature data set is divided into a training set and a testing set; the support vector machine is trained with the training set to obtain a support vector machine model file, and the obtained model is applied to the testing feature data set to obtain a predicted classification result; the fitness function is the key to measuring the performance of the particle swarm optimization algorithm: the spatial position of a particle is represented by a group of SVM parameters, including the kernel function parameter, the error control coefficient and the penalty factor, and the fitness of the particle represents the quality of the training result under that group of parameters;
the SVM algorithm flow based on the particle swarm optimization algorithm is as follows:
step one, extracting the feature data of the training set and the test set, and counting the number and proportion of positive samples in the actual test set to obtain the prediction result;
step two, initializing a particle swarm optimization algorithm: initializing parameters of a particle swarm optimization algorithm, and initializing the speed and the position of each particle, wherein the position of one particle is represented by a set of parameters;
setting the Best _ q of the particle as the current position;
step four, setting the global optimal position Best _ total as the current position with the lowest Fitness in the particles;
step five, performing SVM training on the training set, calculating the Fitness of each particle, and updating Best_q and Best_total: if a particle's current Fitness is better than its Best_q, Best_q is updated, and if it is better than Best_total, Best_total is updated;
step six, continuously updating the positions and velocities of the particles until the maximum iteration count or the error termination condition is reached.
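The parameter search of claim 9 can be sketched as a small particle swarm over the SVM parameter triple. The `validation_error` callable, the parameter bounds and the swarm coefficients are all assumptions: in the claimed flow the fitness would come from SVM training and testing under each parameter set, which is abstracted here as a caller-supplied error function.

```python
import numpy as np

def pso_svm_params(validation_error, bounds, n=12, iters=60, seed=0):
    """Tune an SVM parameter set (e.g. penalty factor C, kernel
    parameter, error control coefficient): each particle position is
    one parameter vector, its fitness is the validation error the
    caller's SVM reports for those parameters (an assumed interface).
    bounds: list of (low, high) per parameter."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    X = rng.uniform(lo, hi, (n, dim))      # positions = candidate parameter sets
    U = np.zeros((n, dim))
    best_q = X.copy()                      # per-particle best parameter sets
    best_fit = np.array([validation_error(x) for x in X])
    g = int(np.argmin(best_fit))
    for _ in range(iters):
        r1, r2 = rng.random((2, n, dim))
        U = 0.7 * U + 1.5 * r1 * (best_q - X) + 1.5 * r2 * (best_q[g] - X)
        X = np.clip(X + U, lo, hi)         # keep parameters inside their bounds
        fits = np.array([validation_error(x) for x in X])
        better = fits < best_fit
        best_q[better], best_fit[better] = X[better], fits[better]
        g = int(np.argmin(best_fit))
    return best_q[g], float(best_fit[g])
```

Plugging in a real fitness would mean training the SVM on the training set and returning the test-set error for each candidate parameter vector.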
10. The accurate picture retrieval method guided by the user interest area and feedback according to claim 1, wherein the extracted salient region and the user feedback are fused, and the algorithm flow of picture retrieval based on the human-eye attention model and S-P user feedback is as follows:
step 1, salient-region extraction: the user selects a picture database, and for each picture the most salient region is extracted and stored;
step 2, image feature extraction: extracting BoF features from the salient region of each image in the image library and establishing a feature database;
step 3, preliminary retrieval: extracting the most salient region and the BoF feature vector of the picture to be retrieved, performing a similarity measurement calculation against the feature database, sorting by similarity, and outputting the top M sorted pictures as the preliminary retrieval result;
step 4, user feedback: the user marks the preliminary retrieval result, i.e. indicates which results are relevant and which are not, and feeds the m relevant pictures back to the system;
step 5, initializing feedback parameters;
step 6, particle swarm optimization: using the information of the m fed-back pictures to optimize positive and negative sample selection and image feature extraction, establishing the feature data set, and performing SVM training based on the particle swarm optimization algorithm;
step 7, re-retrieval: classifying and sorting the picture data set with the SVM classifier and outputting the retrieval result;
step 8, repeating steps 1 to 7 until the user is satisfied, and outputting the final retrieval result.
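The overall flow of steps 1 to 8 can be sketched as a driver loop. Every component here is an injected callable standing in for a stage of the claims (salient-region/BoF extraction, similarity ranking, PSO/SVM re-retrieval, user feedback), so the names and signatures are assumptions, not the claimed interfaces.

```python
def retrieval_loop(extract_features, library, query, rank, get_feedback,
                   refine, satisfied, max_rounds=5):
    """Drive the retrieval pipeline: build the feature database
    (steps 1-2), run a preliminary retrieval (step 3), then loop
    through feedback rounds (steps 4-7) until the user is satisfied
    or the round budget is spent (step 8)."""
    feats = [extract_features(x) for x in library]   # steps 1-2: feature database
    q = extract_features(query)
    order = rank(q, feats)                           # step 3: preliminary retrieval
    for _ in range(max_rounds):                      # step 8: feedback loop
        if satisfied(order):
            return order
        marks = get_feedback(order)                  # step 4: user marks relevant pictures
        order = refine(q, feats, order, marks)       # steps 5-7: PSO/SVM re-retrieval
    return order
```

With trivial stand-ins (identity features, a deliberately naive first ranking, a distance-sorting `refine`), one feedback round is enough to bring the matching item to the front.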
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011641876.6A CN112685591A (en) | 2020-12-31 | 2020-12-31 | Accurate picture retrieval method for user interest area and feedback guidance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112685591A (en) | 2021-04-20 |
Family
ID=75456762
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113450292A (en) * | 2021-06-17 | 2021-09-28 | 重庆理工大学 | High-precision visual positioning method for PCBA parts |
CN113543307A (en) * | 2021-06-01 | 2021-10-22 | 北京邮电大学 | Visual information feature fingerprint database construction method, positioning method and device |
CN114360001A (en) * | 2021-12-10 | 2022-04-15 | 江苏理工学院 | Intelligent vehicle-mounted prompting system and method based on expression recognition |
CN115100365A (en) * | 2022-08-25 | 2022-09-23 | 国网天津市电力公司高压分公司 | Camera optimal baseline acquisition method based on particle swarm optimization |
CN116823949A (en) * | 2023-06-13 | 2023-09-29 | 武汉天进科技有限公司 | Miniaturized unmanned aerial vehicle airborne real-time image processing device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101539930A (en) * | 2009-04-21 | 2009-09-23 | 武汉大学 | Search method of related feedback images |
US20090254539A1 (en) * | 2008-04-03 | 2009-10-08 | Microsoft Corporation | User Intention Modeling For Interactive Image Retrieval |
CN101706780A (en) * | 2009-09-03 | 2010-05-12 | 北京交通大学 | Image semantic retrieving method based on visual attention model |
CN102682008A (en) * | 2011-03-14 | 2012-09-19 | 高如如 | Image retrieval method based on point of significance and support vector machine (SVM) relevant feedback |
CN107958073A (en) * | 2017-12-07 | 2018-04-24 | 电子科技大学 | Color image retrieval based on particle swarm algorithm optimization |
CN111428748A (en) * | 2020-02-20 | 2020-07-17 | 重庆大学 | An infrared image insulator identification and detection method based on HOG feature and SVM |
Non-Patent Citations (2)
Title |
---|
Liu Qiong et al., "Feature Integration Weight Estimation and Image Salient Region Extraction in Computational Modeling of Visual Selective Attention", Pattern Recognition and Artificial Intelligence * |
Cui Jingmin et al., "Image Retrieval Algorithm Based on Salient Regions and Relevance Feedback", Computer Knowledge and Technology * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210420 ||