CN106529606A

CN106529606A - Method of improving image recognition accuracy

Info

Publication number: CN106529606A
Application number: CN201611099552.8A
Authority: CN
Inventors: 程国艮; 李欣杰
Original assignee: Global Tone Communication Technology Co ltd
Current assignee: Global Tone Communication Technology Co ltd
Priority date: 2016-12-01
Filing date: 2016-12-01
Publication date: 2017-03-22

Abstract

The invention discloses a method for improving image recognition accuracy. The method uses word embedding to analyze the text associated with an image and thereby improves recognition accuracy. The method comprises: extracting an image and its text description; training a neural network with deep learning and using it to recognize the image; taking the top m recognition results for subsequent processing; identifying the noun sequence in the text by noun recognition; training word-vector models and computing word vectors; and filtering the image recognition results by word-vector similarity to improve recognition accuracy.

Description

A method for improving image recognition accuracy

Technical field

The invention belongs to the technical field of image processing, and in particular relates to a method for improving image recognition accuracy.

Background art

Since 2006, deep learning based on deep neural networks (DNNs) has attracted wide attention in academia and has become a focus of today's Internet big data and artificial intelligence work. Deep learning builds a hierarchical model loosely resembling the structure of the human brain and extracts features from the input data level by level, from low-level to high-level, thereby establishing a mapping from low-level signals to high-level semantics. It is the greatest advance in machine intelligence in the past ten years and the greatest breakthrough in the field of image recognition.

When deep learning is used to classify images, the classifier usually returns, for a given picture, a series of recognition results and their probabilities. However, an image does not exist in isolation. A news picture is accompanied by a headline and the news content, both closely related to what the picture shows. An e-commerce picture is usually accompanied by a title describing the product on sale, and this text is closely related to the product picture.

Prior-art methods recognize an image with techniques such as deep learning and output the recognition result. News images and e-commerce images are usually accompanied by a text description, but existing recognition techniques do not use it. The present invention uses the text description to remove, from the candidate results produced by deep learning, those candidates that have a low degree of association with the text description, thereby improving recognition accuracy.

Summary of the invention

The object of the invention is to provide a method for improving image recognition accuracy, intended to solve the problem that existing recognition techniques do not use the text description accompanying news or e-commerce images, cannot remove candidates with a low degree of association with that description, and therefore cannot improve recognition accuracy.

The invention is realized as follows: a method for improving image recognition accuracy that uses word embedding to compute the degree of association (characterized by the minimum distance) between each image recognition candidate and the text description of the image, removes candidates with a low degree of association, and thereby improves image recognition accuracy.

The specific steps are as follows:

Extracting image and text-description pairs;

For the image, training a neural network with deep learning and using it for recognition; the recognition result is a category-probability sequence (C₁, P₁), (C₂, P₂), ..., (C_n, P_n), sorted by probability so that P₁≥P₂≥...≥P_n; usable neural networks include, but are not limited to, AlexNet, GoogLeNet, VGG, Inception and ResNet;

For the recognition result, taking the top m items (C₁, P₁), (C₂, P₂), ..., (C_m, P_m) (m≤n) for subsequent processing;

For the text description, identifying the noun sequence in the text by noun recognition and removing repeated nouns; the resulting noun sequence is denoted N₁, N₂, ..., N_k;

Training word-vector models and computing word vectors;

Filtering the image recognition results by word-vector similarity to improve recognition accuracy.
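The two data-preparation steps above (keeping the top m recognition results and removing repeated nouns) can be sketched as follows. This is a minimal illustration only: the function names `top_m` and `dedupe_nouns` and all sample values are hypothetical and do not come from the invention, and noun recognition itself is assumed to have been done by some external tagger.

```python
from typing import List, Tuple


def top_m(results: List[Tuple[str, float]], m: int) -> List[Tuple[str, float]]:
    """Sort (category, probability) pairs by descending probability and keep the first m."""
    ranked = sorted(results, key=lambda cp: cp[1], reverse=True)
    return ranked[:m]


def dedupe_nouns(nouns: List[str]) -> List[str]:
    """Remove repeated nouns while preserving first-occurrence order."""
    seen = set()
    unique = []
    for noun in nouns:
        if noun not in seen:
            seen.add(noun)
            unique.append(noun)
    return unique


# Illustrative values only (not taken from the patent).
candidates = top_m([("backpack", 0.30), ("mailbag", 0.45), ("puck", 0.05)], m=2)
nouns = dedupe_nouns(["backpack", "bag", "backpack", "school bag"])
```

Preserving first-occurrence order when removing duplicates is a design choice of this sketch; the method only requires that repeated nouns be removed.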

Further, the extraction of image and text-description pairs includes:

a) for a news image, extracting the news headline and using it as the text description of the news image;

b) for an e-commerce image, extracting the description of the e-commerce product and using it as the text description of the e-commerce image.

News articles and e-commerce product data are obtained by web crawling. The crawled content is HTML; the news headline and the product description can be extracted by structural analysis of the HTML.
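As a rough illustration of structural analysis of crawled HTML, the sketch below pulls the text of the page's `<title>` element using Python's standard `html.parser` module. Treating the `<title>` element as the headline or product description is an assumption made for this example; real pages may keep that text in other elements, and the class name `TitleExtractor` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is considered.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html: str) -> str:
    """Return the stripped <title> text, or an empty string if there is none."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.title_parts).strip()


# Hypothetical crawled page.
page = "<html><head><title> Fashion backpack, PU school bag </title></head><body><p>...</p></body></html>"
description = extract_title(page)
```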

Further, the training and computation of word vectors includes:

1) training a news word-vector model on a news corpus and an e-commerce word-vector model on an e-commerce corpus, and selecting the corresponding model to compute the word vectors;

2) computing the word vectors of the noun sequence N₁, N₂, ..., N_k, denoted V_n1, V_n2, ..., V_nk;

3) computing the word vectors of the category sequence C₁, C₂, ..., C_m, denoted V_c1, V_c2, ..., V_cm.
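The model selection and vector lookup in steps 1) to 3) can be sketched as below. The two tiny dictionaries stand in for trained word-vector models and contain made-up two-dimensional vectors; a real system would train the models on news and e-commerce corpora with a word-embedding toolkit. Skipping out-of-vocabulary words is an assumption of this sketch, since the source does not say how they are handled.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]
Model = Dict[str, Vector]

# Toy stand-ins for trained models (hypothetical values).
NEWS_MODEL: Model = {"election": [0.9, 0.1], "minister": [0.8, 0.2]}
ECOMMERCE_MODEL: Model = {"backpack": [0.1, 0.9], "mailbag": [0.2, 0.8]}

MODELS: Dict[str, Model] = {"news": NEWS_MODEL, "ecommerce": ECOMMERCE_MODEL}


def word_vectors(words: List[str], domain: str) -> Dict[str, Vector]:
    """Look up each word in the model for the given domain.

    Words missing from the model are skipped (an assumption of this sketch).
    """
    model = MODELS[domain]
    return {w: model[w] for w in words if w in model}


# Noun sequence and category sequence for an e-commerce image.
noun_vecs = word_vectors(["backpack", "unknown-word"], "ecommerce")
category_vecs = word_vectors(["mailbag", "backpack"], "ecommerce")
```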

Further, filtering the image recognition results by word-vector similarity to improve recognition accuracy includes:

(1) denoting the similarity between two word vectors V₁ and V₂, expressed as a distance, as d_v1,v2, where a smaller distance means that the two words are closer in meaning; for the word vector V_ci of each category in the category sequence, finding the noun word vector V_nj nearest to V_ci, the minimum distance being d_vci,nj;

(2) setting a distance threshold t; when d_vci,nj is greater than t, the category has a low degree of association with the image description text and is discarded;

(3) sorting the remaining sequence by minimum distance, with categories of smaller minimum distance placed first, to give the final image recognition result.
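Steps (2) and (3) can be sketched as a single function that takes, for each category, its already-computed minimum distance to the noun sequence. The function name `filter_and_rerank` and the sample distances are hypothetical; only the rule itself (discard when the distance is greater than t, then sort ascending) is taken from the steps above.

```python
from typing import List, Tuple


def filter_and_rerank(nearest: List[Tuple[str, float]], t: float) -> List[str]:
    """Drop categories whose minimum distance exceeds t, then sort ascending by distance."""
    kept = [(category, d) for category, d in nearest if d <= t]
    kept.sort(key=lambda cd: cd[1])
    return [category for category, _ in kept]


# Illustrative distances only: C2 is far from every noun and is discarded.
nearest = [("C1", 4.0), ("C2", 12.5), ("C3", 2.5)]
result = filter_and_rerank(nearest, t=10.0)
```

With these sample values the remaining categories are C3 followed by C1, because C3 has the smaller minimum distance.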

Further, in step (1), finding the noun word vector V_nj nearest to V_ci, with minimum distance d_vci,nj, specifically includes:

selecting the cosine distance method or the Euclidean distance method to compute the distance between word vectors; the word vectors of the noun sequence N₁, N₂, ..., N_k are denoted V_n1, V_n2, ..., V_nk; assuming the word vector of candidate result C_m is V_cm, computing the distances between V_cm and V_n1, V_n2, ..., V_nk respectively, denoted d_vcm,n1, d_vcm,n2, ..., d_vcm,nk; the minimum distance is then d_vcm,vn = min(d_vcm,n1, d_vcm,n2, ..., d_vcm,nk).
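The minimum-distance computation for one candidate can be sketched as follows. The distance function is passed in as a parameter so that either method can be used; the example uses the standard library's `math.dist`, which computes the Euclidean distance. The function name `nearest_noun` and the sample vectors are hypothetical.

```python
import math
from typing import Callable, Dict, Tuple

Vector = Tuple[float, ...]


def nearest_noun(
    v_c: Vector,
    noun_vecs: Dict[str, Vector],
    dist: Callable[[Vector, Vector], float],
) -> Tuple[str, float]:
    """Return the noun whose vector is closest to v_c, together with that minimum distance."""
    if not noun_vecs:
        raise ValueError("noun sequence is empty")
    return min(
        ((noun, dist(v_c, v_n)) for noun, v_n in noun_vecs.items()),
        key=lambda nd: nd[1],
    )


# Made-up two-dimensional vectors for illustration.
noun_vecs = {"backpack": (0.0, 3.0), "bag": (4.0, 0.0)}
noun, d_min = nearest_noun((0.0, 0.0), noun_vecs, math.dist)
```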

Further, in step (3), the remaining sequence is obtained as follows: assume the candidate results are C₁, C₂, ..., C_m with word vectors V_c1, V_c2, ..., V_cm, and the word vectors of the noun sequence N₁, N₂, ..., N_k are V_n1, V_n2, ..., V_nk; compute the degree of association between each candidate result and the noun sequence; discard the results with a low degree of association; what is left is the remaining sequence. For example, given three candidate results C₁, C₂ and C₃, if the computation shows that C₂ has a low degree of association with the noun sequence, the remaining sequence is C₁, C₃.

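Further, the Euclidean distance method is:

given two vectors V₁(x₁₁, x₁₂, ..., x_1n) and V₂(x₂₁, x₂₂, ..., x_2n), the Euclidean distance is the straight-line distance between them in Euclidean space:

d_{ed}(V_{1}, V_{2}) = \sqrt{\sum_{u=1}^{n} |x_{1u} - x_{2u}|^{2}};

The cosine distance method is: the cosine distance is the cosine of the angle between the two vectors:

d_{cos}(V_{1}, V_{2}) = \frac{\sum_{u=1}^{n} (x_{1u} \times x_{2u})}{\sqrt{\sum_{u=1}^{n} x_{1u}^{2}} \times \sqrt{\sum_{u=1}^{n} x_{2u}^{2}}}.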
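The two measures can be sketched in plain Python as below; the function names are hypothetical. One point to keep in mind: the cosine of the angle grows as two vectors become more alike, so it behaves as a similarity rather than a distance. If it were combined with a rule of the form "discard when the distance is greater than t", a conversion such as 1 minus the cosine might be needed; that conversion is an assumption of this note and is not stated in the source.

```python
import math
from typing import Sequence


def euclidean_distance(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Square root of the sum of squared coordinate differences."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def cosine_of_angle(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Dot product divided by the product of the two vector norms."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm1 * norm2)
```

For example, `euclidean_distance([0, 0], [3, 4])` evaluates to 5.0, and the cosine of two perpendicular vectors such as `[1, 0]` and `[0, 1]` is 0.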

The method provided by the invention uses word embedding to analyze the text associated with an image and thereby improves image recognition accuracy.

The invention uses the text description to remove, from the candidate results produced by deep learning, those with a low degree of association with the text description, thereby improving recognition accuracy.

In one implementation, the invention trains a 60-dimensional word-vector model on an e-commerce corpus, uses the Euclidean distance with a distance threshold of 10 to exclude categories with a low degree of association, and re-sorts the rest by minimum distance to obtain a more accurate final recognition result.

Brief description of the drawings

Fig. 1 is a flow chart of the method for improving image recognition accuracy provided by an embodiment of the invention.

Detailed description

To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

The application principle of the invention is described in detail below with reference to the accompanying drawing.

As shown in Fig. 1, the method for improving image recognition accuracy provided by an embodiment of the invention uses word embedding to analyze the text associated with an image and thereby improves image recognition accuracy.

The specific steps are as follows:

S101: Extracting image and text-description pairs;

S102: For the image, training a neural network with deep learning and using it for recognition; the recognition result is a category-probability sequence (C₁, P₁), (C₂, P₂), ..., (C_n, P_n), sorted by probability so that P₁≥P₂≥...≥P_n; usable neural networks include, but are not limited to, AlexNet, GoogLeNet, VGG, Inception and ResNet;

S103: For the recognition result, taking the top m items (C₁, P₁), (C₂, P₂), ..., (C_m, P_m) (m≤n) for subsequent processing;

S104: For the text description, identifying the noun sequence in the text by noun recognition and removing repeated nouns; the resulting noun sequence is denoted N₁, N₂, ..., N_k;

S105: Training word-vector models and computing word vectors;

S106: Filtering the image recognition results by word-vector similarity to improve recognition accuracy.

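Further, the extraction of image and text-description pairs includes: for a news image, extracting the news headline as the text description; for an e-commerce image, extracting the description of the e-commerce product as the text description.

Further, the training and computation of word vectors includes: training a news word-vector model on a news corpus and an e-commerce word-vector model on an e-commerce corpus, selecting the corresponding model, and computing the word vectors of the noun sequence and of the category sequence.

In step (1) of the filtering, finding the noun word vector V_nj nearest to V_ci, with minimum distance d_vci,nj, is done by computing the cosine distance or the Euclidean distance between V_ci and each noun word vector and taking the minimum.

In step (3) of the filtering, the remaining sequence is obtained as follows: assume the candidate results are C₁, C₂, ..., C_m with word vectors V_c1, V_c2, ..., V_cm, and the word vectors of the noun sequence N₁, N₂, ..., N_k are V_n1, V_n2, ..., V_nk; compute the degree of association between each candidate result and the noun sequence; discard the results with a low degree of association; what is left is the remaining sequence. For example, given three candidate results C₁, C₂ and C₃, if the computation shows that C₂ has a low degree of association with the noun sequence, the remaining sequence is C₁, C₃.

The Euclidean distance method and the cosine distance method are as described in the summary above.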

Example:

The text description is "Ou Shina 2016 new Korean-style fashion backpack, large PU school bag, double-shoulder bag". Using an Inception network for recognition, the top 20 candidate results are:

Mailbag (0.1564)

Knapsack (0.0818)

Ice hockey (0.0596)

Button (0.0332)

Knee cap (0.0270)

Cuirass (0.0180)

Corset (0.0169)

Military uniform (0.0169)

Satchel (0.0150)

T-shirt (0.0110)

Shield (0.0104)

Sweet potato/ocarina (0.0103)

Apron (0.0101)

Leather sheath (0.0098)

Sport shirt (0.0096)

Football helmet (0.0092)

Bullet-proof vest (0.0067)

Maillot/tights (0.0063)

A 60-dimensional word-vector model is trained on an e-commerce corpus. Using the Euclidean distance with a distance threshold of 10, categories with a low degree of association are excluded and the rest are re-sorted by minimum distance. The final recognition result is:

Knapsack

Mailbag

Knee cap

Sport shirt

Apron

Leather sheath

Bullet-proof vest

Nightwear

Corset

Cuirass

T-shirt

Military uniform

Button

The candidate results after processing are more accurate.

The above is only a preferred embodiment of the invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A method for improving image recognition accuracy, characterized in that the method uses word embedding to compute the degree of association between each image recognition candidate and the text description of the image, removes candidates with a low degree of association, and thereby improves image recognition accuracy;

The specific steps are as follows:

Extracting image and text-description pairs;

For the image, training a neural network with deep learning and using it for recognition; the recognition result is a category-probability sequence (C₁, P₁), (C₂, P₂), ..., (C_n, P_n), sorted by probability so that P₁≥P₂≥...≥P_n; usable neural networks include, but are not limited to, AlexNet, GoogLeNet, VGG, Inception and ResNet;

For the recognition result, taking the top m items (C₁, P₁), (C₂, P₂), ..., (C_m, P_m) (m≤n) for subsequent processing;

For the text description, identifying the noun sequence in the text by noun recognition and removing repeated nouns; the resulting noun sequence is denoted N₁, N₂, ..., N_k;

Training word-vector models and computing word vectors;

2. The method for improving image recognition accuracy according to claim 1, characterized in that the extraction of image and text-description pairs includes:

b) for an e-commerce image, extracting the description of the e-commerce product and using it as the text description of the e-commerce image.

3. The method for improving image recognition accuracy according to claim 1, characterized in that the training and computation of word vectors includes:

1) training a news word-vector model on a news corpus and an e-commerce word-vector model on an e-commerce corpus, and selecting the corresponding model to compute the word vectors of the noun sequence;

4. The method for improving image recognition accuracy according to claim 1, characterized in that filtering the image recognition results by word-vector similarity to improve recognition accuracy includes:

(1) denoting the similarity between two word vectors V₁ and V₂, expressed as a distance, as d_v1,v2, where a smaller distance means that the two words are closer in meaning; for the word vector V_ci of each category in the category sequence, finding the noun word vector V_nj nearest to V_ci, the minimum distance being d_vci,nj;

(2) setting a distance threshold t; when d_vci,nj is greater than t, the category has a low degree of association with the image description text and is discarded;

(3) sorting the remaining sequence by minimum distance, with categories of smaller minimum distance placed first, to give the final image recognition result.

5. The method for improving image recognition accuracy according to claim 4, characterized in that, in step (1), finding the noun word vector V_nj nearest to V_ci, with minimum distance d_vci,nj, specifically includes:

selecting the cosine distance method or the Euclidean distance method to compute the distance between word vectors; the word vectors of the noun sequence N₁, N₂, ..., N_k are denoted V_n1, V_n2, ..., V_nk; assuming the word vector of candidate result C_m is V_cm, computing the distances between V_cm and V_n1, V_n2, ..., V_nk respectively, denoted d_vcm,n1, d_vcm,n2, ..., d_vcm,nk; the minimum distance is then d_vcm,vn = min(d_vcm,n1, d_vcm,n2, ..., d_vcm,nk).

6. The method for improving image recognition accuracy according to claim 4, characterized in that, in step (3), the remaining sequence is obtained as follows: assume the candidate results are C₁, C₂, ..., C_m with word vectors V_c1, V_c2, ..., V_cm, and the word vectors of the noun sequence N₁, N₂, ..., N_k are V_n1, V_n2, ..., V_nk; compute the degree of association between each candidate result and the noun sequence; discard the results with a low degree of association; what is left is the remaining sequence.

7. The method for improving image recognition accuracy according to claim 5, characterized in that the Euclidean distance method is:

d_{ed}(V_{1}, V_{2}) = \sqrt{\sum_{u=1}^{n} |x_{1u} - x_{2u}|^{2}};

The cosine distance method is: the cosine distance is the cosine of the angle between the two vectors:

d_{cos}(V_{1}, V_{2}) = \frac{\sum_{u=1}^{n} (x_{1u} \times x_{2u})}{\sqrt{\sum_{u=1}^{n} x_{1u}^{2}} \times \sqrt{\sum_{u=1}^{n} x_{2u}^{2}}}.