
CN118570503B - Image recognition method, device and system - Google Patents


Info

Publication number
CN118570503B
CN118570503B (application CN202411034778.4A)
Authority
CN
China
Prior art keywords
image
images
similarity value
similarity
data set
Prior art date
Legal status
Active
Application number
CN202411034778.4A
Other languages
Chinese (zh)
Other versions
CN118570503A
Inventor
江俊林
朱树磊
殷俊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority claimed from CN202411034778.4A
Publication of CN118570503A
Application granted
Publication of CN118570503B


Classifications

    • G PHYSICS > G06 Computing or calculating; counting > G06V Image or video recognition or understanding
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/764 Recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/7747 Generating sets of training patterns; organisation of the process, e.g. bagging or boosting
    • G06V10/776 Validation; performance evaluation
    • G06V40/172 Human faces: classification, e.g. identification


Abstract

The invention discloses an image recognition method, device, and system. The method comprises the following steps: extracting the image features of an image to be recognized through a feature extraction network, wherein the feature extraction network is trained with an unlabeled dataset and a labeled dataset, and the two datasets contain different data categories; comparing the image features with an image base library to determine the top m images with the highest first similarity values, where m is a positive integer; comparing the image features with the unlabeled dataset to determine the second similarity value corresponding to each of the m images; and determining, from the m images, the target image corresponding to the image to be recognized according to the first and second similarity values. This method effectively reduces interference and improves the accuracy of image recognition.

Description

Image recognition method, device and system
Technical Field
The present invention relates to the field of face recognition technologies, and in particular, to an image recognition method, device, and system.
Background
In face recognition, features are extracted from test samples by a trained model, and the cosine similarity between these features and those of the image to be recognized is computed; the target image is then selected from the test samples according to the cosine similarity results.
Existing models are mainly obtained by classification training on N existing classes of training samples, which tightens the feature distribution within each class while pulling the classes apart, making the trained model more discriminative. However, because the similarity distribution among samples differs from class to class, this approach often leads to false alarms or missed detections, reducing the recognition accuracy of the face recognition model.
Therefore, how to improve the accuracy of face recognition is a problem that urgently needs to be solved.
Disclosure of Invention
The invention provides an image recognition method, device and system, which are used for improving the accuracy of image recognition.
In a first aspect, the present invention provides an image recognition method, which can be performed by an image recognition apparatus. The method comprises:
extracting the image features of an image to be recognized through a feature extraction network, wherein the feature extraction network is trained with an unlabeled dataset and a labeled dataset, and the two datasets contain different data categories; comparing the image features with an image base library to determine the top m images with the highest first similarity values, where m is a positive integer; comparing the image features with the unlabeled dataset to determine the second similarity value corresponding to each of the m images; and determining, from the m images, the target image corresponding to the image to be recognized according to the first and second similarity values.
In a possible implementation, comparing the image features with the unlabeled dataset to determine the second similarity values corresponding to the m images comprises:
selecting, from the unlabeled dataset, the k images with the highest similarity values to the image features; determining, for each of the m images, k similarity values based on the k selected images; and taking the value obtained by normalizing the k similarity values of each image as that image's second similarity value.
In a possible implementation, determining the target image corresponding to the image to be recognized from the m images according to the first and second similarity values comprises:
determining a third similarity value for each of the m images according to its first and second similarity values; and determining the image with the highest third similarity value among the m images as the target image corresponding to the image to be recognized.
In a possible implementation, determining the third similarity value for each of the m images according to its first and second similarity values comprises:
calculating each image's third similarity value according to the weight assigned to the first similarity value and the weight assigned to the second similarity value.
In one possible implementation, the feature extraction network is obtained by performing first-stage training with the labeled dataset and, after the first stage is complete, performing second-stage training with both the unlabeled dataset and the labeled dataset.
In one possible implementation, in the feature extraction network resulting from the second-stage training, the similarity of the unlabeled dataset to each data category in the labeled dataset is below a similarity threshold.
In one possible implementation, the unlabeled dataset is derived from the labeled training set used to train the feature extraction network:
features are extracted from the labeled dataset by the feature extraction network and normalized to obtain the unlabeled dataset.
In a second aspect, an embodiment of the present invention provides an image recognition apparatus, comprising an acquisition module and a processing module.
The acquisition module is configured to extract the image features of an image to be recognized through a feature extraction network, wherein the feature extraction network is trained with an unlabeled dataset and a labeled dataset, and the two datasets contain different data categories.
The processing module is configured to compare the image features with an image base library and determine the top m images with the highest first similarity values, where m is a positive integer; compare the image features with the unlabeled dataset to determine the second similarity value corresponding to each of the m images; and determine, from the m images, the target image corresponding to the image to be recognized according to the first and second similarity values.
In a possible implementation, the processing module is specifically configured to:
select, from the unlabeled dataset, the k images with the highest similarity values to the image features; determine, for each of the m images, k similarity values based on the k selected images; and take the value obtained by normalizing the k similarity values of each image as that image's second similarity value.
In a possible implementation, the processing module is specifically configured to:
determine a third similarity value for each of the m images according to its first and second similarity values; and determine the image with the highest third similarity value among the m images as the target image corresponding to the image to be recognized.
In a possible implementation, the processing module is specifically configured to:
calculate each image's third similarity value according to the weight assigned to the first similarity value and the weight assigned to the second similarity value.
In one possible implementation, the feature extraction network is obtained by performing first-stage training with the labeled dataset and, after the first stage is complete, performing second-stage training with both the unlabeled dataset and the labeled dataset.
In one possible implementation, in the feature extraction network resulting from the second-stage training, the similarity of the unlabeled dataset to each data category in the labeled dataset is below a similarity threshold.
In one possible implementation, the unlabeled dataset is derived from the labeled training set used to train the feature extraction network.
In a third aspect, embodiments of the present invention also provide an image recognition apparatus, the apparatus comprising a memory for storing a computer program or instructions and a processor; the processor is configured to invoke a computer program or instructions stored in the memory to perform a method as in any of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides a chip system comprising a processor and an interface; the processor invokes and executes a computer program through the interface, and when the program is executed, the method described in the first aspect or any possible design thereof is implemented.
In a fifth aspect, embodiments of the present application provide a computer readable storage medium having a computer program for performing the method described in the first aspect or any one of the possible designs of the first aspect.
In a sixth aspect, embodiments of the present application also provide a computer program product comprising a computer program which, when executed, enables the implementation of the method described in the first aspect or any one of the possible designs of the first aspect.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
When performing image recognition, for example face recognition, an evaluation set composed of unlabeled data is introduced to assist recognition: similarity equalization is applied to the m images screened from the image base library, and the target image corresponding to the image to be recognized is determined from the m images according to the equalized similarity. This reduces the interference caused by differing similarity distributions across classes during recognition, makes recognition more comprehensive, and effectively improves face recognition accuracy; in particular, false alarms are greatly reduced for blurred, low-quality, complex-illumination, and non-face pictures.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it will be apparent that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the feature-space distribution of pictures of different quality according to an embodiment of the present invention;
fig. 2 is a schematic flow chart corresponding to an image recognition method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a training phase according to an embodiment of the present invention;
FIG. 4 is a schematic diagram showing a method for calculating similarity between N-dimensional features according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an image recognition device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an image recognition device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
When performing image recognition, for example face recognition, features are mainly extracted from test samples by a trained model, and the cosine similarity between these features and those of the image to be recognized is computed; the target image is then selected from the test samples according to the cosine similarity results.
The model adopted in the related art is mainly obtained by classification training on N existing classes of training samples, which tightens the feature distribution within each class while pulling the classes apart, making the trained model more discriminative. However, because the similarity distribution among samples differs from class to class, this approach often leads to false alarms or missed detections, reducing the recognition accuracy of the face recognition model.
For example, after a classification model is trained, the model is used to extract features from the relevant data, and the features are normalized.
As shown in part a of FIG. 1, by selecting some pictures and comparing the feature distributions of the face pictures, it can be seen preliminarily that for pictures with better image quality and smaller face angles, the feature distribution is sparse and the distances between features are larger.
As shown in part b of FIG. 1, in scenes with blur, complex illumination, or large angles, some features are distributed more tightly, and the distances between them are smaller and even tend to coincide.
It can therefore be understood that in related technical solutions the feature extraction network often learns pictures of such scenes poorly, so high-score false alarms are easily produced in these scenes during recognition.
In summary, how to improve the accuracy of image recognition is a problem that urgently needs to be solved.
Based on the above, the present application provides an image recognition method. For example, in face recognition, an evaluation set composed of unlabeled data is introduced to assist recognition: similarity equalization is applied to the m images screened from the image base library, and the target image corresponding to the image to be recognized is determined from the m images according to the equalized similarity. This reduces the interference caused by differing similarity distributions across classes, makes recognition more comprehensive, and effectively improves face recognition accuracy; in particular, false alarms are greatly reduced for blurred, low-quality, complex-illumination, and non-face pictures.
To better describe the image recognition method provided by the present application, the description below uses face recognition as an example throughout. It should be noted that the face recognition scene is only one application scene of the image recognition method and does not limit the embodiments of the present application; any other recognition scene to which this image recognition method applies falls within the scope of protection of the present application.
Fig. 2 is a schematic flow chart corresponding to an image recognition method according to an embodiment of the present invention.
The process can be executed by an image recognition apparatus, which can be implemented in software, in hardware, or in a combination of both. As shown in fig. 2, the process includes the following steps:
In step 201, the image features of the image to be recognized are extracted through a feature extraction network, which is trained with an unlabeled dataset and a labeled dataset.
The unlabeled data set and the labeled data set in the embodiment of the application correspond to different data types.
As an example, the feature extraction network in the embodiment of the present application is obtained by performing a first stage training through the labeled dataset, and performing a second stage training through the unlabeled dataset and the labeled dataset after the first stage training is completed.
As an example, the similarity of the unlabeled dataset with respect to each data category in the labeled dataset in the feature extraction network resulting from completion of the second stage training is below a threshold similarity such that the feature distribution of the unlabeled dataset is not biased towards any one of the labeled dataset.
As an example, assuming that the current application scenario is a face recognition scenario, the labeled dataset according to the embodiment of the present application may be understood as a dataset with a face identity, and the evaluation set of the unlabeled dataset may be understood as an intersection set without the same face identity as the labeled dataset, where the part of the unlabeled dataset may include, but is not limited to, images of various scenarios such as a normal scenario, a very blurred scenario, a complex illumination scenario, a large angle scenario, and a non-face scenario.
In step 202, the image features are compared with an image base library, and the top m images with the highest first similarity values are determined, where m is a positive integer.
In step 203, the image features are compared with the unlabeled dataset to determine the second similarity value corresponding to each of the m images.
As an example, the embodiment of the present application can determine the second similarity values of the m images as follows:
First, the k images with the highest similarity values to the image features are selected from the unlabeled dataset. Then, for each of the m images, k similarity values are determined based on the k selected images. Finally, the value obtained by normalizing the k similarity values of each image is taken as that image's second similarity value.
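A minimal sketch of this second-similarity step follows. The function and variable names are hypothetical, and since the patent does not specify the normalization of the k values, a simple mean is used here as a placeholder:

```python
import numpy as np

def second_similarity(query_feat, candidate_feats, unlabeled_feats, k=2):
    """Second-similarity sketch: pick the k unlabeled features closest to
    the query, then score each of the m candidates against those k
    neighbors and reduce the k values to one score per candidate.
    All features are assumed L2-normalized, so a dot product is the
    cosine similarity."""
    sims_to_query = unlabeled_feats @ query_feat      # (U,) similarities
    topk_idx = np.argsort(sims_to_query)[-k:]         # k nearest unlabeled samples
    neighbors = unlabeled_feats[topk_idx]             # (k, d)

    second = np.empty(len(candidate_feats))
    for i, cand in enumerate(candidate_feats):        # m candidate images
        k_sims = neighbors @ cand                     # k similarity values
        second[i] = k_sims.mean()                     # placeholder normalization
    return second
```

In practice the "normalization" could equally be a min-max rescaling or a softmax over the k values; the mean is chosen only to keep the sketch short.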
In step 204, the target image corresponding to the image to be recognized is determined from the m images according to the first and second similarity values.
As an example, the embodiment of the present application can determine the target image corresponding to the image to be recognized from the m images as follows:
By way of example, a third similarity value is determined for each of the m images according to its first and second similarity values, and the image with the highest third similarity value among the m images is determined as the target image corresponding to the image to be recognized.
Further, when determining the third similarity value of each image from its first and second similarity values, the third similarity value can be calculated according to the weight assigned to the first similarity value and the weight assigned to the second similarity value.
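The weighted fusion can be sketched as follows; the weight values are illustrative assumptions, as the patent does not specify them:

```python
import numpy as np

def third_similarity(first_sims, second_sims, w1=0.7, w2=0.3):
    """Fuse the first and second similarity values of the m candidates
    with fixed weights; return the fused scores and the index of the
    best candidate (the target image)."""
    fused = w1 * np.asarray(first_sims) + w2 * np.asarray(second_sims)
    return fused, int(np.argmax(fused))
```

For example, a candidate with a slightly lower first similarity can overtake one whose neighborhood in the unlabeled reference set scores poorly:

```python
fused, best = third_similarity([0.9, 0.8], [0.2, 0.9])
```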
Further, for a better understanding of the embodiments of the present application, the image recognition method is described in detail below for different scenes:
Scene one: training the feature extraction network.
In the scene-one training process of the feature extraction network, as shown in fig. 3, training can be divided into two stages.
Step 1: train the feature extraction network with the labeled dataset.
The existing N classes of training samples, i.e., the labeled dataset, are classification-trained with a margin added to the loss function, which tightens the within-class feature distribution while pulling the classes apart, making the trained model more discriminative; this yields feature extraction network A.
Step 2: train the feature extraction network with the assistance of the unlabeled dataset.
Specifically, the unlabeled dataset can be added to the training set at a target ratio and used together with the labeled dataset to train feature extraction network A again.
As an option, during second-stage training, the logits and the final classification probability of each sample on the corresponding classes are computed; then the k classes with the highest classification probability are selected from the N labeled classes to compute the unlabeled loss; finally, different weights are assigned to the loss from classification training on the N labeled classes and to the unlabeled-data loss, and feature extraction network A is optimized accordingly to obtain feature extraction network B.
In an exemplary embodiment of the present application, after first-stage training is completed and feature extraction network A is obtained, the unlabeled dataset is added to the training set at a certain ratio for second-stage training.
Further, the embodiment of the present application calculates the logits of each sample on the corresponding classes and its final classification probability.
Practical calculation and analysis show that the number of classes with high probability is usually very small, i.e., only a few classes are highly similar to the sample. For computational efficiency, the k classes with the highest classification probability can therefore be selected from the N classes to compute the unlabeled loss. The probability p_i of each of these k classes is obtained from the logits by a softmax:
p_i = exp(z_i) / Σ_{j=1}^{n} exp(z_j), i = 1, ..., k (Equation 1)
where z_i is the logit for class i and n is the total number of labeled classes. Optimization then proceeds by continuously lowering p_1, ..., p_k.
During optimization, the preferred target is to drive p_1, ..., p_k as close as possible to 1/n. In this ideal case the similarity between the unlabeled dataset and the nearest k classes of the training set is minimized, and the feature distribution of the unlabeled dataset is balanced across the data categories of the labeled dataset.
Further, the loss of the unlabeled dataset can be calculated as its divergence from the uniform target:
L_u = Σ_{i=1}^{k} p_i · log(n · p_i) (Equation 2)
where L_u denotes the unlabeled-data loss; it vanishes when each p_i equals 1/n.
Finally, different weights are assigned to the loss of the labeled dataset and the loss of the unlabeled dataset for joint optimization:
L = L_cls + u · L_u (Equation 3)
where L_cls is the classification loss of the labeled dataset and u is a scaling factor that adjusts the contribution of the unlabeled loss.
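The two-part second-stage objective can be sketched as follows. The exact form of the unlabeled term is not fully specified in the source; the KL-style term used here is one plausible choice that is zero exactly when each of the top-k probabilities equals 1/n, matching the stated optimization goal:

```python
import numpy as np

def softmax(z):
    """Numerically stable row-wise softmax."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def second_stage_loss(logits_labeled, targets, logits_unlabeled, k=3, u=0.1):
    """Combined second-stage loss: cross-entropy on the labeled batch,
    plus a weighted term pushing the k largest class probabilities of
    each unlabeled sample toward the uniform value 1/n."""
    p = softmax(logits_labeled)
    loss_cls = -np.log(p[np.arange(len(targets)), targets]).mean()

    q = softmax(logits_unlabeled)                  # (B, n)
    n = q.shape[1]
    topk = np.sort(q, axis=1)[:, -k:]              # k highest class probs
    loss_unlabeled = (topk * np.log(n * topk)).sum(axis=1).mean()

    return loss_cls + u * loss_unlabeled
```

When the unlabeled logits are uniform the second term contributes nothing, so the loss reduces to the plain classification loss.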
The feature extraction network optimization method provided by the present application is easy to implement. The unlabeled dataset can be added flexibly, and the specific scenes and quantities can be chosen freely according to factors such as the recognition scenario and the available compute. Experiments show that even a small amount of unlabeled data yields a clear improvement and strengthens the generalization ability of the model.
Scene two: image recognition.
The recognition stage of scene two can be further divided into the following steps.
Step 1: extract the image features of the image to be recognized through feature extraction network B, and compare them against the image base library to determine the top m images with the highest first similarity values, where m is a positive integer.
For example, assume the current scenario is a 1:N search, i.e., matching a captured image against an image base library. In this scenario, a single captured image is searched against a massive base library, the top m images in the similarity ranking are taken as candidates, and the similarity between the features of the captured image and each of the m images is computed to obtain the first similarity values.
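The 1:N candidate-screening step can be sketched as follows (hypothetical names; features assumed L2-normalized so a dot product gives cosine similarity):

```python
import numpy as np

def top_m_candidates(capture_feat, base_feats, m=2):
    """Return the indices and first-similarity values of the m
    base-library images most similar to the captured image."""
    sims = base_feats @ capture_feat          # (N,) first similarity values
    order = np.argsort(sims)[::-1][:m]        # highest-scoring m images
    return order, sims[order]
```

The m indices returned here are the candidates that the subsequent calibration steps re-rank.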
Step 2: introduce the features of the unlabeled dataset to assist recognition and calibration.
For example, calibrating the similarity values of the m images means that during evaluation, the features of the unlabeled dataset are used as a reference set: whether two faces are the same person is determined not merely by computing and comparing their cosine similarity, but by jointly considering their cosine similarity and each face's neighbor relations in the reference set.
As shown in part a of FIG. 4, conventional evaluation computes the cosine similarity of two features as the final score; part b of FIG. 4 shows that the image recognition method provided by the present application additionally considers the influence of neighboring samples while computing feature similarity, evaluating the samples comprehensively.
Comparing the two evaluation modes, taking the influence of neighboring samples into account while computing feature similarity yields a result that is more accurate and more reliable.
The calibration analysis may be performed based on the local similarity of the unlabeled dataset. For example, starting from the expression of the ArcFace loss function, the local relationship between two pictures and their neighbors may be expressed as Equation 4, whose leading factor is a coefficient. The two terms of Equation 4 may in turn be expanded as Equation 5 and Equation 6, where the term in Equation 6 gives the relationship of a picture to the k unlabeled samples closest to it. (Equations 4 to 6 appear only as images in the original publication.)
The corrected similarity between the two picture features is then expressed as Equation 7, which uses a coefficient u to combine the cosine similarity of the two pictures' features with the local similarity term above (e.g., as a weighted sum of the two).
Further, when applied to top-1 search in the 1:N scenario, T also requires normalization.
Specifically, when the present application performs a 1:N top-1 search, the k local similarities are ranked, the maximum and minimum local similarity values are determined, and a normalized similarity is computed for each local similarity; the m similarity values are then calibrated based on the normalized similarities.
Illustratively, each of the first m base images is paired with the unlabeled dataset; for each of the m base images, the k most similar pictures are retrieved from the unlabeled dataset according to the steps described above and the local similarities are computed. The k local similarities are ranked, the maximum value T max and the minimum value T min are recorded, and the normalized similarity is computed for each local similarity.
The final calibrated similarities of the first m images can then be expressed as Equation 8, a min-max normalization of each local similarity T using the recorded extremes, of the form T' = (T - T_min) / (T_max - T_min).
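The calibration procedure described above (local similarity against the unlabeled reference set, followed by the min-max normalization of Equation 8) can be sketched as follows. Taking the local similarity as the mean cosine similarity to the k nearest unlabeled samples is an illustrative assumption, as are all names:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def local_similarity(feat, unlabeled, k):
    """Local similarity of one base-image feature with respect to the
    unlabeled reference set, taken here as the mean cosine similarity
    to its k nearest unlabeled samples (an assumed form; the text does
    not spell out the exact local term)."""
    sims = normalize(unlabeled) @ normalize(feat)
    return float(np.sort(sims)[::-1][:k].mean())

def calibrate(local_sims):
    """Equation 8 style min-max normalization of the local similarities
    of the first m base images, using the recorded T_max and T_min
    (assumes the values are not all identical)."""
    t = np.asarray(local_sims, dtype=float)
    return (t - t.min()) / (t.max() - t.min())
```

For example, `calibrate([0.2, 0.8, 0.5])` maps the three local similarities onto `[0.0, 1.0, 0.5]`, so the m candidates become directly comparable on a common scale.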
In an actual 1:N search scenario, the top-ranked base image is not necessarily the same person as the captured photo; the base image that truly corresponds to the captured photo may be ranked second or fifth. By introducing the unlabeled-data calibration method described above, the relative similarity distributions can be calibrated to a very close level, so that the method improves the top-1 hit rate of 1:N search.
In addition, the face recognition method provided by the embodiments of the present application can be specially optimized for complex scenes. For example, for scenes prone to high-score false alarms in practical applications, such as blur, large-angle profile faces, and occlusion, the false-alarm problem can be addressed in a targeted manner by constructing a suitable unlabeled training set. For the large-angle problem, for instance, a large number of samples containing large-angle faces can be added to the unlabeled dataset; in the training stage, the distance between normal samples and large-angle face samples is enlarged so that the model handles the problem well, and in the 1:N testing stage, false alarms can be reduced to a certain extent by re-calibrating the similarity.
In addition, it should be understood that in the present application, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A alone, both A and B, or B alone. In the text of the present application, the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
In addition, unless specified to the contrary, embodiments of the present application may use the ordinal terms "first", "second", and so on to distinguish between a plurality of objects; these terms are not intended to limit the order, timing, priority, or importance of the objects, nor do the descriptions "first" and "second" require the objects to be different.
The various numbers referred to in this disclosure are merely for convenience of description and are not intended to limit the scope of the embodiments of the present disclosure. The sequence numbers of the processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic. In the present disclosure, the word "exemplary" or "such as" is used to present an example, instance, or illustration, and any embodiment or design described as "exemplary" or "such as" should not be construed as preferred or advantageous over other embodiments or designs; rather, these words are intended to present the relevant concepts in a concrete fashion to facilitate understanding.
Fig. 5 is a schematic diagram of the internal modules of an image recognition apparatus 500 according to an embodiment of the present application. As shown in fig. 5, the apparatus may include an acquisition module 501 and a processing module 502, and optionally further includes a storage module, where the storage module is configured to store computer instructions or programs, and the processing module 502 may invoke the computer instructions or programs in the storage module.
The acquisition module 501 is configured to extract image features of an image to be identified through a feature extraction network, where the feature extraction network is obtained through training with an unlabeled dataset and a labeled dataset, and the unlabeled dataset corresponds to data categories different from those of the labeled dataset;
the processing module 502 is configured to determine, based on the image features and the image base, first m images with the highest first similarity value, where m is a positive integer; comparing the image features with the unlabeled data set to determine second similarity values corresponding to the m images respectively; and determining a target image corresponding to the image to be identified from the m images according to the first similarity value and the second similarity value.
In one possible implementation, the processing module 502 is specifically configured to:
Selecting k images with highest similarity values with the image characteristics from the unlabeled data set; determining k similarity values corresponding to each image in the m images based on the k images respectively; and determining the similarity value obtained by carrying out normalization processing on k similarity values corresponding to each image as a second similarity value corresponding to the image.
In one possible implementation, the processing module 502 is specifically configured to:
Determining a third similarity value corresponding to each image in the m images according to the first similarity value and the second similarity value corresponding to each image in the m images; and determining an image with the highest third similarity value in the m images as a target image corresponding to the image to be identified.
In one possible implementation, the processing module 502 is specifically configured to:
And calculating to obtain a third similarity value corresponding to each image according to the weight corresponding to the first similarity value and the weight corresponding to the second similarity value.
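The weighted combination described above can be illustrated as follows. The weighted-sum form with weights u and 1-u is an assumed parameterization, as the text only states that each similarity value carries a weight:

```python
def third_similarity(first_sim, second_sim, u=0.7):
    """Fuse the first similarity value (base-image cosine score) and the
    second similarity value (calibrated neighbor-based score); the
    weights u and 1-u are an illustrative assumption."""
    return u * first_sim + (1.0 - u) * second_sim

# Two candidate base images: the second has a lower raw cosine score but a
# much stronger neighbor-based score, and ends up ranked first.
pairs = [(0.80, 0.30), (0.60, 0.90)]
scores = [third_similarity(f, s) for f, s in pairs]
best = max(range(len(scores)), key=scores.__getitem__)
```

This shows how the calibration can reorder the candidates: the target image is the one with the highest fused (third) similarity value rather than the highest raw cosine score.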
In one possible implementation, the feature extraction network is obtained by performing a first stage training through the labeled dataset, and performing a second stage training again through the unlabeled dataset and the labeled dataset after the first stage training is completed.
In one possible implementation, the similarity of the unlabeled dataset with respect to each data category in the labeled dataset in the feature extraction network resulting from completion of the second stage training is below a threshold similarity.
In one possible implementation, the unlabeled dataset is derived from the labeled training set for training the feature extraction network.
Fig. 6 is a schematic structural diagram of an image recognition device 600 according to an embodiment of the present application. As shown in fig. 6, the device includes at least one processor 601 and a memory 602 connected to the at least one processor 601. The specific connection medium between the processor 601 and the memory 602 is not limited in the embodiments of the present application; in fig. 6, the processor 601 and the memory 602 are connected by a bus, as an example. Buses may be divided into address buses, data buses, control buses, and the like.
In the embodiments of the present application, the memory 602 stores instructions executable by the at least one processor 601, and by executing the instructions stored in the memory 602, the at least one processor 601 may implement the steps of the image recognition method described above.
The processor 601 is the control center of the computer device, and may use various interfaces and lines to connect various parts of the computer device, performing resource configuration by running or executing the instructions stored in the memory 602 and invoking the data stored in the memory 602. Optionally, the processor 601 may include one or more processing units, and the processor 601 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 601. In some embodiments, the processor 601 and the memory 602 may be implemented on the same chip, or in some embodiments they may be implemented separately on independent chips.
The processor 601 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, or a combination thereof, and may implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 602, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 602 may include at least one type of storage medium, for example flash memory, hard disk, multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, magnetic disk, or optical disc. The memory 602 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 602 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
Based on the same technical idea, the embodiments of the present application further provide a computer-readable storage medium, in which computer-readable instructions are stored, which when read and executed by a computer, cause the computer to perform the method in any one of the possible designs of the first aspect.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. An image recognition method, the method comprising:
Extracting image features of an image to be identified through a feature extraction network, wherein the feature extraction network is obtained by performing first-stage training through a labeled data set, performing second-stage training through an unlabeled data set and the labeled data set after the first-stage training is completed, and the unlabeled data set and the labeled data set correspond to different data types;
based on the comparison of the image features and an image base, determining the first m images with the highest first similarity values with the image features from the image base, wherein m is a positive integer;
Based on the m images and the image characteristics, comparing the m images with the unlabeled dataset, and determining a second similarity value corresponding to part or all of the images in the m images relative to the unlabeled dataset;
Determining a third similarity value corresponding to each image in the m images according to the first similarity value and the second similarity value corresponding to each image in the m images;
And determining an image with the highest third similarity value in the m images as a target image corresponding to the image to be identified.
2. The method of claim 1, wherein the determining a second similarity value for each of the m images relative to some or all of the images in the unlabeled dataset based on the m images and the image features comprises:
Based on the image characteristics and the unlabeled data set, selecting k images with highest similarity values with the image characteristics from the unlabeled data set;
Determining k similarity values corresponding to each image in the m images based on the k images respectively;
And determining the similarity value obtained by carrying out normalization processing on k similarity values corresponding to each image as a second similarity value corresponding to the image.
3. The method of claim 1, wherein the determining a third similarity value for each of the m images from the first similarity value and the second similarity value for each of the m images comprises:
And calculating to obtain a third similarity value corresponding to each image according to the weight corresponding to the first similarity value and the weight corresponding to the second similarity value.
4. The method of claim 1, wherein the similarity of the unlabeled dataset to each data category in the labeled dataset in the feature extraction network after completion of the second stage training is below a threshold similarity.
5. An image recognition apparatus, comprising:
an acquisition module, configured to extract image features of an image to be identified through a feature extraction network, where the feature extraction network is obtained by performing first-stage training through a labeled dataset and, after the first-stage training is completed, performing second-stage training through an unlabeled dataset and the labeled dataset, and the unlabeled dataset corresponds to data categories different from those of the labeled dataset;
The processing module is used for comparing the image features with an image base, and determining the first m images with the highest first similarity value with the image features from the image base, wherein m is a positive integer; based on the m images and the image characteristics, comparing the m images with the unlabeled dataset, and determining a second similarity value corresponding to part or all of the images in the m images relative to the unlabeled dataset; determining a third similarity value corresponding to each image in the m images according to the first similarity value and the second similarity value corresponding to each image in the m images; and determining an image with the highest third similarity value in the m images as a target image corresponding to the image to be identified.
6. An image recognition apparatus, comprising:
a memory for storing a computer program or instructions;
a processor for invoking a computer program or instructions stored in the memory to perform the method of any of claims 1-4.
7. An image recognition system, the system comprising:
A processor and an interface from which the processor invokes and executes a computer program, which when executed by the processor implements the method according to any of claims 1-4.
8. A computer readable storage medium having instructions stored therein which, when read and executed by a computer, cause the computer to perform the method of any one of claims 1 to 4.
CN202411034778.4A 2024-07-30 2024-07-30 Image recognition method, device and system Active CN118570503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411034778.4A CN118570503B (en) 2024-07-30 2024-07-30 Image recognition method, device and system


Publications (2)

Publication Number Publication Date
CN118570503A CN118570503A (en) 2024-08-30
CN118570503B (en) 2024-10-25

Family

ID=92473213


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472533A (en) * 2019-07-31 2019-11-19 北京理工大学 A face recognition method based on semi-supervised training
CN111914908A (en) * 2020-07-14 2020-11-10 浙江大华技术股份有限公司 Image recognition model training method, image recognition method and related equipment

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065409B (en) * 2021-03-09 2025-02-21 北京工业大学 An unsupervised person re-identification method based on camera distribution difference alignment constraint
CN113936312B (en) * 2021-10-12 2024-06-07 南京视察者智能科技有限公司 Face recognition base screening method based on deep learning graph convolution network
CN115019052B (en) * 2022-06-01 2025-07-25 河南讯飞人工智能科技有限公司 Image recognition method, device, electronic equipment and storage medium
CN115170868B (en) * 2022-06-17 2026-02-06 湖南大学 Small sample image classification contrast learning method based on clustering
CN115424053B (en) * 2022-07-25 2023-05-02 北京邮电大学 Small sample image recognition method, device, equipment and storage medium
CN114937179B (en) * 2022-07-27 2022-12-13 深圳市海清数字技术有限公司 Junk image classification method and device, electronic equipment and storage medium
KR20240032283A (en) * 2022-09-02 2024-03-12 삼성전자주식회사 Method of training image representation model and computing apparatus performing the method
CN116310856B (en) * 2023-01-08 2025-10-21 南京理工大学 Remote sensing target detection method and system based on multi-scale cross-instance clustering mutually exclusive information comparison self-supervision
CN117218477A (en) * 2023-05-19 2023-12-12 腾讯科技(深圳)有限公司 Image recognition and model training methods, devices, equipment and storage media
CN116433747B (en) * 2023-06-13 2023-08-18 福建帝视科技集团有限公司 A detection model construction method and detection device for bamboo tube wall thickness




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant