Detailed Description
The embodiment of the application provides an image recognition method and a related device, which can improve the recognition accuracy of a face image.
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The image recognition method provided by the embodiment of the application is realized based on artificial intelligence (artificial intelligence, AI). Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
In the embodiments of the present application, the artificial intelligence techniques involved mainly include computer vision (CV) and machine learning (ML). For example, the embodiments may involve image recognition in computer vision, and may also involve deep learning in machine learning, including convolutional neural networks (CNN) and the like.
The image recognition method provided by the embodiment of the application can be applied to a recognition device with data processing capability, for example a terminal device or a server, and the embodiment of the present application is not specifically limited. The terminal device may include, but is not limited to, a smart phone, a desktop computer, a notebook computer, a tablet computer, a smart speaker, a vehicle-mounted device, a smart watch, and the like. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. In addition, the terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the present application is not particularly limited.
The recognition device mentioned above may have the capability to implement computer vision techniques. Computer vision is the science of how to make machines "see"; more specifically, it uses cameras and computers in place of human eyes to perform machine vision tasks such as recognizing, tracking, and measuring targets, and further performs graphic processing so that the resulting image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies the related theories and techniques in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
In addition, the recognition device may also have machine learning capabilities. Machine learning is an interdisciplinary field that involves probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills, and how it can reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, learning from demonstration, and the like.
The image recognition method provided by the embodiment of the application adopts an artificial intelligence model, mainly a neural network: the biological image of the target object is recognized through the neural network to obtain a recognition result that indicates the face recognition condition of the target object.
In recent years, the development of machine learning has shown three major trends: model structures are becoming more complex, models are becoming deeper, and data sets keep growing in scale. However, as demand grows for running neural network models for edge computing on mobile terminals and embedded platforms, the limited resources of edge computing platforms require neural network models to be as small and computationally efficient as possible. For this reason, various model compression methods, such as model pruning and low-precision quantization of model parameters, have been proposed in academia and industry in recent years. One such method is knowledge distillation, which uses a large neural network model trained on a large training data set as a teacher model (teacher network) and a small neural network model as a student model (student network). The student model is trained using both the probability distribution vectors output by the teacher model and the manual annotations of the training set, which overcomes the difficulty of training a small neural network model on a large data set; after training, the student model can achieve test results on classification tasks that approach or exceed those of the teacher model. The method may be regarded as a means of knowledge transfer from the teacher model to the student model through training. After the transfer is completed, the large and heavy teacher model is replaced by the student model, which is designed to be fast and lightweight, to perform the task, which greatly facilitates deploying the neural network model on an edge-side platform.
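The training signal described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names, the temperature of 2.0, and the weighting factor alpha are all hypothetical, and the soft-target term is taken to be a cross-entropy between the softened teacher and student distributions.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to a probability distribution, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term (teacher) and a hard-label term.

    temperature and alpha are assumed hyperparameters for illustration.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # Cross-entropy between the teacher's and the student's softened outputs.
    soft_term = -sum(t * math.log(s) for t, s in zip(teacher_probs, student_soft))
    # Ordinary cross-entropy against the manually annotated class index.
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * soft_term + (1.0 - alpha) * hard_term

# A student that agrees with the teacher and the label incurs a lower loss.
loss_close = distillation_loss([2.0, 0.5, -1.0], [2.2, 0.4, -1.1], label=0)
loss_far = distillation_loss([-1.0, 0.5, 2.0], [2.2, 0.4, -1.1], label=0)
```

Minimizing such a loss pulls the student toward both the teacher's output distribution and the annotated label at the same time.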
In the related scheme, however, the characteristics of the student model are constrained mainly by adopting a single teacher model, so that the recognition of the face image is realized through the student model. However, since a single teacher model can only extract a biological feature from one sampling point in a face image, the biological feature extracted from one sampling point cannot accurately represent the real feature condition of the face image, so that the recognition result of the face image recognized by the constrained student model is different from the real face image condition, and the recognition accuracy is poor. Moreover, training a student model directly with a large amount of data often does not yield a model that meets accuracy requirements. This is mainly due to the small fitting capacity of the student model as a small neural network model, which is trapped in the local minima of the constraint function during training. Referring to fig. 1, a schematic diagram of a loss value change during network optimization in a prior art scheme is shown. As shown in fig. 1, in training a student model as a small neural network model, a loss function thereof falls into a local minimum P2 of a constraint function, and the loss function in the face recognition process cannot be optimized to a global minimum P1 in training.
Based on this, in order to solve the above-mentioned problems and improve the recognition accuracy, the embodiment of the application provides an image recognition method and a related device. Knowledge distillation training is performed on a biological image sample through a plurality of target teacher models with the same feature distribution to obtain a target student model; the target student model then performs feature extraction on a biological image to obtain the target biological feature of a target object, from which a recognition result is determined, and the recognition result can be used for indicating the face recognition condition of the target object. In this way, the student model is trained through a plurality of teacher models with the same feature distribution rather than a single teacher model, so that the target student model obtained through training in the embodiment of the application can extract biological features from a plurality of sampling points in the biological image of the target object, and the accuracy of face recognition can be improved.
Fig. 2 shows an application scenario schematic diagram provided by the embodiment of the application.
As shown in fig. 2, the application scenario schematic diagram includes a terminal device and a server, where the terminal device and the server may be directly or indirectly connected through a wired or wireless communication manner, and the present application is not limited herein.
In addition, the terminal device may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, and the like. The terminal device may refer broadly to one of a plurality of terminals, and the present embodiment is illustrated by way of example only with respect to the terminal device. In the embodiment of the present application, the number of terminal devices and the device types are not specifically limited.
The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. The server is used for providing background services for the application program running on the terminal device.
Alternatively, the wireless network or wired network described above uses standard communication techniques and/or protocols. The network is typically the Internet, but may be any network including, but not limited to, a local area network (local area network, LAN), metropolitan area network (metropolitan area network, MAN), wide area network (wide area network, WAN), mobile, wired or wireless network, private network, or any combination of virtual private networks. In some embodiments, the data exchanged over the network is represented using techniques and/or formats including hypertext markup language (hyper text markup language, HTML), extensible markup language (extensible markup language, XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as secure socket layer (secure socket layer, SSL), transport layer security (transport layer security, TLS), virtual private network (virtual private network, VPN), internet protocol security (internet protocol security, IPsec), and the like. In other embodiments, custom and/or dedicated data communication techniques may also be used in place of or in addition to the data communication techniques described above.
The method for identifying the image provided by the embodiment of the application can be schematically completed by a server or a terminal device, and the execution subject of the method for identifying the image is not limited in the embodiment of the application.
In addition, the method provided by the embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent traffic, auxiliary driving and the like, and the embodiment of the application is not particularly limited.
The image recognition method provided by the embodiment of the application is described below with reference to the accompanying drawings, taking a terminal device as the recognition device.
Fig. 3 shows a flowchart of a method for image recognition according to an embodiment of the present application. As shown in fig. 3, the image recognition method may include the steps of:
301. A biological image of a target object is acquired.
In this example, the terminal device may initiate an image recognition request for the target object and then acquire the biological image of the target object according to that request. For example, the terminal device may photograph the target object with a preset image capturing device according to the image recognition request, so as to obtain the biological image of the target object.
By way of example, the biological image of the target object may include, but is not limited to, a face image of the target object, and the like, and the embodiment of the present application is not specifically limited thereto. Illustratively, taking a face image as an example of a biological image, the face image may include one or more faces of the target object to be identified. The terminal device may identify a face of one or more target objects in the face image based on the face image.
Illustratively, the terminal device is a client on which application software with a face recognition function is deployed, together with a trained face recognition model, so that face recognition is performed on the terminal device side. For example, in the field of financial payment, a target object may use a smart phone to perform operations that require authentication, such as transferring money, making a payment, or modifying account information, and the authentication may be implemented by recognizing a face image of the target object. In this process, after the terminal device collects the face image to be detected, the terminal device uses the trained face recognition model to recognize the face image and obtain a face recognition result. Alternatively, the terminal device is a client on which application software with a face recognition function is deployed, and the trained face recognition model is deployed on the server, so that face recognition is performed on the server side. In this case, after the terminal device collects the face image to be detected, it uploads the face image to the server, or the server directly retrieves the face image to be detected from a database; the server then uses the trained face recognition model to recognize the face image and obtain a face recognition result. The server can feed the face recognition result back to the terminal device, and can also store the face recognition result locally for other business applications or processing.
For example, the terminal device may collect a face image of a target object in a real scene through a built-in camera. The terminal device can also collect a face image of a target object in a real scene through an external camera associated with the terminal device. For example, the terminal device may be connected to an image capturing device through a cable or a network; the image capturing device captures a face image of a target object in a real scene through a camera and transmits the captured face image to the terminal device. The camera may be a monocular camera, a binocular camera, a depth camera, a three-dimensional (3D) camera, or the like, and the embodiment of the present application is not specifically limited. The terminal device may collect a face image of a target object in a real scene, or may collect an existing image that includes a face, for example an image obtained by scanning an identity document, which is not specifically limited in the embodiment of the present application.
302. Knowledge distillation training is performed on biological image samples based on a plurality of target teacher models to obtain a target student model, where the feature distribution of each of the plurality of target teacher models is the same.
In this example, the feature distribution of the plurality of target teacher models is the same, which can be understood as that the feature spaces of the plurality of target teacher models are aligned, and further, knowledge distillation training is performed on the biological image sample through the plurality of target teacher models with the aligned feature spaces, so that the feature distribution of the trained target student model can be constrained to be consistent with the intersection feature between the features of the plurality of target teacher models. Moreover, since the extracted feature of each of the plurality of target teacher models is a sampled feature of the biological image, the common distribution of the plurality of target teacher models is extracted to obtain an intersection between the features of the plurality of target teacher models, thereby obtaining a more accurate feature distribution of the biological image. Therefore, the feature distribution of the target student model obtained through final training is constrained to be consistent with the intersection features among the features of a plurality of target teacher models, so that the target student model can extract more accurate biological features, a foundation is laid for subsequent determination of the recognition result, and the recognition accuracy is improved.
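As a minimal sketch of how a student feature might be constrained by several teachers whose feature spaces are aligned, the following assumes the constraint is the average cosine distance between the student feature and each teacher feature for the same image. The function names and this particular distance are hypothetical; the embodiment does not prescribe them.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def multi_teacher_feature_loss(student_feature, teacher_features):
    """Average cosine distance from the student feature to every teacher feature.

    Comparing the vectors directly is only meaningful because the teachers are
    assumed to share an aligned feature space.
    """
    distances = [1.0 - cosine_similarity(student_feature, t)
                 for t in teacher_features]
    return sum(distances) / len(distances)

# Three teacher features for one image; each is a slightly different "sample".
teachers = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, -0.1, 0.05]]
aligned = multi_teacher_feature_loss([1.0, 0.0, 0.0], teachers)
misaligned = multi_teacher_feature_loss([0.0, 1.0, 0.0], teachers)
```

A student feature close to what the teachers have in common yields a small loss, while a feature that agrees with none of them yields a large one.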
For example, with respect to the above-mentioned target teacher model, the training process thereof can be understood with reference to the following content of the embodiment shown in fig. 4.
The following describes in detail a process of model training for a target teacher model according to an embodiment of the present application with a terminal device as an execution subject. As shown in fig. 4, the model training process of the target teacher model at least includes the following steps:
401. A biological image sample is obtained.
In this example, the terminal device can construct a training sample set by collecting biological image samples of the target object. Thus, in the process of training the target teacher model, the terminal device can acquire the biological image sample of the target object from the training sample set.
402. Feature extraction is performed on the biological image based on a first target model to obtain the biological features corresponding to the first target model, where the first target model is a machine learning model obtained through iterative training with the biological features of the biological image sample as the training target and the biological image sample as the training data.
In this example, the first object model may be understood as a large neural network model. In addition, the first target model is a machine learning model obtained by taking biological characteristics of a biological sample image as a training target and taking the biological image sample as training data for iterative training.
The training process for this first object model may be implemented, for example, with reference to the following: extracting features of the biological image based on a preset initial model to obtain biological features corresponding to the preset initial model; then, determining the category center characteristics of the preset initial model according to the characteristic centers of the biological characteristics of each category in the biological characteristics corresponding to the preset initial model; determining a first probability value according to the category central characteristics of the preset initial model and the biological characteristics corresponding to the preset initial model, wherein the first probability value can be understood as the probability of the picture category to which the biological image belongs; determining a third loss value based on the first probability value and a first label, the first label being used to indicate a category labeling condition of the biological image; and finally, adjusting model parameters of the preset initial model based on a third loss value to obtain the first target model.
For example, fig. 5 shows a schematic diagram of a training process of the first target model according to an embodiment of the present application. As shown in fig. 5, taking a face image as an example of the biological image, the process of training the first target model may involve at least a training data preparation module, a basic recognition network unit module, a category center storage module, a loss function calculation module, an objective function optimization module, and the like. In the training process, the training data preparation module can collect face images, combine the collected face images into a batch, and send the batch to the basic recognition network unit module for processing. The basic recognition network unit module can extract the spatial features of the face image to obtain the biological features corresponding to the preset initial model. These biological features retain the spatial structure information of the face image. It should be noted that a common structure for the basic recognition network unit module is a convolutional neural network, which includes operations such as convolution, nonlinear activation (ReLU), and pooling.
In addition, the category center storage module may store the class center of each ID in the training data, i.e., the category center features of the preset initial model. The shape of the category center features is (d × m), where d is the feature dimension of a single training sample and m is the number of classes in the training data. Each class of training data corresponds to one category center feature, which characterizes the overall feature of that class of training data. Therefore, the category center features of the preset initial model can be determined according to the feature centers of the biological features of each category among the biological features corresponding to the preset initial model.
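The (d × m) layout of the category center features can be illustrated with a small sketch. It assumes, for illustration only, that each class center is the arithmetic mean of the features of that class; the function name and this choice of "feature center" are hypothetical.

```python
def class_centers(features, labels, num_classes):
    """Return a d x m matrix (d rows, m columns) of per-class mean features."""
    dim = len(features[0])
    sums = [[0.0] * num_classes for _ in range(dim)]
    counts = [0] * num_classes
    for feature, label in zip(features, labels):
        counts[label] += 1
        for k in range(dim):
            sums[k][label] += feature[k]
    # Average each column; a class with no samples keeps a zero vector.
    return [[sums[k][c] / counts[c] if counts[c] else 0.0
             for c in range(num_classes)]
            for k in range(dim)]

# Three 2-dimensional features (d = 2) belonging to two classes (m = 2).
features = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
labels = [0, 0, 1]
centers = class_centers(features, labels, num_classes=2)
```

Here column c of the result is the center of class c, so the first column holds the mean of the two class-0 features.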
Because the output of the deep neural network is expected to be as close as possible to the truly desired value, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the truly desired target value and then based on the difference between the two (of course, there is typically an initialization process prior to the first update, i.e., pre-configuring parameters for each layer in the deep neural network), for example, if the predicted value of the network is higher, the weight vector is adjusted to be lower and adjusted continuously until the neural network can predict the truly desired target value. Thus, it is necessary to define in advance "how to compare the difference between the predicted value and the target value", which is a loss function (loss function) or an objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, the higher the output value (loss) of the loss function is, the larger the difference is, and then the training of the deep neural network becomes a process of reducing the loss as much as possible.
The loss function calculation module can perform a matrix computation on the category center features of the preset initial model and the biological features corresponding to the preset initial model to calculate the first probability value, which can be understood as the probability of the picture category to which the face image belongs. Then, the loss function calculation module takes the first probability value and the manually annotated first label as inputs of the loss function to calculate the third loss value. The first label can be understood as the manually annotated category label of the biological image. It should be noted that the loss function here may include, for example, a classification function such as a softmax function or a softmax function with any of various additive margins, or may be another type of objective function, which is not specifically limited in the embodiment of the present application.
The objective function optimization module can train and optimize the entire preset initial model based on a gradient descent algorithm until the training result meets the condition for terminating model training. It should be noted that the gradient descent algorithm may include stochastic gradient descent, stochastic gradient descent with momentum, the adaptive gradient algorithm (AdaGrad), the adaptive moment estimation algorithm (Adam), and the like, which are not specifically limited in the embodiment of the present application. In addition, the condition for terminating model training may include that the number of training iterations reaches a preset value, that the loss value is smaller than a preset value, and the like, which is not specifically limited in the present application.
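The computation of the first probability value and the third loss value can be sketched as below, assuming a plain softmax over the product of a d-dimensional feature with the (d × m) category center matrix, followed by cross-entropy against the label. The function names and numeric values are hypothetical.

```python
import math

def class_probabilities(feature, centers):
    """Softmax over the product of a d-dim feature and a d x m center matrix."""
    num_classes = len(centers[0])
    logits = [sum(feature[k] * centers[k][c] for k in range(len(feature)))
              for c in range(num_classes)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, label):
    """Loss for one sample given its annotated class index."""
    return -math.log(probabilities[label])

centers = [[2.0, 0.0], [1.0, 4.0]]               # d = 2 rows, m = 2 columns
probs = class_probabilities([1.0, 0.5], centers)  # first probability value
loss = cross_entropy(probs, label=0)              # third loss value
```

The loss is small when the probability assigned to the annotated category is high, and grows as that probability falls.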
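A minimal sketch of such an optimization loop is given below, using gradient descent with momentum on a toy quadratic objective. The function names, learning rate, momentum, and thresholds are assumptions chosen for illustration; a real model would supply its own loss and gradients.

```python
def train(loss_and_grad, weights, lr=0.1, momentum=0.9,
          max_iters=1000, loss_threshold=1e-6):
    """Gradient descent with momentum; stops on either termination condition."""
    velocity = [0.0] * len(weights)
    loss = float("inf")
    for iteration in range(1, max_iters + 1):
        loss, grad = loss_and_grad(weights)
        if loss < loss_threshold:        # condition 1: loss below preset value
            return weights, loss, iteration
        velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
        weights = [w + v for w, v in zip(weights, velocity)]
    return weights, loss, max_iters      # condition 2: iteration budget reached

def toy_objective(w):
    """Stand-in for a model loss: a quadratic bowl with its minimum at (3, -1)."""
    loss = (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2
    grad = [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]
    return loss, grad

final_weights, final_loss, iterations = train(toy_objective, [0.0, 0.0])
```

On this toy objective the loop ends through the loss threshold well before the iteration budget is exhausted.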
403. First included angle information between the biological features corresponding to the first target model and the target category center feature is calculated, and second included angle information between the biological features corresponding to the first target model and the first category center feature is calculated, where the target category center feature is obtained from the feature centers of the biological features of each category among the biological features corresponding to the first target model, and the first category center feature is different from the target category center feature.
In this example, the training process of the first target model may be understood with reference to the foregoing description of fig. 5, which is not described herein. In addition, after the first target model is obtained through training, the biological image can be used as input of the first target model, so that the biological characteristics corresponding to the first target model can be extracted. And determining the center features of the target categories according to the feature centers of the biological features of the categories in the biological features corresponding to the first target model, and determining the center features of the first categories respectively corresponding to the feature centers of the biological features of the categories in the biological features corresponding to other models. The biological characteristics of the biological images are gathered around the space of the central characteristics of the target category, and at the moment, the first included angle information between the biological characteristics corresponding to the first target model and the central characteristics of the target category is calculated, and the second included angle information between the biological characteristics corresponding to the first target model and the central characteristics of the first category corresponding to other respective models is calculated.
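The included angle computation can be sketched as follows, assuming the angle is obtained from the cosine similarity between a feature vector and a class center vector. The function name and the example vectors are hypothetical.

```python
import math

def included_angle(feature, center):
    """Angle in radians between a feature vector and a class-center vector."""
    dot = sum(a * b for a, b in zip(feature, center))
    norm = (math.sqrt(sum(a * a for a in feature))
            * math.sqrt(sum(b * b for b in center)))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

feature = [1.0, 1.0]
target_center = [1.0, 0.0]                   # target category center feature
other_centers = [[0.0, 1.0], [-1.0, 0.0]]    # first category center features

first_angle = included_angle(feature, target_center)
second_angles = [included_angle(feature, c) for c in other_centers]
```

In this toy case the first included angle is 45 degrees, and the second included angles are 45 and 135 degrees.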
404. A second loss value is determined based on the first included angle information and the second included angle information.
In this example, after the first angle information and the second angle information are calculated in step 403, a second loss value may be determined according to the first angle information and the second angle information. For example, consider a loss function with a margin constraint as a loss model, such as:
The terminal device may use the first included angle information and the second included angle information as inputs of the loss model and calculate the second loss value through the loss model. In the loss model, the parameter N is the number of images, y_i is the category label to which the i-th biological image belongs, θ_{y_i} is the first included angle information, θ_j is the second included angle information, s, m_1, and m_2 are adjustable parameters, and L_margin-loss is the second loss value.
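As a hedged sketch of a margin-constrained loss over these quantities, the following assumes one common form in which m_1 is added to the first included angle and m_2 is subtracted from its cosine before scaling by s. This form, the function name, and the default values are assumptions for illustration and are not stated by the embodiment.

```python
import math

def margin_loss(first_angles, second_angles, s=64.0, m1=0.5, m2=0.0):
    """Average margin-based softmax loss over N samples.

    first_angles[i]  : angle between sample i and its target class center.
    second_angles[i] : angles between sample i and the other class centers.
    Assumed form: target logit = s * (cos(theta + m1) - m2).
    """
    total = 0.0
    for theta_target, thetas_other in zip(first_angles, second_angles):
        target_logit = s * (math.cos(theta_target + m1) - m2)
        logits = [target_logit] + [s * math.cos(t) for t in thetas_other]
        peak = max(logits)  # log-sum-exp with the maximum factored out
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - target_logit
    return total / len(first_angles)

# A feature close to its own center and far from the others gives a low loss.
tight = margin_loss([0.1], [[1.5, 1.6]])
loose = margin_loss([1.0], [[1.1, 1.2]])
# The margin makes the same geometry harder to satisfy than no margin at all.
with_margin = margin_loss([1.0], [[1.1, 1.2]], m1=0.5)
without_margin = margin_loss([1.0], [[1.1, 1.2]], m1=0.0)
```

Under this assumed form, larger margins force the first included angle to be smaller before the loss vanishes, which tightens each class around its center.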
In practical application, the loss function may be other types of loss functions, which are not limited in the embodiment of the present application.
405. Model parameters of the first target model are adjusted based on the second loss value to obtain a plurality of target teacher models.
In this example, the model parameters of the first target model are adjusted by the calculated second loss value, so that a plurality of target teacher models can be obtained.
For example, fig. 6 shows a schematic diagram of a training flow of the target teacher model according to the embodiment of the present application. As shown in fig. 6, taking a face image as an example of the biological image, the process of training the target teacher model may involve at least a training data preparation module, a recognition network unit module, a category center storage module, a random seed control module, a loss function calculation module, a loss hyperparameter control module, an objective function optimization module and the like. During training, the training data preparation module collects face images, combines the collected face images into a batch and sends the batch to the recognition network unit module for processing. The recognition network unit module extracts the spatial features of the face images to obtain the biological features corresponding to the first target model. These biological features retain the spatial structure information of the face images. In addition, the category center storage module may be understood with reference to the description of the category center storage module in fig. 5, and details are not repeated here.
In addition, the random seed control module randomly initializes the model to be trained. A different random seed is used to initialize each teacher model, which enriches the diversity of the models.
The loss function calculation module calculates first included angle information between the biological feature corresponding to the first target model and the target class center feature, and calculates second included angle information between the biological feature corresponding to the first target model and the first class center feature. The loss function calculation module then calculates the second loss value according to the first included angle information and the second included angle information. In the actual training process, only the parameters of the recognition network unit module need to be updated; the class centers held in the category center storage module are only used in the gradient calculation and do not participate in the parameter update. The loss hyperparameter control module controls the loss function of the model, typically a loss function with a margin constraint, which may be understood with reference to $L_{\text{margin-loss}}$ described above and is not described in detail here. When the loss function is used, the class center directions of the training data are fixed, that is, the class center features of the first target model are taken as a reference, so the direction vector of the biological feature of each biological image in the feature space, that is, the distance between the biological feature and the class center feature of the first target model, can be determined. Thus, during complementary training, adjusting the parameters $s$, $m_1$ and $m_2$ in the loss function ensures the diversity of the biological images in the feature space and encourages the first target model to learn knowledge that is complementary to that of the other models. In this way, by controlling the parameters $s$, $m_1$ and $m_2$, the loss hyperparameter control module can guide the training of a plurality of target teacher models with different configurations.
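The way the random seed control module and the loss hyperparameter control module together yield differently configured teachers can be sketched as follows. The `TeacherConfig` record, the `make_teacher_configs` helper and the candidate values for s, m_1 and m_2 are hypothetical and chosen only for illustration; the embodiment does not specify concrete values.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TeacherConfig:
    """Hypothetical per-teacher training configuration."""
    seed: int   # random seed used to initialize this teacher
    s: float    # scale parameter of the margin loss
    m1: float   # first margin parameter
    m2: float   # second margin parameter


def make_teacher_configs(n, base_seed=0,
                         s_choices=(32.0, 64.0),
                         m1_choices=(0.3, 0.5),
                         m2_choices=(0.0, 0.2)):
    """Build n teacher configurations with pairwise distinct seeds.

    All candidate values are illustrative assumptions.
    """
    rng = random.Random(base_seed)
    seeds = rng.sample(range(1_000_000), n)  # sampling without replacement
    return [
        TeacherConfig(seed=seed,
                      s=rng.choice(s_choices),
                      m1=rng.choice(m1_choices),
                      m2=rng.choice(m2_choices))
        for seed in seeds
    ]
```

Each configuration would then drive one fine-tuning run that starts from the first target model, so that the resulting teachers share the same class center reference but differ in initialization and margin settings.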
It should be understood that the described target teacher model may also be understood as a large neural network model. For example, the feature distributions of the plurality of target teacher models can be mapped into the same feature space, so that the feature spaces are aligned. For the same biological image, the biological features extracted by the respective target teacher models can then be regarded as samples of the ideal feature of that biological image.
After the training process is performed to obtain a plurality of target teacher models, the plurality of target teacher models can be further used for guiding the training of the target student models. Illustratively, the method of image recognition may further comprise: extracting features of the biological image based on a first teacher model to obtain a first biological feature, wherein the first teacher model is any target teacher model randomly selected from a plurality of target teacher models; extracting features of the biological image based on a preset initial student model to obtain a second biological feature; calculating a difference between the first biometric feature and the second biometric feature to obtain a first loss value; and performing iterative training on a preset initial student model based on the first loss value to obtain a target student model. For example, in calculating the difference between the first biometric feature and the second biometric feature to obtain the first loss value, a feature similarity between the first biometric feature and the second biometric feature may be calculated first, and then the first loss value may be determined based on the feature similarity.
For example, fig. 7 shows a training schematic of a target student model provided in an embodiment of the application. As shown in fig. 7, the process of training the target student model involves at least a training data preparation module, a preset initial student model, a teacher model sampling control module, a teacher recognition network unit module, a knowledge distillation loss function calculation module, a knowledge distillation objective function optimization module and the like. The training data preparation module described here may be understood with reference to the foregoing description of fig. 6, and details are not repeated here. In addition, the preset initial student model performs feature extraction on the biological image to obtain the second biological feature.
The teacher model sampling control module is a random number generation module. The generated random number can be used as the model number of a target teacher model in a target teacher model pool, such as target teacher model 1 to target teacher model n, where n is an integer and n ≥ 2. In this way, the teacher model sampling control module can randomly sample from the plurality of target teacher models in the target teacher model pool. For example, in the process of iteratively training the target student model, the ratio between the number of training iterations and the number of times a target teacher model is randomly sampled may be set to 3:1, that is, the target teacher model is resampled once for every 3 training iterations of the target student model, which keeps the model iteration stable. It should be understood that, in practical application, the relationship between the number of iterations and the sampling frequency may be set according to actual needs, which is not specifically limited in the embodiment of the present application.
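The sampling control described above can be sketched in a few lines. The function name `teacher_schedule` and its parameters are hypothetical; teacher indices here run from 0 to n − 1 rather than from 1 to n.

```python
import random


def teacher_schedule(num_iterations, num_teachers, resample_every=3, seed=0):
    """Return the teacher index used at each student training iteration.

    A new teacher is drawn at random once every `resample_every` iterations
    (3:1 by default) and is kept for the iterations in between.
    """
    rng = random.Random(seed)
    schedule = []
    current = None
    for step in range(num_iterations):
        if step % resample_every == 0:
            current = rng.randrange(num_teachers)  # index in the teacher pool
        schedule.append(current)
    return schedule
```

For instance, `teacher_schedule(9, 4)` returns nine indices in which each consecutive group of three is identical, since only one teacher is active between two resampling points.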
After a target teacher model is randomly sampled by the teacher model sampling control module, the sampled target teacher model (i.e., the first teacher model) can be used to perform feature extraction on the biological image to obtain the first biological feature. In each training iteration, only one target teacher model is sampled from the teacher model pool, which greatly reduces the forward-pass time during training. Moreover, because the target teacher model used for feature extraction is randomly sampled from the teacher model pool, the target student model is constrained to learn the intersection of the feature distributions of the target teacher models.
The knowledge distillation loss function calculation module generally adopts a cosine similarity loss function, which evaluates the similarity between the first biological feature extracted by the first teacher model and the second biological feature extracted by the preset initial student model. The cosine similarity loss function is $L_f = \lVert F_X - F_Y \rVert_2$, where $F_X$ is the second biological feature, $F_Y$ is the first biological feature, and $L_f$ is the feature similarity. In practical application, other loss functions may also be used, which is not specifically limited in the embodiment of the present application.
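A small sketch may help relate the norm-based loss above to cosine similarity. It assumes that both features are L2-normalized before comparison, which the embodiment does not state explicitly; under that assumption the squared distance equals 2 − 2·cos, so minimizing the distance maximizes the cosine similarity. The function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def feature_distance(f_x, f_y):
    """L_f = ||F_X - F_Y||_2 between student and teacher features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_x, f_y)))


def cosine_similarity(f_x, f_y):
    """Cosine of the angle between two feature vectors."""
    a, b = l2_normalize(f_x), l2_normalize(f_y)
    return sum(p * q for p, q in zip(a, b))


# Example with made-up feature values.
f_student = l2_normalize([0.2, 0.9, 0.4])  # second biological feature F_X
f_teacher = l2_normalize([0.1, 1.0, 0.5])  # first biological feature F_Y
distance = feature_distance(f_student, f_teacher)
cosine = cosine_similarity(f_student, f_teacher)
# For unit vectors: distance ** 2 == 2 - 2 * cosine (up to rounding).
```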
Thus, after the feature similarity is obtained, the feature similarity may be determined as the first loss value. The knowledge distillation objective function optimization module may then iteratively optimize the first loss value based on a gradient descent algorithm until the training result meets the condition for terminating the model training. It should be noted that the gradient descent algorithm may include stochastic gradient descent, stochastic gradient descent with momentum, the adaptive gradient algorithm (adaptive gradient, AdaGrad), the adaptive moment estimation algorithm (adaptive moment estimation, Adam), and the like, which are not particularly limited in the embodiment of the present application. In addition, the condition for terminating the model training may include that the number of training iterations reaches a preset value, or that the first loss value is smaller than a preset value, etc., which is not specifically limited in the present application.
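The optimization loop and its two termination conditions can be illustrated with a toy example. The sketch below assumes a linear student (a weight matrix), plain stochastic gradient descent and a squared L2 feature loss; these choices, the function names and the default values are illustrative assumptions, and a real student model would be a neural network trained with any of the optimizers listed above.

```python
import random


def student_forward(weights, x):
    """Toy linear student: feature = weights @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mean_loss(weights, pairs):
    """Mean squared L2 distance between student and teacher features."""
    total = 0.0
    for x, target in pairs:
        out = student_forward(weights, x)
        total += sum((o - t) ** 2 for o, t in zip(out, target))
    return total / len(pairs)


def train_student(pairs, lr=0.1, max_iters=1000, loss_threshold=1e-8, seed=0):
    """Train on (input, teacher_feature) pairs with SGD.

    Stops when the iteration count reaches max_iters or when the mean loss
    drops below loss_threshold, whichever happens first.
    """
    rng = random.Random(seed)
    dim_in, dim_out = len(pairs[0][0]), len(pairs[0][1])
    weights = [[0.0] * dim_in for _ in range(dim_out)]
    steps = 0
    loss = mean_loss(weights, pairs)
    while steps < max_iters and loss >= loss_threshold:
        x, target = rng.choice(pairs)       # one random training sample
        out = student_forward(weights, x)
        for i in range(dim_out):
            grad_scale = 2.0 * (out[i] - target[i])  # d(loss)/d(out[i])
            for j in range(dim_in):
                weights[i][j] -= lr * grad_scale * x[j]
        steps += 1
        loss = mean_loss(weights, pairs)
    return weights, loss, steps
```

With two orthogonal unit inputs, for example `train_student([([1.0, 0.0], [1.0, 0.5]), ([0.0, 1.0], [2.0, -1.0])])`, the loop ends on the loss condition well before the iteration limit.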
It should be noted that the execution sequence of step 301 and step 302 is not specifically limited in the embodiment of the present application. In practical application, step 302 may be executed first, and then step 301 may be executed; alternatively, step 301 and step 302 may be performed simultaneously.
303. Extract features of the biological image based on the target student model to obtain the target biological feature of the target object.
Thus, after the target student model is obtained through training, it is used to extract features of the biological image, so that the target biological feature of the target object can be obtained. For example, the terminal device may perform feature extraction on the biological image through the target student model to obtain biological key point information of the target object, such as position information of key points of the nose, mouth, eyes, and the like, and use the biological key point information as the target biological feature of the target object. It should be noted that the number of model parameters of the target student model is much smaller than that of the target teacher model. Therefore, performing feature extraction on the biological image through the target student model saves time in the extraction process.
304. Determine a recognition result based on the target biological feature of the target object, where the recognition result is used to indicate the face recognition condition of the target object.
In this example, after the target student model is obtained through training, the biological image may be used as the input of the target student model, so that the corresponding target biological feature is extracted through the target student model and the corresponding recognition result is determined from the target biological feature. Illustratively, as shown in FIG. 8, the process of obtaining a recognition model includes a model training stage and a model deployment stage. In the model training stage, a preset initial model is trained to obtain the first target model, the trained first target model is used to guide the training of a plurality of target teacher models, and knowledge distillation training is then performed on the preset initial student model based on the plurality of target teacher models to obtain the trained target student model, namely the recognition model. In the model deployment stage, the related modules obtained in the model training stage are deployed in combination to obtain a complete face recognition model. For example, as shown in fig. 9, an image acquisition input module, an image feature extraction module and a feature comparison search module are integrated. The image acquisition input module acquires a biological image of a target object, such as a face image. The image feature extraction module then performs feature extraction on the biological image to obtain the corresponding target biological feature, such as face image features. The feature comparison search module then compares the target biological feature with preset biological features to determine the recognition result, thereby completing the recognition of the biological image of the target object.
For example, the feature comparison search module compares the face image features extracted for the target object with preset face image features, so as to perform face recognition. The recognition model is obtained by transferring knowledge from the target teacher models, which have stronger expression capability, to the target student model, and the feature distribution extracted by the recognition model is highly similar to the feature distribution of the target teacher models and that of the target student model, so that the recognition model has higher recognition accuracy.
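The comparison performed by the feature comparison search module can be sketched as a nearest-match search. The sketch assumes that preset biological features are stored per identity in a dictionary, that cosine similarity is the comparison score and that a fixed threshold decides whether a match is accepted; the names `identify` and `gallery` and the threshold value are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def identify(target_feature, gallery, threshold=0.5):
    """Return (identity, score) of the best gallery match.

    gallery maps an identity to its preset biological feature. The identity
    is None when the best score is below the (assumed) threshold.
    """
    best_id, best_score = None, -1.0
    for identity, preset_feature in gallery.items():
        score = cosine_similarity(target_feature, preset_feature)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score
```

For a gallery `{"alice": [1.0, 0.0], "bob": [0.0, 1.0]}`, a query feature `[0.9, 0.1]` is matched to `"alice"`, while `[-1.0, -1.0]` is rejected because no stored feature is similar enough.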
Fig. 10 is another flow chart of a method for image recognition according to an embodiment of the present application. As shown in fig. 10, the image recognition method may include the following steps. First, features of the biological image are extracted based on the preset initial model to obtain the biological features corresponding to the preset initial model, and the category center features of the preset initial model are determined according to the feature centers of the biological features of each category among the biological features corresponding to the preset initial model. Then, a first probability value is determined based on the category center features of the preset initial model and the biological features corresponding to the preset initial model, where the first probability value is the probability of the picture category to which the biological image belongs, and a third loss value is determined based on the first probability value and a first label, where the first label is used to indicate the category labeling of the biological image. After that, the model parameters of the preset initial model are adjusted based on the third loss value to obtain the first target model.
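The first stage described above can be sketched as follows. The sketch assumes that a category center feature is the mean of the features of that category, that the first probability value comes from a softmax over scaled inner products with the centers, and that the third loss value is the cross-entropy against the first label. These are common choices and are assumptions here; the function names and the `scale` parameter are hypothetical.

```python
import math


def class_centers(features, labels):
    """Mean feature per category (assumed definition of the feature center)."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], feat)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def class_probabilities(feature, centers, scale=10.0):
    """Softmax over scaled inner products between a feature and each center."""
    labels = sorted(centers)
    logits = [scale * sum(f * c for f, c in zip(feature, centers[k]))
              for k in labels]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {k: e / total for k, e in zip(labels, exps)}


def cross_entropy(probabilities, label):
    """Third loss value for one image: negative log-probability of its label."""
    return -math.log(probabilities[label])
```

A feature that lies close to the center of its labeled category receives a high probability for that category and therefore a small loss.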
Thus, after the first target model is obtained, the biological image is subjected to feature extraction based on the first target model, and the biological feature corresponding to the first target model is obtained. And calculating first included angle information between the biological feature corresponding to the first target model and the central feature of the target class, and calculating second included angle information between the biological feature corresponding to the first target model and the central feature of the first class, wherein the central feature of the target class is obtained from the feature centers of the biological features of all classes in the biological feature corresponding to the first target model, and the central feature of the first class is different from the central feature of the target class. And then, determining a second loss value based on the first included angle information and the second included angle information, and adjusting model parameters of the first target model based on the second loss value to obtain a plurality of target teacher models.
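The two kinds of included angle information used in this stage can be computed as in the sketch below. It assumes that the angle is the arccosine of the normalized inner product and that all vectors are non-zero; the function names are hypothetical.

```python
import math


def angle_between(u, v):
    """Included angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp rounding error
    return math.acos(cosine)


def included_angles(feature, centers, target_label):
    """Return (first_angle, second_angles) for one biological feature.

    first_angle is the angle to the target class center feature;
    second_angles maps every other category to its angle.
    """
    first_angle = angle_between(feature, centers[target_label])
    second_angles = {label: angle_between(feature, center)
                     for label, center in centers.items()
                     if label != target_label}
    return first_angle, second_angles
```

For a feature `[1.0, 0.0]` with centers `{0: [2.0, 0.0], 1: [0.0, 3.0]}` and target category 0, the first angle is 0 and the second angle for category 1 is π/2.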
And then, acquiring a biological image sample, and extracting features of the biological image sample based on a first teacher model to obtain a first biological feature, wherein the first teacher model is any target teacher model randomly selected from a plurality of target teacher models. And extracting the characteristics of the biological image sample based on a preset initial student model to obtain a second biological characteristic. And then, calculating the difference between the first biological feature and the second biological feature, acquiring a first loss value, and adjusting model parameters of a preset initial student model based on the first loss value to obtain a target student model.
Thus, after the biological image of the target object is acquired, the characteristic extraction processing is performed on the biological image of the target object based on the target student model, so as to obtain the target biological characteristic of the target object. Then, a recognition result is determined according to the target biological characteristics of the target object, and the recognition result is used for indicating the face recognition condition of the target object.
In the embodiment of the application, knowledge distillation training is performed on a biological image sample through a plurality of target teacher models with the same characteristic distribution to obtain a target student model, and then the target student model is used for extracting the characteristics of the biological image to obtain the target biological characteristics of the target object, so that the identification result is determined, and the identification result can be used for indicating the face recognition condition of the target object. By the method, the student model is trained through the teacher model with the same characteristic distribution instead of being trained by a single teacher model, so that the target student model obtained through training in the embodiment of the application can be used for extracting biological characteristics from a plurality of sampling points in a biological image of a target object, and the accuracy of face recognition can be improved.
The foregoing description of the solution provided by the embodiments of the present application has been mainly presented in terms of a method. It should be understood that, in order to implement the above-described functions, hardware structures and/or software modules corresponding to the respective functions are included. Those of skill in the art will readily appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application can divide the functional modules of the device according to the method example, for example, each functional module can be divided corresponding to each function, and two or more functions can be integrated in one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
In the following, a detailed description of the identification device according to an embodiment of the present application is provided, and fig. 11 is a schematic diagram of an embodiment of the identification device according to an embodiment of the present application. As shown in fig. 11, the identification device may include an acquisition unit 1101 and a processing unit 1102.
The acquiring unit 1101 is configured to acquire a biological image of a target object. For details, refer to the foregoing description of step 301 in fig. 3, which is not repeated here.
The processing unit 1102 is configured to perform knowledge distillation training on biological image samples based on a plurality of target teacher models to obtain a target student model, where the feature distributions of the plurality of target teacher models are the same.
The processing unit 1102 is further configured to perform feature extraction on the biological image based on the target student model to obtain a target biological feature of the target object, where the target student model is a model obtained by performing knowledge distillation processing on biological image samples based on the plurality of target teacher models. The processing unit 1102 is further configured to determine a recognition result according to the target biological feature of the target object, where the recognition result is used to indicate the face recognition condition of the target object.
In some alternative examples, the acquisition unit 1101 is configured to acquire a biometric image sample. The processing unit 1102 is further configured to: extracting features of a biological image sample based on a first teacher model to obtain a first biological feature, wherein the first teacher model is any target teacher model randomly selected from a plurality of target teacher models; performing feature extraction on the biological image sample based on a preset initial student model to obtain a second biological feature; calculating a difference between the first biometric feature and the second biometric feature to obtain a first loss value; and adjusting model parameters of a preset initial student model based on the first loss value to obtain a target student model.
In other alternative examples, processing unit 1102 is configured to: calculating the feature similarity between the first biological feature and the second biological feature; a first loss value is determined based on the feature similarity.
In other alternative examples, the processing unit 1102 is further configured to: extracting features of the biological image based on a first target model to obtain biological features corresponding to the first target model, wherein the first target model is a machine learning model obtained by taking biological features of a biological image sample as a training target and taking the biological image sample as training data for iterative training; calculating first included angle information between the biological characteristics corresponding to the first target model and the central characteristics of the target class, and calculating second included angle information between the biological characteristics corresponding to the first target model and the central characteristics of the first class, wherein the central characteristics of the target class are obtained by the characteristic centers of the biological characteristics of each class in the biological characteristics corresponding to the first target model, and the central characteristics of the first class are different from the central characteristics of the target class; determining a second loss value based on the first angle information and the second angle information; and adjusting model parameters of the first target model based on the second loss value to obtain a plurality of target teacher models.
In other alternative examples, the processing unit 1102 is further configured to: extracting features of the biological image based on a preset initial model to obtain biological features corresponding to the preset initial model; determining the category center characteristics of the preset initial model according to the characteristic centers of the biological characteristics of each category in the biological characteristics corresponding to the preset initial model; determining a first probability value based on the class center feature of the preset initial model and the biological feature corresponding to the preset initial model, wherein the first probability value is the probability of the picture class to which the biological image belongs; determining a third loss value based on the first probability value and a first label, wherein the first label is used for indicating the category labeling condition of the biological image; and adjusting model parameters of a preset initial model based on the third loss value to obtain a first target model.
In other alternative examples, processing unit 1102 is configured to: and comparing the target biological characteristics of the target object with preset biological characteristics to determine the identification result.
In yet other alternative examples, processing unit 1102 is configured to initiate an image recognition request for a target object; and, the acquisition unit 1101 is configured to acquire a biological image of the target object according to the image recognition request.
In other optional examples, the obtaining unit 1101 is configured to capture, based on a preset image capturing device, the target object according to the image recognition request, so as to obtain a biological image of the target object.
In other alternative examples, the biometric image includes a facial image.
The identification device in the embodiment of the present application is described above from the point of view of the modularized functional entity, and the identification device in the embodiment of the present application is described below from the point of view of hardware processing. Fig. 12 is a schematic structural diagram of an identification device according to an embodiment of the present application. The identification means may vary considerably due to configuration or performance. The identification means may comprise at least one processor 1201, communication lines 1207, a memory 1203 and at least one communication interface 1204.
The processor 1201 may be a general-purpose central processing unit (central processing unit, CPU), a microprocessor, an application-specific integrated circuit (application-specific integrated circuit, ASIC), or one or more integrated circuits for controlling the execution of the program of the present application.
Communication lines 1207 may include a pathway to transfer information between the components.
The communication interface 1204 uses any transceiver-like device to communicate with other devices or communication networks, such as Ethernet, a radio access network (radio access network, RAN), a wireless local area network (wireless local area networks, WLAN), etc.
The memory 1203 may be a read-only memory (read-only memory, ROM) or another type of static storage device that can store static information and instructions, or a random access memory (random access memory, RAM) or another type of dynamic storage device that can store information and instructions. The memory may be stand-alone and coupled to the processor via the communication line 1207. The memory may also be integrated with the processor.
The memory 1203 is used for storing computer-executable instructions for executing the solution of the present application, and their execution is controlled by the processor 1201. The processor 1201 is configured to execute the computer-executable instructions stored in the memory 1203, thereby implementing the method provided by the above-described embodiment of the present application.
Alternatively, the computer-executable instructions in the embodiments of the present application may be referred to as application program codes, which are not particularly limited in the embodiments of the present application.
In a specific implementation, the identification means may comprise a plurality of processors, such as processor 1201 and processor 1202 in fig. 12, as an embodiment. Each of these processors may be a single-core (single-CPU) processor or may be a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In a specific implementation, as an embodiment, the identification apparatus may further include an output device 1205 and an input device 1206. The output device 1205 is in communication with the processor 1201 and may display information in a variety of ways. The input device 1206 is in communication with the processor 1201 and may receive input of a target object in a variety of ways. For example, the input device 1206 may be a mouse, a touch screen device, a sensing device, or the like.
The identification means may be a general purpose device or a special purpose device. In a specific implementation, the identifying means may be a server, a terminal device, etc. or a device having a similar structure as in fig. 12. The embodiment of the application is not limited to the type of the identification device.
It should be noted that the processor 1201 in fig. 12 may cause the identifying device to execute the method in the method embodiment corresponding to fig. 3 to 10 by calling the computer-executable instructions stored in the memory 1203.
In particular, the functions/implementation of the processing unit 1102 in fig. 11 may be implemented by the processor 1201 in fig. 12 invoking computer executable instructions stored in the memory 1203. The function/implementation procedure of the acquisition unit 1101 in fig. 11 can be implemented by the communication interface 1204 in fig. 12.
The embodiment of the present application also provides a computer storage medium storing a computer program for electronic data exchange, the computer program causing a computer to execute part or all of the steps of any one of the image recognition methods described in the above method embodiments.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform part or all of the steps of a method of any one of the image recognition methods described in the method embodiments above.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The above-described embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof, and when implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions in accordance with the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid-state drive (SSD)), or the like.
It will be appreciated that the specific embodiments of the present application involve related data such as user information and personal data of a user. When the above embodiments of the present application are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.