CN115810207A - Method and apparatus for training a deep learning network for face recognition - Google Patents
- Publication number: CN115810207A
- Application number: CN202111070532.9A
- Authority: CN (China)
- Prior art keywords: output vector, face, image, alignment, inputting
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Image Analysis (AREA)
- Image Processing (AREA)
Description
Technical field
The present disclosure relates to machine learning, and in particular to a training method and related apparatus that use knowledge distillation to obtain a face recognition network model that incorporates the effect of face alignment.
Background
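Current face recognition algorithms identify a person primarily from a face image. So that the face recognition deep learning network receives face images captured under conditions that are as uniform as possible, a landmark detector is usually placed in front of the face recognition network model. The detector performs face alignment based on the coordinates of key facial features (e.g., eyes, ears, mouth, and nose). In the architecture shown in FIG. 1, a source image first passes through a face detector 10, which locates the face in the source image and crops it out. The cropped face is then input to a landmark detector 20, which aligns it: based on the coordinates of the key facial features, the landmark detector 20 applies geometric processing such as translation, scaling, or two-dimensional/three-dimensional rotation. Only the aligned image is input to a face recognition network model 30 for recognition. The purpose of face alignment is to keep problems such as a skewed or incorrectly scaled face from degrading the face recognition network model 30, and thereby to improve recognition accuracy. Implementing the landmark detector 20, however, requires the system to allocate computing resources to three tasks: running the deep learning model that predicts the facial landmark coordinates, computing from those coordinates the angle by which the face must be rotated, and rotating the image by the computed angle. On an embedded platform with relatively limited computing resources, adding a module for face alignment significantly reduces the overall computing efficiency of the system.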
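The angle computation that a landmark-based alignment step performs can be illustrated with a minimal sketch. It assumes only two landmarks (the eye centers) and an in-plane rotation; the function names and the sample coordinates are hypothetical and are not taken from the disclosure.

```python
import math


def roll_angle_degrees(left_eye, right_eye):
    """Angle of the line joining the eye centers, measured from the x-axis."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_point(point, center, angle_degrees):
    """Rotate `point` about `center` with the standard 2D rotation matrix."""
    theta = math.radians(angle_degrees)
    x = point[0] - center[0]
    y = point[1] - center[1]
    return (
        center[0] + x * math.cos(theta) - y * math.sin(theta),
        center[1] + x * math.sin(theta) + y * math.cos(theta),
    )


def align_points(points, left_eye, right_eye):
    """Rotate points about the eye midpoint so the eye line becomes horizontal."""
    angle = roll_angle_degrees(left_eye, right_eye)
    center = (
        (left_eye[0] + right_eye[0]) / 2.0,
        (left_eye[1] + right_eye[1]) / 2.0,
    )
    # Rotating by the negative of the measured angle cancels the tilt.
    return [rotate_point(p, center, -angle) for p in points]


# Hypothetical eye coordinates (in pixels) for a tilted face.
left_eye, right_eye = (30.0, 40.0), (70.0, 60.0)
aligned_left, aligned_right = align_points([left_eye, right_eye], left_eye, right_eye)
```

After the call, both eye centers share the same vertical coordinate. A real alignment module would apply the same transform to every pixel of the cropped face, which is part of the cost that the background section attributes to the landmark detector.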
Summary
In view of this, an object of the present disclosure is to provide a method for training a deep learning network model for face recognition. With this training method, the face recognition algorithm no longer needs a face alignment step. The disclosure uses knowledge distillation: a teacher model is first trained on face images that have already been aligned, and a student model is then trained using the trained teacher model together with face images that have not been aligned. Because the student model is trained on unaligned face images, it adapts better to faces that are skewed or incorrectly scaled. When the student model is later used for face recognition, the landmark detector of the known architecture can be omitted while equally good recognition performance is achieved.
An embodiment of the present disclosure provides a method for training a deep learning network for face recognition. The method includes: performing face alignment on at least one captured image using a landmark detector, thereby outputting at least one aligned image; inputting the at least one aligned image into a teacher model to obtain a first output vector; inputting the at least one captured image into a student model corresponding to the teacher model to obtain a second output vector; and adjusting parameter settings of the student model according to the first output vector and the second output vector.
An embodiment of the present disclosure provides an apparatus for training a deep learning network for face recognition. The apparatus includes a storage unit and a processing unit. The storage unit stores program code. The processing unit executes the program code so as to perform the following operations: performing face alignment on at least one captured image, thereby outputting at least one aligned image; inputting the at least one aligned image into a teacher model to obtain a first output vector; inputting the at least one captured image into a student model corresponding to the teacher model to obtain a second output vector; and adjusting parameter settings of the student model according to the first output vector and the second output vector.
Brief description of the drawings
FIG. 1 shows a simplified architecture of a known deep learning network for face recognition.
FIG. 2 shows how an embodiment of the present disclosure trains the teacher model on images that have undergone face alignment.
FIG. 3 shows how an embodiment of the present disclosure trains the student model using the trained teacher model and images that have not undergone face alignment.
FIG. 4 shows a method for training a deep learning network for face recognition according to an embodiment of the present disclosure.
FIG. 5 shows an apparatus for training a deep learning network for face recognition according to an embodiment of the present disclosure.
Detailed description
In the following description, numerous specific details are set forth to give the reader a thorough understanding of the embodiments of the present disclosure. Those skilled in the art will nevertheless understand how to practice the disclosure without one or more of these details, or with other methods, components, or materials. In other instances, well-known structures, materials, or operations are not shown or described in detail, so as not to obscure the core concepts of the disclosure.
A reference in this specification to "an embodiment" means that a particular feature, structure, or characteristic described for that embodiment may be included in at least one embodiment of the present disclosure. Occurrences of "in an embodiment" in various places in this specification therefore do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Please refer to FIG. 2 and FIG. 3, which show how an embodiment of the present disclosure uses knowledge distillation to train a deep learning network for face recognition. The trained network can be used for identification: it produces a one-dimensional output vector from an input face image, and that output vector is compared against every registered vector in a database. When the L2 distance between the output vector and a registered vector is smaller than a preset threshold, the input face image is deemed to match the identity associated with that registered vector.
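The matching rule can be sketched as follows. This is an illustrative example only: the registry contents, the threshold value, and the choice to return the nearest registered identity when several fall under the threshold are assumptions, not details given in the disclosure.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(output_vector, registry, threshold):
    """Return the nearest registered identity if it lies within the threshold.

    `registry` maps an identity name to its registered vector. Returns None
    when no registered vector is closer than `threshold`.
    """
    best_identity, best_distance = None, float("inf")
    for identity, registered_vector in registry.items():
        distance = l2_distance(output_vector, registered_vector)
        if distance < best_distance:
            best_identity, best_distance = identity, distance
    return best_identity if best_distance < threshold else None


# Hypothetical two-dimensional embeddings; real output vectors are much longer.
registry = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}
match = identify([0.1, 0.9], registry, threshold=0.5)
no_match = identify([5.0, 5.0], registry, threshold=0.5)
```

Here `match` is `"alice"` because the query lies about 0.14 from her registered vector, and `no_match` is `None` because neither registered vector is within 0.5 of the query.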
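As shown in FIG. 2, an embodiment of the present disclosure first trains a teacher model 110. During training, one or more source images IMG_S are input to a face detector 120. The face detector 120 locates the portion of each source image IMG_S that contains facial features, crops it, and outputs a captured image IMG_C to a landmark detector 130. The landmark detector 130 identifies the coordinates of key facial features (e.g., eyes, ears, mouth, and nose) in the captured image IMG_C and performs face alignment as needed. For example, when the face in the captured image IMG_C is skewed or incorrectly scaled, the landmark detector 130 applies geometric processing such as translation, scaling, or two-dimensional/three-dimensional rotation to the captured image IMG_C. The landmark detector 130 then inputs the resulting aligned image IMG_A to the teacher model 110, which produces an output vector 140. The output vector 140 is compared with the label information corresponding to the source image IMG_S (that is, the identity class to which the source image IMG_S actually belongs) to produce a loss function 150 (an identification loss). The parameter settings of the teacher model 110 are adjusted according to the current loss function 150, which is how the teacher model 110 is trained. Training of the teacher model 110 is complete once it has been trained on a large number of different source images IMG_S and the loss function 150 has fallen below a predetermined value. Next, using knowledge distillation, a simplified student model 210 is distilled from the trained teacher model 110. Compared with the teacher model 110, the student model 210 has a leaner structure and lower computational complexity, and it occupies a smaller share of the system's overall computing resources. Because the student model 210 is distilled from the teacher model 110, its recognition capability is substantially similar to that of the teacher model 110.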
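The disclosure does not specify the form of the identification loss. One common choice, shown here purely as an assumption, is softmax cross-entropy between per-identity scores derived from the model output and the labeled identity class. The function names and the sample scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw per-identity scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identification_loss(scores, label):
    """Cross-entropy loss: negative log-probability of the labeled identity."""
    return -math.log(softmax(scores)[label])


# Hypothetical scores over three identity classes for one aligned image.
scores = [10.0, 0.0, 0.0]
loss_when_correct = identification_loss(scores, label=0)
loss_when_wrong = identification_loss(scores, label=1)
```

The loss is small when the highest score falls on the labeled identity and large otherwise, which is the signal used to adjust the model parameters until the loss drops below the predetermined value.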
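Please refer to FIG. 3, which shows how the present disclosure trains the student model. The face detector 120 extracts the face from one or more source images IMG_S to produce a captured image IMG_C. The captured image IMG_C is input directly to the student model 210 without face alignment, and the student model 210 produces an output vector 240 from it. In parallel, the captured image IMG_C is aligned by the landmark detector 130 to produce an aligned image IMG_A, which is input to the teacher model 110 to produce a corresponding output vector 145. A loss function 250 is obtained from the difference (e.g., the L2 distance) between the output vector 145 and the output vector 240, and the parameter settings of the student model 210 can be adjusted according to the loss function 250. In addition, the output vector 240 is compared with the label information associated with the source image IMG_S; the difference between the two (e.g., an identification loss) yields another loss function 260, according to which the parameters of the student model 210 can also be adjusted. The student model 210 is trained through the loss functions 250 and 260 together. Training of the student model 210 is complete once it has been trained on a large number of different source images IMG_S and the loss functions 250 and 260 have fallen below their respective predetermined values. Note that while the student model 210 is being trained, the teacher model 110 is used for inference only; its parameter settings are not adjusted during this period.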
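How the two loss terms might be combined and checked against their stopping thresholds can be sketched as below. The weighted sum, the weight values, and the threshold values are assumptions made for illustration; the disclosure states only that both losses are used to adjust the student and that each must fall below its own predetermined value.

```python
import math


def distillation_loss(teacher_vector, student_vector):
    """L2 distance between teacher output 145 and student output 240."""
    return math.sqrt(
        sum((t - s) ** 2 for t, s in zip(teacher_vector, student_vector))
    )


def combined_student_loss(distill, identification, w_distill=1.0, w_ident=1.0):
    """Weighted sum of loss 250 and loss 260 (the weights are hypothetical)."""
    return w_distill * distill + w_ident * identification


def training_finished(distill, identification, distill_target, ident_target):
    """True only when each loss is below its own predetermined value."""
    return distill < distill_target and identification < ident_target


# Hypothetical output vectors and an identification loss computed elsewhere.
loss_250 = distillation_loss([0.0, 0.0], [3.0, 4.0])
loss_total = combined_student_loss(loss_250, identification=1.0, w_distill=0.5, w_ident=2.0)
done = training_finished(loss_250, 1.0, distill_target=0.5, ident_target=0.5)
```

Keeping the two thresholds separate mirrors the text: a student that matches the teacher closely but still misclassifies identities, or the reverse, is not yet considered trained.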
FIG. 4 shows a method for training a deep learning network for face recognition according to an embodiment of the present disclosure. As shown, the training method includes the following simplified flow:
S310: performing face alignment on at least one captured image using a landmark detector, thereby outputting at least one aligned image;
S320: inputting the at least one aligned image into a teacher model to obtain a first output vector, and inputting the at least one captured image into a student model corresponding to the teacher model to obtain a second output vector; and
S330: adjusting parameter settings of the student model according to the first output vector and the second output vector.
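The flow of steps S310 to S330 can be traced with a toy example. Everything in it is a stand-in chosen for illustration: the "alignment" is a fixed offset removal, the teacher is a fixed function, the student is a small linear map, and the update is plain gradient descent on the squared L2 distance. None of these choices comes from the disclosure.

```python
def align(captured):
    """Toy stand-in for the landmark detector (S310): removes a fixed offset."""
    return [value - 1.0 for value in captured]


def teacher(aligned):
    """Toy frozen teacher: a fixed mapping that is never updated."""
    return [2.0 * aligned[0], -1.0 * aligned[1]]


def student(weights, bias, captured):
    """Toy linear student that sees the unaligned input directly."""
    return [
        sum(w * x for w, x in zip(row, captured)) + b
        for row, b in zip(weights, bias)
    ]


def train_step(weights, bias, captured, learning_rate):
    first_output = teacher(align(captured))            # S310, then teacher branch of S320
    second_output = student(weights, bias, captured)   # student branch of S320
    error = [s - t for s, t in zip(second_output, first_output)]
    loss = sum(e * e for e in error)                   # squared L2 distance
    # S330: gradient descent on the student parameters only.
    new_weights = [
        [w - learning_rate * 2.0 * e * x for w, x in zip(row, captured)]
        for row, e in zip(weights, error)
    ]
    new_bias = [b - learning_rate * 2.0 * e for b, e in zip(bias, error)]
    return new_weights, new_bias, loss


samples = [[1.0, 2.0], [2.0, 1.0], [0.0, 1.0], [3.0, 3.0]]
weights, bias = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
epoch_losses = []
for _ in range(1000):
    total = 0.0
    for captured in samples:
        weights, bias, loss = train_step(weights, bias, captured, learning_rate=0.03)
        total += loss
    epoch_losses.append(total)
```

After training, the toy student reproduces the teacher's output from unaligned inputs, having absorbed the offset removal into its own parameters. This is the same idea, at miniature scale, as a student network that no longer needs a separate alignment stage.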
Because the principles and details of these steps have been explained in the preceding embodiments, they are not repeated here. Note that the flow above may be improved by adding further steps or by making appropriate changes and adjustments, so as to train the face recognition network model more effectively and further improve its recognition capability. In addition, all operations of the foregoing embodiments can be implemented by the apparatus 400 shown in FIG. 5. The storage unit 410 of the apparatus 400 can store program code, instructions, variables, or data. The hardware processing unit 420 of the apparatus 400 can execute the program code and instructions stored in the storage unit 410, referring to the stored variables or data, to perform all operations of the foregoing embodiments.
In summary, the present disclosure provides a method for training a deep learning network model for face recognition. With this training method, the face recognition algorithm no longer needs a face alignment step. The disclosure first trains a teacher model on face images that have undergone face alignment, and then trains a student model using the trained teacher model together with face images that have not undergone face alignment. Because the student model is trained on unaligned face images, it adapts better to faces that are skewed or incorrectly scaled. When the student model later performs face recognition, the landmark detector of the known architecture can be omitted while equally good recognition performance is achieved. The disclosure thereby reduces the share of system computing resources occupied by the face recognition network model.
Embodiments of the present disclosure can be implemented in hardware, software, firmware, or any combination thereof. With a suitable instruction execution system, embodiments can be implemented using software or firmware stored in a memory. In hardware, any of the following technologies or a combination of them can be used: discrete logic having logic gates that perform logic functions on data signals, an application-specific integrated circuit (ASIC) having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
The flows and blocks in the flowcharts of this specification show the architecture, functions, and operations that can be implemented by systems, methods, and computer software products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or functional block diagram may represent a module, segment, or portion of program code that includes one or more executable instructions for implementing the specified logical function. In addition, each block in the functional block diagrams and/or flowcharts, and each combination of blocks, can be implemented by a dedicated hardware system that performs the specified functions or actions, or by a combination of dedicated hardware and computer program instructions. These computer program instructions can also be stored in a computer-readable medium that causes a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable medium implement the functions or actions specified by the blocks of the flowcharts and/or functional block diagrams.
The foregoing describes only preferred embodiments of the present disclosure. All equivalent changes and modifications made in accordance with the claims of the present disclosure fall within the scope of the present disclosure.
[Description of reference numerals]
10, 120: face detector
20, 130: landmark detector
30: face recognition network model
110: teacher model
210: student model
140, 145, 240: output vector
150, 250, 260: loss function
IMG_S: source image
IMG_C: captured image
IMG_A: aligned image
S310~S330: steps
400: apparatus
410: storage unit
420: hardware processing unit
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111070532.9A CN115810207A (en) | 2021-09-13 | 2021-09-13 | Method and apparatus for training a deep learning network for face recognition |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115810207A (en) | 2023-03-17 |
Family ID: 85481216
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111070532.9A Pending CN115810207A (en) | 2021-09-13 | 2021-09-13 | Method and apparatus for training a deep learning network for face recognition |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115810207A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||