CN115810207A

CN115810207A - Method and apparatus for training a deep learning network for face recognition

Info

Publication number: CN115810207A
Application number: CN202111070532.9A
Authority: CN
Inventors: 陈建豪; 陈世泽
Original assignee: Realtek Semiconductor Corp
Current assignee: Realtek Semiconductor Corp
Priority date: 2021-09-13
Filing date: 2021-09-13
Publication date: 2023-03-17

Abstract

A method and an apparatus are provided for training a deep learning network for face recognition. The method comprises: performing face alignment processing on at least one captured image using a face coordinate detector (landmark detector), thereby outputting at least one aligned image; inputting the at least one aligned image into a teacher model to obtain a first output vector; inputting the at least one captured image into a student model corresponding to the teacher model to obtain a second output vector; and adjusting parameter settings of the student model according to the first output vector and the second output vector.

Description

Method and apparatus for training a deep learning network for face recognition

Technical Field

The present disclosure relates to machine learning, and in particular to a training method and a related apparatus that use knowledge distillation to obtain a face recognition network model with a built-in face alignment effect.

Background

Current face recognition algorithms mainly perform identity recognition on facial images. So that the deep learning network for face recognition receives facial images captured under conditions that are as uniform as possible, a face coordinate detector (landmark detector) is usually placed before the face recognition network model. This detector performs face alignment based on the coordinates of important facial features (e.g., eyes, ears, mouth, and nose). In the architecture shown in FIG. 1, a source image first passes through a face detector 10, which locates the face in the source image and crops it out. The cropped face is then input to a face coordinate detector 20, which performs face alignment on it: based on the coordinates of the important facial features, the face coordinate detector 20 applies geometric processing such as translation, scaling, or two-dimensional/three-dimensional rotation. Only the image that has undergone face alignment is input to a face recognition network model 30 for face recognition. The purpose of face alignment is to prevent problems such as a skewed face or incorrect proportions from negatively affecting the face recognition network model 30, thereby improving recognition accuracy.

However, implementing the face coordinate detector 20 requires the system to allocate computing resources for: running the deep learning model that predicts the facial feature coordinates, calculating from those coordinates the angle by which the face must be rotated, and rotating the image by the calculated angle. On an embedded platform with relatively limited computing resources, adding a module to perform face alignment significantly reduces the overall computing efficiency of the system.
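To make the cost of the alignment step concrete, the following is a minimal sketch of the in-plane rotation part only. It assumes that two eye coordinates are already available from a landmark detector. The function names and the sample coordinates are hypothetical and do not come from the disclosure, which does not specify how the rotation angle is computed.

```python
import math

def eye_line_angle(left_eye, right_eye):
    """Angle in radians between the line through both eyes and the x-axis."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.atan2(dy, dx)

def rotate_about(point, center, angle):
    """Rotate `point` about `center` by `angle` radians (standard 2D rotation matrix)."""
    x, y = point[0] - center[0], point[1] - center[1]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (center[0] + cos_a * x - sin_a * y,
            center[1] + sin_a * x + cos_a * y)

def level_eyes(landmarks, left_eye, right_eye):
    """Rotate every landmark so that the eye line becomes horizontal."""
    angle = eye_line_angle(left_eye, right_eye)
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    # Rotating by the negative angle cancels the tilt of the eye line.
    return [rotate_about(p, center, -angle) for p in landmarks]

# Hypothetical landmarks of a tilted face: left eye, right eye, nose tip.
tilted = [(30.0, 40.0), (70.0, 60.0), (45.0, 75.0)]
aligned = level_eyes(tilted, tilted[0], tilted[1])
```

In a real pipeline the same rotation would be applied to every pixel of the image rather than to three points, which is part of the computation the disclosure aims to remove at inference time.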

Summary

In view of this, an objective of the present disclosure is to provide a method for training a deep learning network model for face recognition. With this training method, the face recognition algorithm no longer needs a face alignment step. The present disclosure adopts knowledge distillation: a teacher model is first trained on face images that have already been aligned. The trained teacher model and face images that have not been aligned are then used to train a student model. Because the student model is trained on face images without face alignment, it adapts better to face images with skewed angles or incorrect proportions. When the student model is later used for face recognition, the face coordinate detector (landmark detector) of the known architecture can be omitted while comparably good recognition capability is achieved.

An embodiment of the present disclosure provides a method for training a deep learning network for face recognition. The method comprises: using a face coordinate detector (landmark detector) to perform face alignment processing on at least one captured image, thereby outputting at least one aligned image; inputting the at least one aligned image into a teacher model to obtain a first output vector; inputting the at least one captured image into a student model corresponding to the teacher model to obtain a second output vector; and adjusting parameter settings of the student model according to the first output vector and the second output vector.

An embodiment of the present disclosure provides an apparatus for training a deep learning network for face recognition. The apparatus comprises a storage unit and a processing unit. The storage unit is configured to store program code. The processing unit is configured to execute the program code so as to perform the following operations: performing face alignment processing on at least one captured image, thereby outputting at least one aligned image; inputting the at least one aligned image into a teacher model to obtain a first output vector; inputting the at least one captured image into a student model corresponding to the teacher model to obtain a second output vector; and adjusting parameter settings of the student model according to the first output vector and the second output vector.

Brief Description of the Drawings

FIG. 1 shows a simplified architecture of a known deep learning network for face recognition.

FIG. 2 shows how a teacher model is trained using images that have undergone face alignment processing in an embodiment of the present disclosure.

FIG. 3 shows how a student model is trained using the trained teacher model and images that have not undergone face alignment processing in an embodiment of the present disclosure.

FIG. 4 shows a method for training a deep learning network for face recognition according to an embodiment of the present disclosure.

FIG. 5 shows an apparatus for training a deep learning network for face recognition according to an embodiment of the present disclosure.

Detailed Description

In the following description, numerous specific details are set forth to give the reader a thorough understanding of the embodiments of the present disclosure. However, those skilled in the art will understand how to practice the present disclosure without one or more of these specific details, or with other methods, components, or materials. In other instances, well-known structures, materials, or operations are not shown or described in detail, to avoid obscuring the core concepts of the present disclosure.

Reference in this specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with that embodiment may be included in at least one embodiment of the present disclosure. Therefore, appearances of "in an embodiment" in various places in this specification do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Please refer to FIG. 2 and FIG. 3, which illustrate how an embodiment of the present disclosure uses knowledge distillation to train a deep learning network for face recognition. The deep learning network trained according to the present disclosure can be used for identity recognition: it generates a one-dimensional output vector from an input face image and compares that output vector with all registered vectors in a database. When the L2 distance between the output vector and a registered vector is smaller than a preset threshold, the input face image is determined to match the identity associated with that registered vector.
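The comparison against registered vectors can be sketched as follows. This is an illustrative sketch, not the implementation of the disclosure: the dictionary-based registry, the function names, and the rule of returning the nearest registered identity when several fall under the threshold are assumptions made here for the example.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(output_vector, registry, threshold):
    """Return the identity whose registered vector is nearest to `output_vector`,
    or None when even the nearest one is not closer than `threshold`.

    `registry` maps an identity name to its registered vector (assumed layout).
    """
    best_identity, best_distance = None, float("inf")
    for identity, registered in registry.items():
        distance = l2_distance(output_vector, registered)
        if distance < best_distance:
            best_identity, best_distance = identity, distance
    return best_identity if best_distance < threshold else None

# Hypothetical two-dimensional vectors; real embeddings are much longer.
registry = {"alice": [0.0, 0.0], "bob": [1.0, 1.0]}
match = identify([0.1, 0.0], registry, threshold=0.5)
no_match = identify([5.0, 5.0], registry, threshold=0.5)
```

The threshold value trades false accepts against false rejects and would be chosen on validation data; the disclosure only states that it is preset.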

As shown in FIG. 2, the embodiment of the present disclosure first trains a teacher model 110. During training, one or more source images IMG_S are input to a face detector 120. The face detector 120 finds the portion of the source image IMG_S that contains human face features, crops it, and outputs a captured image IMG_C to a face coordinate detector 130. The face coordinate detector 130 identifies the coordinates of important facial features (e.g., eyes, ears, mouth, and nose) in the captured image IMG_C and performs face alignment as needed. For example, when the face in the captured image IMG_C is skewed or incorrectly proportioned, the face coordinate detector 130 applies geometric processing such as translation, scaling, or two-dimensional/three-dimensional rotation to the captured image IMG_C. The face coordinate detector 130 then inputs the resulting aligned image IMG_A to the teacher model 110, which generates an output vector 140. The output vector 140 is compared with label information corresponding to the source image IMG_S (i.e., the identity class to which the source image IMG_S actually belongs), thereby producing a loss function 150 (i.e., an identification loss). The parameter settings of the teacher model 110 are adjusted according to the current loss function 150, which trains the teacher model 110. Training of the teacher model 110 is complete once a large number of different source images IMG_S have been used and the loss function 150 has fallen below a predetermined value.

Next, based on knowledge distillation, a simplified student model 210 is derived from the trained teacher model 110. Compared with the teacher model 110, the student model 210 has a more compact structure and lower computational complexity, and occupies a smaller share of the overall computing resources of the system. Because the student model 210 is distilled from the teacher model 110, its recognition capability is substantially similar to that of the teacher model 110.
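The disclosure names the loss function 150 an identification loss without giving its formula. One common choice for comparing a model output with an identity label is softmax cross-entropy, sketched below under the assumption that the output is turned into one score per identity class; the function names are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities (shifted for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def identification_loss(scores, label):
    """Cross-entropy between the predicted class distribution and the true identity.

    `scores` holds one raw score per identity class (assumed representation);
    `label` is the index of the identity the source image actually belongs to.
    """
    return -math.log(softmax(scores)[label])

# The loss is small when the true class has the highest score, large otherwise.
confident_correct = identification_loss([5.0, 0.0], label=0)
confident_wrong = identification_loss([5.0, 0.0], label=1)
```

Under this assumption, training the teacher model amounts to adjusting its parameters so that this value decreases across the training images until it falls below the predetermined value.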

Please refer to FIG. 3, which illustrates how the present disclosure trains the student model. The face detector 120 extracts the face from one or more source images IMG_S to produce the captured image IMG_C. The captured image IMG_C is input directly to the student model 210 without face alignment processing, and the student model 210 generates an output vector 240 from it. At the same time, the captured image IMG_C is also aligned by the face coordinate detector 130 to produce the aligned image IMG_A, which is input to the teacher model 110 to generate a corresponding output vector 145. A loss function 250 is obtained from the difference between the output vector 145 and the output vector 240 (e.g., their L2 distance), and the parameter settings of the student model 210 can be adjusted according to the loss function 250. In addition, the output vector 240 is compared with the label information associated with the source image IMG_S, and the difference between the two (e.g., an identification loss) yields another loss function 260, according to which the parameters of the student model 210 can also be adjusted. The student model 210 is thus trained through the loss functions 250 and 260. Training of the student model 210 is complete once a large number of different source images IMG_S have been used and the loss functions 250 and 260 have fallen below their respective predetermined values. Note that while the student model 210 is being trained, the teacher model 110 is used for inference only, and its parameter settings are not adjusted.
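The two training signals for the student model can be sketched as follows. The L2 distance for the loss function 250 and the per-loss stopping test follow the description above. Combining the two losses as a weighted sum, and the weight names, are assumptions made here, since the disclosure only states that both losses are used to adjust the student model.

```python
import math

def distillation_loss(teacher_vector, student_vector):
    """L2 distance between the teacher output (145) and the student output (240)."""
    if len(teacher_vector) != len(student_vector):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((t - s) ** 2
                         for t, s in zip(teacher_vector, student_vector)))

def student_objective(teacher_vector, student_vector, identification_loss,
                      w_distill=1.0, w_identify=1.0):
    """Weighted sum of loss 250 and loss 260 (the weighting is an assumption).

    `identification_loss` is the already computed value of loss 260.
    """
    loss_250 = distillation_loss(teacher_vector, student_vector)
    return w_distill * loss_250 + w_identify * identification_loss

def training_done(loss_250, loss_260, limit_250, limit_260):
    """Stopping test: each loss must be below its own predetermined value."""
    return loss_250 < limit_250 and loss_260 < limit_260
```

For example, `student_objective([0.0, 0.0], [3.0, 4.0], 1.0)` evaluates to 6.0: an L2 distance of 5.0 plus an identification loss of 1.0. The teacher vector is treated as a fixed target, which reflects that the teacher model is used for inference only during this phase.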

FIG. 4 shows a method for training a deep learning network for face recognition according to an embodiment of the present disclosure. As shown in the figure, the training method of the present disclosure includes the following simplified flow:

S310: Use a face coordinate detector to perform face alignment processing on at least one captured image, thereby outputting at least one aligned image;

S320: Input the at least one aligned image into a teacher model to obtain a first output vector, and input the at least one captured image into a student model corresponding to the teacher model to obtain a second output vector; and

S330: Adjust parameter settings of the student model according to the first output vector and the second output vector.
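Steps S310 to S330 can be sketched as one training iteration. Everything below is a toy stand-in chosen so that the example runs with the standard library alone: the "image" is a two-element list, `align` and `teacher` are hypothetical placeholder functions, the student is a single linear layer, and the parameter update is plain gradient descent on the squared L2 distance. None of these choices is specified by the disclosure.

```python
def student_forward(weights, x):
    """Toy linear student: one weight row per output component."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def train_step(weights, captured, align, teacher, lr):
    """One pass through S310, S320 and S330; returns updated weights and the loss."""
    aligned = align(captured)                        # S310: face alignment
    first = teacher(aligned)                         # S320: teacher output, teacher stays fixed
    second = student_forward(weights, captured)      # S320: student sees the unaligned input
    errors = [s - t for s, t in zip(second, first)]
    loss = sum(e * e for e in errors)                # squared L2 distance
    # S330: gradient step, using d(loss)/d(w[i][j]) = 2 * errors[i] * captured[j]
    updated = [[w - lr * 2.0 * e * xi for w, xi in zip(row, captured)]
               for row, e in zip(weights, errors)]
    return updated, loss

def align(image):
    """Placeholder for alignment: swaps the two components."""
    return [image[1], image[0]]

def teacher(image):
    """Placeholder for a trained teacher: returns its input unchanged."""
    return list(image)

weights = [[0.0, 0.0], [0.0, 0.0]]
losses = []
for _ in range(30):
    weights, loss = train_step(weights, [1.0, 2.0], align, teacher, lr=0.05)
    losses.append(loss)
```

After these iterations the student reproduces the teacher output for the unaligned input, which is the effect the method relies on: the student absorbs the alignment into its own parameters.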

Since the principles and details of the above steps have been described in the preceding embodiments, they are not repeated here. It should be noted that additional steps or appropriate changes and adjustments may be made to the above flow to better train the face recognition network model and further improve its recognition capability. Furthermore, all operations in the foregoing embodiments can be implemented by the apparatus 400 shown in FIG. 5. A storage unit 410 in the apparatus 400 can store program code, instructions, variables, or data. A hardware processing unit 420 in the apparatus 400 can execute the program code and instructions stored in the storage unit 410, and refer to the variables or data therein, to perform all operations in the foregoing embodiments.

In summary, the present disclosure provides a method for training a deep learning network model for face recognition. With this training method, the face recognition algorithm no longer needs a face alignment step. The present disclosure first trains a teacher model on face images that have undergone face alignment processing. The trained teacher model and face images that have not undergone face alignment processing are then used to train a student model. Because the student model is trained on face images without face alignment, it adapts better to face images with skewed angles or incorrect proportions. When the student model later performs face recognition, the face coordinate detector of the known architecture can be omitted while comparably good recognition capability is achieved. In this way, the present disclosure effectively reduces the share of system computing resources occupied by the face recognition network model.

Embodiments of the present disclosure can be implemented in hardware, software, firmware, or a combination thereof. With a suitable instruction execution system, the embodiments can be implemented using software or firmware stored in a memory. In hardware, any of the following technologies or a combination thereof can be used: discrete logic circuits having logic gates that perform logic functions on data signals, an application-specific integrated circuit (ASIC) having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.

The flows and blocks in the flowcharts of this specification illustrate the architecture, functions, and operations that can be implemented by systems, methods, and computer software products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or functional block diagram may represent a module, segment, or portion of program code that includes one or more executable instructions for implementing the specified logical function. In addition, each block in the functional block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware system that performs the specified functions or actions, or by a combination of dedicated hardware and computer program instructions. These computer program instructions can also be stored in a computer-readable medium that causes a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable medium implement the functions or actions specified by the blocks of the flowcharts and/or functional block diagrams.

The above are only preferred embodiments of the present disclosure. All equivalent changes and modifications made according to the claims of the present disclosure shall fall within the scope of the present disclosure.

Description of Reference Numerals

10, 120: face detector

20, 130: face coordinate detector

30: face recognition network model

110: teacher model

210: student model

140, 145, 240: output vector

150, 250, 260: loss function

IMG_S: source image

IMG_C: captured image

IMG_A: aligned image

S310 to S330: steps

400: apparatus

410: storage unit

420: hardware processing unit

Claims

1. A method for training a deep learning network for face recognition, comprising:

performing face alignment processing on at least one captured image using a face coordinate detector, thereby outputting at least one aligned image;

inputting the at least one aligned image into a teacher model to obtain a first output vector;

inputting the at least one captured image into a student model corresponding to the teacher model to obtain a second output vector; and

adjusting parameter settings of the student model according to the first output vector and the second output vector.

2. The method of claim 1, wherein the step of performing face alignment on the at least one captured image comprises:

performing two-dimensional/three-dimensional rotation, scaling or translation on the at least one captured image to obtain the at least one aligned image.

3. The method of claim 1, wherein the method further comprises:

capturing the at least one captured image from a source image using a face detector.

4. The method of claim 1, wherein the method further comprises:

using the face coordinate detector to perform face alignment processing on a plurality of captured images, thereby outputting a plurality of aligned images;

inputting the plurality of aligned images into the teacher model to obtain a plurality of third output vectors;

calculating a first loss function between the plurality of third output vectors and label information respectively associated with the plurality of captured images; and

adjusting parameter settings of the teacher model according to the first loss function.

5. The method of claim 4, wherein the step of inputting the at least one aligned image into the teacher model to obtain the first output vector comprises:

inputting the at least one aligned image into the teacher model adjusted according to the first loss function to obtain the first output vector.

6. The method of claim 1, wherein the step of adjusting the parameter settings of the student model comprises:

calculating a second loss function between the first output vector and the second output vector; and

adjusting the parameter settings of the student model according to the second loss function.

7. The method of claim 1, wherein the method further comprises:

calculating a third loss function between the second output vector and label information associated with the at least one captured image; and

adjusting the parameter settings of the student model according to the third loss function.

8. An apparatus for training a deep learning network for face recognition, comprising:

a storage unit for storing a program code;

a processing unit, configured to execute the program code, such that the processing unit can perform the following operations:

performing face alignment processing on at least one captured image, thereby outputting at least one aligned image;

9. The apparatus of claim 8, wherein the processing unit executes the program code to:

10. The apparatus of claim 8, wherein the processing unit executes the program code to:

performing face alignment processing on the plurality of captured images, thereby outputting a plurality of aligned images;