
CN109977876A - Image recognition method, apparatus, computing device, system and storage medium - Google Patents

Image recognition method, apparatus, computing device, system and storage medium Download PDF

Info

Publication number
CN109977876A
CN109977876A (application CN201910242987.0A)
Authority
CN
China
Prior art keywords
camera
image
neural network
camera parameters
lighting conditions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910242987.0A
Other languages
Chinese (zh)
Inventor
陈志博
陈立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910242987.0A priority Critical patent/CN109977876A/en
Publication of CN109977876A publication Critical patent/CN109977876A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to an image recognition method, an image recognition apparatus, a computing device, a computer-readable storage medium, and an image recognition system. The image recognition method includes: predicting lighting conditions of a future period based on a plurality of images respectively captured by a camera at multiple consecutive moments; generating, based at least on the predicted lighting conditions, imaging parameters to be sent to the camera; initiating transmission of the imaging parameters to the camera so that the camera captures an image to be recognized based on the imaging parameters during the future period; and recognizing an object of interest from the image to be recognized. The method can improve the quality of captured images and thereby improve the accuracy of detection and recognition of objects of interest.

Description

Image recognition method, apparatus, computing device, system and storage medium

Technical Field

The present invention relates to image recognition technology, and in particular to an image recognition method, an image recognition apparatus, a computing device, an image recognition system, and a storage medium.

Background Art

Video surveillance and image recognition technologies are widely used in semi-outdoor and outdoor places such as large conference venues, airports, railway stations, and urban roads. Lighting conditions vary across application scenarios, for example at different locations and/or at different times. Even at the same location, the change in lighting conditions from early morning to dusk is significant. Image post-processing techniques (for example, extracting illumination-invariant features, image enhancement based on facial feature models, or other standardized processing methods) cannot fully compensate for the information lost by the imaging device during image capture, which degrades the accuracy of recognition results. How to reduce the influence of complex on-site lighting environments on image recognition results has therefore become a challenge.

Summary of the Invention

It would be advantageous to provide a mechanism that can mitigate, alleviate, or even eliminate one or more of the above problems.

According to an aspect of the present invention, an image recognition method is provided, including: predicting lighting conditions of a future period based on a plurality of images respectively captured by a camera at multiple consecutive moments; generating, based at least on the predicted lighting conditions, imaging parameters to be sent to the camera; initiating transmission of the imaging parameters to the camera so that the camera captures an image to be recognized based on the imaging parameters during the future period; and recognizing an object of interest from the image to be recognized.

In some embodiments, predicting the lighting conditions of the future period includes: sequentially extracting the respective illumination features of the plurality of images using a residual network; recursively processing those illumination features using a temporal recurrent neural network, where the residual network and the temporal recurrent neural network have been trained to learn how lighting conditions change over time; and outputting the predicted lighting conditions from the temporal recurrent neural network. The plurality of moments includes N moments, N being a positive integer greater than 1. The illumination feature of X_{k+1} and the result of the temporal recurrent neural network's processing of the illumination feature of X_k are both input to the temporal recurrent neural network for recursive processing, where X_k denotes the image captured at the k-th of the N moments, X_{k+1} denotes the image captured at the (k+1)-th of the N moments, and k is an integer with 1 ≤ k < N.

In some embodiments, the temporal recurrent neural network includes a long short-term memory network.

In some embodiments, generating the imaging parameters to be sent to the camera includes mapping the predicted lighting conditions to a first version of imaging parameters using a neural network based on residual structures. The residual-structure-based neural network has been trained to learn a mapping from lighting conditions to imaging parameters. The first version of imaging parameters may be selected as the imaging parameters to be sent to the camera.

In some embodiments, the residual-structure-based neural network includes a plurality of residual structures connected in series. Mapping the predicted lighting conditions to the first version of imaging parameters includes inputting features characterizing the predicted lighting conditions into the residual-structure-based neural network, causing it to perform operations including: reshaping the features characterizing the predicted lighting conditions; convolving the reshaped features; applying an activation function to the convolved features; performing batch normalization on the activated features to obtain a normalized feature map; processing the normalized feature map with the plurality of series-connected residual structures; and feeding the processed feature map into a fully connected layer to obtain the first version of imaging parameters.

In some embodiments, the method further includes: controlling the camera to capture a sample image in real time before capturing the image to be recognized; generating a second version of imaging parameters based on the sample image; and computing a weighted average of the first version of imaging parameters and the second version of imaging parameters to obtain weighted imaging parameters, which may be selected as the imaging parameters to be sent to the camera. In a first operating mode, the first version of imaging parameters is selected as the imaging parameters to be sent to the camera. In a second operating mode, different from the first, the weighted imaging parameters are selected as the imaging parameters to be sent to the camera.
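The two operating modes above can be sketched as a small selection helper. The parameter names, mode labels, and the 0.6 weight below are illustrative, not taken from the patent:

```python
def select_camera_params(predicted, realtime=None, mode="predictive", w_pred=0.6):
    """Choose the imaging parameters to send, per the two operating modes.

    `predicted` is the first (prediction-based) version, `realtime` the
    second (sample-image-based) version; both are dicts of name -> value.
    """
    if mode == "predictive" or realtime is None:
        # First operating mode: use the prediction-based parameters directly.
        return dict(predicted)
    # Second operating mode: element-wise weighted average of both versions.
    return {k: w_pred * predicted[k] + (1 - w_pred) * realtime[k]
            for k in predicted}
```

In practice the weight would itself be tuned, for example favoring the real-time version when the scene changes abruptly.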

In some embodiments, the object of interest includes a subject's face. Recognizing the object of interest includes: detecting a face region in the image to be recognized using a second deep neural network; and extracting facial features from the detected face region using a third deep neural network, for use in identifying the subject.

In some embodiments, the second deep neural network is a shallow convolutional neural network. Detecting the face region in the image to be recognized includes: generating a plurality of candidate face windows using the shallow convolutional neural network; and selecting the best of those face windows by non-maximum suppression as the final detected face region.
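The non-maximum suppression step can be sketched as a generic greedy NMS over scored boxes (a standard formulation, not the patent's exact implementation):

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def non_max_suppression(windows, iou_threshold=0.5):
    """windows: list of (box, score). Keep the highest-scoring window and
    discard any window overlapping a kept one beyond the IoU threshold."""
    keep = []
    for box, score in sorted(windows, key=lambda w: -w[1]):
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in keep):
            keep.append((box, score))
    return keep
```

The first surviving window (the highest-scoring one) plays the role of the "best face window" in the text.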

In some embodiments, the method further includes: detecting a face region in a registered image using the second deep neural network, where the registered image is an electronic photograph corresponding to the subject's claimed identity; extracting facial features from the detected face region in the registered image using the third deep neural network; and comparing the facial features extracted from the image to be recognized with those extracted from the registered image to determine whether the subject matches the claimed identity.
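Such a feature comparison is commonly done with a similarity score and a decision threshold. The sketch below assumes cosine similarity and an illustrative threshold; the patent specifies neither:

```python
import math

def cosine_similarity(f1, f2):
    # Cosine of the angle between two feature vectors.
    dot = sum(a * b for a, b in zip(f1, f2))
    n1 = math.sqrt(sum(a * a for a in f1))
    n2 = math.sqrt(sum(b * b for b in f2))
    return dot / (n1 * n2)

def same_identity(probe_features, registered_features, threshold=0.8):
    # Threshold is illustrative; real systems tune it on validation data.
    return cosine_similarity(probe_features, registered_features) >= threshold
```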

In some embodiments, the method further includes: in response to an abnormal event in which the subject does not match the claimed identity, initiating the reporting of an alert message to the system platform.

In some embodiments, the alert message contains the facial features extracted from the image to be recognized and the facial features extracted from the registered image. The method further includes: comparing these two sets of facial features against a feature library of the public security system to obtain the identity information of the person being impersonated and of the impostor.

In some embodiments, the method further includes: using the impostor's identity information to search a database of at-large suspects to determine whether the impostor is a suspect at large.

In some embodiments, the method further includes: in response to the impostor being confirmed as a suspect at large, invoking surveillance cameras within a certain geographic range around the scene; recognizing all faces in the images captured by those surveillance cameras; and comparing the resulting facial features with the facial features of the at-large suspect, thereby tracking the suspect.

In some embodiments, the method further includes: controlling the camera to capture a sample image in real time before capturing the image to be recognized; estimating the depth of the sample image using a first deep neural network; determining, from the estimated depth, the foreground region and background region of the sample image as well as their respective brightness levels; in response to the difference between the brightness of the foreground region and that of the background region exceeding a predetermined range, adjusting the illumination intensity of at least one adjustable light source associated with the camera; and repeating the real-time capturing, estimating, and determining steps until the brightness of the foreground region and that of the background region in the current sample image fall within the predetermined range.

In some embodiments, estimating the depth of the sample image includes: resizing the sample image to a first size; and inputting the resized sample image into the first deep neural network, causing it to perform operations including: applying multiple convolutions to the sample image of the first size; and repeatedly processing the convolved sample image to obtain a depth map having the first size, where the processing includes upsampling and convolution.
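The decoder thus alternates upsampling and convolution until the depth map regains the input size. The upsampling step alone can be sketched with nearest-neighbour upsampling (a common choice, assumed here since the patent does not name the method):

```python
def upsample_nearest(feature_map, factor=2):
    """Nearest-neighbour upsampling of a 2D feature map (list of lists):
    each value is replicated into a factor x factor block."""
    out = []
    for row in feature_map:
        wide = [v for v in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out
```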

In some embodiments, the at least one adjustable light source associated with the camera includes two adjustable light sources arranged on either side of the camera. Adjusting the illumination intensity of the at least one adjustable light source includes executing a strategy selected from the group consisting of: dimming both sides, filling light on the dark side, dimming the bright side, filling light on both sides, and no adjustment, such that the difference between the brightness of the foreground region and that of the background region falls within the predetermined range.
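A possible dispatch from the brightness gap to the five strategies might look like the sketch below. The thresholds and the brightness-to-strategy mapping are illustrative guesses, since the patent does not spell them out:

```python
def choose_light_strategy(fg_brightness, bg_brightness, tol=20, high=200):
    """Pick one of the patent's five strategies from the brightness gap.

    Brightness values are assumed on a 0-255 scale; `tol` is the allowed
    foreground/background gap and `high` a saturation guard, both guesses.
    """
    diff = fg_brightness - bg_brightness
    if abs(diff) <= tol:
        return "no_adjustment"
    if diff < 0:  # foreground (subject) darker than background
        return "fill_dark_side" if bg_brightness < high else "dim_bright_side"
    # foreground brighter than background
    return "dim_both_sides" if fg_brightness > high else "fill_both_sides"
```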

In some embodiments, the method further includes: detecting ambient light with a light sensor; and in response to the sensor's readings varying by less than a threshold over a predetermined time period, setting a lock period during which the illumination intensity of the adjustable light sources is not adjusted before the image to be recognized is captured.
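The lock period can be sketched as a small state holder. The stability threshold and lock duration below are illustrative values:

```python
import time

class LightSourceLock:
    """Skip light-source adjustment while ambient light has been stable."""

    def __init__(self, threshold=5.0, lock_seconds=60.0):
        self.threshold = threshold      # max sensor variation to call "stable"
        self.lock_seconds = lock_seconds
        self.locked_until = 0.0

    def update(self, readings, now=None):
        # `readings` is the sensor's recent window of ambient-light values.
        now = time.monotonic() if now is None else now
        if max(readings) - min(readings) < self.threshold:
            self.locked_until = now + self.lock_seconds

    def adjustment_allowed(self, now=None):
        now = time.monotonic() if now is None else now
        return now >= self.locked_until
```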

According to another aspect of the present invention, an image recognition apparatus is provided, including: a lighting prediction module configured to predict lighting conditions of a future period based on a plurality of images respectively captured by a camera at multiple consecutive moments; an imaging parameter adjustment module configured to generate, based at least on the predicted lighting conditions, imaging parameters to be sent to the camera, and to initiate transmission of the imaging parameters to the camera so that the camera captures an image to be recognized based on the imaging parameters during the future period; and an image recognition module configured to recognize an object of interest from the image to be recognized.

According to yet another aspect of the present invention, a computing device is provided, including a memory and a processor, the memory being configured to store computer program instructions that, when executed on the processor, cause the processor to perform the method described above.

According to still another aspect of the present invention, a computer-readable storage medium is provided, storing computer program instructions that, when executed on a processor, cause the processor to perform the method described above.

According to still another aspect of the present invention, an image recognition system is provided, including: a camera; and the computing device described above.

In the various embodiments, the camera's imaging parameters are set adaptively by predicting changes in the lighting conditions at the shooting site, so that the quality of the captured images is improved, thereby improving the accuracy of detection and recognition of objects of interest.

Brief Description of the Drawings

Further details, features, and advantages of the present invention are disclosed in the following description of exemplary embodiments in conjunction with the accompanying drawings, in which:

FIG. 1 shows a flowchart of an image recognition method according to an embodiment of the present invention;

FIG. 2 shows a schematic block diagram of an example system in which the method of FIG. 1 may be applied, according to an embodiment of the present invention;

FIG. 3A shows an exemplary, schematic recursive structure of the lighting prediction module in FIG. 2;

FIG. 3B shows a schematic diagram of the lighting prediction module with the recursive structure unrolled;

FIG. 4 shows an exemplary, schematic illustration of the operations performed by the imaging parameter adjustment module in FIG. 2;

FIGS. 5A and 5B show exemplary, schematic illustrations of the operations performed by the image recognition module in FIG. 2;

FIG. 6 shows an exemplary, schematic illustration of the operation of the deep neural network used by the depth estimation module in FIG. 2;

FIG. 7 shows an example configuration of the depth estimation module, the camera, and the adjustable light sources;

FIG. 8 shows the general flow of the operation of the depth estimation module;

FIG. 9 shows an application scenario in which techniques according to embodiments of the present invention may be applied; and

FIG. 10 shows an example system including an example computing device representative of one or more systems and/or devices that may implement the various techniques described herein.

Detailed Description

Before describing embodiments of the present invention, several terms used herein are explained. These concepts should be known to those skilled in the art of artificial intelligence, so their detailed descriptions are omitted for brevity.

1. Long short-term memory (LSTM) network. A long short-term memory network is a temporal recurrent neural network built from structures called cells. Each cell contains three gates: a forget gate, an input gate, and an output gate. After information enters the LSTM, information that does not conform to the learned rules is discarded through the forget gate, while information that does conform is retained.

2. Residual network (Resnet). The residual network is a deep convolutional network proposed in 2015. It aims to counteract the side effect (the degradation problem) caused by increasing network depth and thereby improve network performance. A residual network typically includes multiple residual learning structures (also called residual structures), each implemented by a feed-forward neural network with a shortcut connection.

3. VGG16. VGGNet is a deep convolutional neural network proposed by the Visual Geometry Group at the University of Oxford. By repeatedly stacking small 3×3 convolution kernels and 2×2 max-pooling layers, it constructs convolutional neural networks 16 to 19 layers deep. VGG16 is the 16-layer VGGNet.

4. Parametric rectified linear unit (PReLU). The rectified linear unit, also known as the linear rectification function, is an activation function commonly used in artificial neural networks, usually referring to nonlinear functions such as the ramp function and its variants.

5. Deep neural network (DNN). The deep neural network is a type of neural network that emerged with the introduction of the concept of deep learning in 2006. A DNN can be understood as a neural network with many hidden layers.

6. Batch normalization (BN). Batch normalization is a training optimization method proposed in 2015 for faster and more stable training of deep neural networks. More on batch normalization can be found at https://arxiv.org/abs/1502.03167.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 shows a flowchart of an image recognition method 100 according to an embodiment of the present invention, and FIG. 2 shows a schematic block diagram of an example system 200 in which the method 100 of FIG. 1 may be applied.

The system 200 includes a camera CAM, adjustable light sources ALS1 and ALS2 associated with the camera CAM (for example, mounted on either side of it), an image recognition apparatus 210, and a system platform (for example, a server SRV and a database DB).

The camera CAM may be any of various types of image capture devices, such as a digital camera, a webcam, or a surveillance camera. Although the camera CAM is shown in FIG. 2 as separate from the image recognition apparatus 210, in some embodiments the camera CAM may be integrated with the image recognition apparatus 210.

The adjustable light sources ALS1 and ALS2 are optional and provide supplemental lighting for the camera CAM. They are adjustable in the sense that their illumination intensity (luminous brightness) can be controlled to increase or decrease. Although two adjustable light sources are shown in FIG. 2, the system 200 may include fewer or more.

The image recognition apparatus 210 includes a lighting prediction module 211, an imaging parameter adjustment module 212, an image recognition module 213, a depth estimation module 215, an alert module 214, and a light sensor SNS, of which the depth estimation module 215, the alert module 214, and the light sensor SNS may be optional. The image recognition apparatus 210 may interact with the camera CAM, the adjustable light sources ALS1 and ALS2, the server SRV, and the database DB via any suitable I/O interface (for example, a wired or wireless transceiver, not shown in FIG. 2) and any suitable communication protocol (for example, TCP/IP, Bluetooth, Zigbee, or Wi-Fi).

The system platform includes, for example, the server SRV and the database DB. In this case, the image recognition apparatus 210 may operate as a client. It will be appreciated that the image recognition apparatus 210 and the system platform may also be based on architectures other than client/server, such as a distributed cloud architecture (in which the system platform is the cloud) or a front-end/back-end structure, whether physically co-located or separate (in which the system platform is the back end).

Referring to FIGS. 1 and 2, in step 110, lighting conditions of a future period are predicted based on N images X_1, X_2, X_3, ..., X_N respectively captured by the camera CAM at N consecutive moments, where N is a positive integer greater than 1. This step may be implemented by the lighting prediction module 211.

The N images X_1, X_2, X_3, ..., X_N may be received from the camera CAM via the I/O interface of the image recognition apparatus 210. In some embodiments, the lighting prediction module 211 sequentially extracts the illumination features of the images X_1, X_2, X_3, ..., X_N using a residual network and recursively processes those features using a temporal recurrent neural network, which then outputs the predicted lighting conditions. The residual network and the temporal recurrent neural network are pre-trained to learn how lighting conditions change over time. Lighting data from different sites is fed to the lighting prediction module 211 for training; alternatively, training may be customized for a specific site.

FIG. 3A shows an exemplary, schematic recursive structure of the lighting prediction module 211. In this example, the lighting prediction module 211 includes the residual network Resnet50 and a long short-term memory network LSTM. As is known, Resnet50 is a 50-layer residual network that strikes a good balance between performance and required computing power. LSTMs are well suited to processing and predicting important events separated by relatively long intervals and delays in a time series. This structure is therefore well suited to lighting conditions, which carry substantial information and vary over time.

As shown in FIG. 3A, the image X_k (k being an integer with 1 ≤ k < N) is input to Resnet50, which outputs a high-dimensional (for example, 1000-dimensional or 1600-dimensional) illumination feature. This feature is then input to the LSTM, and the output of the LSTM (which has the same dimensionality as the output of Resnet50), together with the illumination feature extracted by Resnet50 from the image X_{k+1}, is fed back into the LSTM for recursive processing.

FIG. 3B shows a schematic diagram of the lighting prediction module 211 with the recursive structure unrolled. As shown in FIG. 3B, the illumination feature extracted by Resnet50 from the image X_1 is input to the LSTM, and the LSTM's output, together with the illumination feature extracted by Resnet50 from the image X_2, is fed into the LSTM again. The LSTM's output and the illumination feature extracted from the image X_3 are then fed into the LSTM, and so on. Thus, by virtue of the LSTM's temporal recursion, the lighting prediction module 211 outputs a feature h_N characterizing the predicted lighting conditions of the future period.
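The unrolled recursion reduces to a simple fold over the frame sequence. In the sketch below, `extract_features` stands in for Resnet50 and `lstm_step` for one LSTM time step; both would be trained networks in practice:

```python
def predict_lighting(images, extract_features, lstm_step, h0):
    """Fold over the frame sequence: at each step, feed the frame's
    illumination feature together with the previous LSTM state.
    Returns h_N, the feature characterizing the predicted lighting."""
    h = h0
    for x in images:
        h = lstm_step(extract_features(x), h)
    return h
```

With toy stand-ins (identity features, additive step) the fold simply accumulates the inputs, which makes the recursion easy to verify.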

返回参考图1和2,在步骤120,至少基于所预测的光照状况,生成待发送至所述相机CAM的摄像参数。这个步骤可以由摄像参数调节模块212实现。在各实施例中,摄像参数可以包括例如曝光值、光圈值、快门速度、感光度、对比度、分辨率、饱和度、锐度、滤镜设置、颜色空间模式、白平衡设置、照片高宽比和照片存储格式中的一个或多个。Referring back to FIGS. 1 and 2, at step 120, camera parameters to be sent to the camera CAM are generated based at least on the predicted lighting conditions. This step can be implemented by the camera parameter adjustment module 212. In various embodiments, the camera parameters may include one or more of, for example, exposure value, aperture value, shutter speed, sensitivity, contrast, resolution, saturation, sharpness, filter settings, color space mode, white balance settings, photo aspect ratio, and photo storage format.

图4示出了摄像参数调节模块212所执行的操作的示例性和示意性图示。在该示例中,摄像参数调节模块212利用基于残差结构的神经网络将所预测的光照状况映射成第一版本的摄像参数。所述基于残差结构的神经网络包括多个串联的残差结构,但是为了图示的方便,仅示出了它们中的一个。FIG. 4 shows an exemplary and schematic illustration of the operations performed by the camera parameter adjustment module 212. In this example, the camera parameter adjustment module 212 uses a neural network based on residual structures to map the predicted lighting conditions to a first version of the camera parameters. The residual-structure-based neural network includes a plurality of residual structures in series, but for convenience of illustration only one of them is shown.

每个残差结构通过前向神经网络和由分支支路表示的捷径(shortcut)连接实现,如图4所示。残差结构的思想不是直接拟合恒等映射函数H(x)=x,而是将函数转换成H(x)=F(x)+x,得到残差函数F(x)=H(x)-x。只要F(x)=0,就构成了恒等映射H(x)=x。例如,如果把5映射到5.1,那么引入残差前是F’(5)=5.1,引入残差后是H(5)=5.1,H(5)=F(5)+5,F(5)=0.1,其中F’和F表示网络参数映射。引入残差后的映射对输出的变化更加敏感。例如,输出从5.1变到5.2,映射F’增加了2%,而对于残差结构而言,如果输出从5.1变到5.2,则映射F是从0.1到0.2,增加了100%。这样,突出了微小的变化,效果更好。所述基于残差结构的神经网络可以已被利用大量数据进行训练以学习光照状况到摄像参数的映射。Each residual structure is implemented by a feed-forward neural network plus a shortcut connection, represented by the branch path in FIG. 4. The idea of the residual structure is not to fit the identity mapping H(x)=x directly, but to rewrite it as H(x)=F(x)+x, giving the residual function F(x)=H(x)-x. Whenever F(x)=0, the identity mapping H(x)=x is recovered. For example, if 5 is mapped to 5.1, then without the residual the network fits F'(5)=5.1, whereas with the residual it fits H(5)=F(5)+5=5.1, i.e., F(5)=0.1, where F' and F denote the mappings realized by the network parameters. The mapping with the residual is more sensitive to changes in the output: if the output changes from 5.1 to 5.2, the direct mapping F' changes by about 2%, whereas the residual F changes from 0.1 to 0.2, an increase of 100%. Small changes are thus amplified, which works better. The residual-structure-based neural network may have been trained on a large amount of data to learn the mapping from lighting conditions to camera parameters.
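The numeric example above (5 → 5.1) can be checked directly. The sketch below is only an arithmetic illustration of H(x)=F(x)+x and the claimed sensitivity difference; `direct_map` and the constant residual `F` are hypothetical stand-ins for learned mappings.

```python
def direct_map(x):
    # Stand-in for a learned mapping fitted to send 5 -> 5.1, i.e. F'(5) = 5.1.
    return x + 0.1

def residual_map(x, F):
    # Residual formulation: H(x) = F(x) + x, so the network only has to
    # fit the residual F(x) = H(x) - x.
    return F(x) + x

# The residual the network must represent for the 5 -> 5.1 example:
F = lambda x: 0.1

# Sensitivity: if the target output moves from 5.1 to 5.2, the direct
# mapping changes by roughly 0.1/5.1 (about 2%), while the residual
# changes from 0.1 to 0.2 (100%) -- the small change is amplified.
direct_rel = (5.2 - 5.1) / 5.1
residual_rel = (0.2 - 0.1) / 0.1
```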

在一些实施例中,来自光照预测模块211的所预测的未来时段的光照状况(即,特征hN)被输入所述基于残差结构的神经网络,使得所述基于残差结构的神经网络执行以下操作。首先,将输入的特征hN进行重塑(reshape)。例如,1600维的特征被重塑为40*40。随后,利用例如3*3的卷积核对其进行卷积,并且应用激活函数PReLU。接着,对应用激活函数后的特征执行批归一化(Batch Normalization),得到归一化的特征图(feature map)。归一化的特征图然后被输入多个串联的残差结构中。这里残差结构的数目可以是灵活的。一般地,通过采用4个以上的残差模块,能够较好地拟合数据。经过全连接层的处理后,第一版本的摄像参数从摄像参数调节模块212输出。In some embodiments, the lighting conditions of the predicted future period (i.e., the feature hN) from the illumination prediction module 211 are input into the residual-structure-based neural network, which then performs the following operations. First, the input feature hN is reshaped; for example, a 1600-dimensional feature is reshaped to 40*40. It is then convolved with, for example, a 3*3 convolution kernel, and a PReLU activation function is applied. Next, batch normalization is performed on the activated features to obtain a normalized feature map. The normalized feature map is then passed through a plurality of residual structures in series. The number of residual structures is flexible; generally, using four or more residual modules fits the data well. After processing by a fully connected layer, the first version of the camera parameters is output from the camera parameter adjustment module 212.
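The sequence of operations above can be summarized as a shape trace. The sketch below only tracks tensor shapes under stated assumptions (a padded 3*3 convolution, shape-preserving residual blocks, and an arbitrary placeholder count of five output camera parameters); it is not the trained network.

```python
def residual_mapping_shapes(feature_dim=1600, side=40, n_residual=4):
    # Trace tensor shapes through the sketched pipeline:
    # reshape -> 3x3 conv (assumed padded) -> PReLU -> BatchNorm
    # -> residual blocks -> fully connected camera parameters.
    assert side * side == feature_dim
    shapes = [("input hN", (feature_dim,)),
              ("reshape", (side, side)),
              # A padded 3x3 convolution, the activation, and batch
              # normalization all preserve the spatial size.
              ("conv3x3 + PReLU + BN", (side, side))]
    # Residual blocks must preserve shape, since H(x) = F(x) + x
    # requires F(x) and x to be addable.
    for i in range(n_residual):
        shapes.append((f"residual block {i + 1}", (side, side)))
    # Fully connected layer maps to the first-version camera parameters
    # (5 is a hypothetical placeholder count, not from the patent).
    shapes.append(("fc -> camera params", (5,)))
    return shapes
```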

返回参考图1和2,在步骤130,发起将摄像参数发送至相机CAM,以便相机CAM在未来时段中基于摄像参数拍摄待识别图像。摄像参数至相机CAM的发送可以由摄像参数调节模块212发起,并且由图像识别装置210的I/O接口执行。Referring back to Figures 1 and 2, at step 130, sending the camera parameters to the camera CAM is initiated so that the camera CAM captures images to be identified based on the camera parameters in a future period. The sending of the camera parameters to the camera CAM may be initiated by the camera parameter adjustment module 212 and performed by the I/O interface of the image recognition device 210 .

在一些实施例中,第一版本的摄像参数经由图像识别装置210的I/O接口被直接发送至相机CAM,以便相机CAM在未来时段中基于第一版本的摄像参数拍摄待识别图像。也即,相机CAM使用的摄像参数单纯地基于由光照预测模块211预测的光照状况而被导出。这种操作模式可以被称为全自动模式。In some embodiments, the camera parameters of the first version are sent directly to the camera CAM via the I/O interface of the image recognition device 210, so that the camera CAM captures images to be recognized based on the camera parameters of the first version in a future period. That is, the imaging parameters used by the camera CAM are simply derived based on the lighting conditions predicted by the lighting prediction module 211 . This mode of operation may be referred to as the fully automatic mode.

在一些实施例中,第一版本的摄像参数也可以经进一步处理之后被发送至相机CAM。例如,为了更好地使摄像参数适配于现场的光照状况,在确定摄像参数时,可以将现场的实时光照状况考虑在内。具体地,控制相机CAM在拍摄待识别图像之前实时拍摄样本图像,并且摄像参数调节模块212基于所述样本图像生成第二版本的摄像参数。这可以例如通过典型的图像处理方法来实现。具体地,摄像参数调节模块212处理样本图像以得到各种质量指标,例如,亮度、对比度、色调、饱和度、锐度等,并且基于这些质量指标生成第二版本的摄像参数,旨在改进这些质量指标。然后,摄像参数调节模块212将第一版本的摄像参数和第二版本的摄像参数进行加权平均,以得到加权的摄像参数。所述加权的摄像参数经由图像识别装置210的I/O接口被发送至相机CAM。这种操作模式可以被称为半自动模式,因为相机CAM使用的摄像参数并非单纯地基于由光照预测模块211预测的光照状况而被导出。In some embodiments, the camera parameters of the first version may also be sent to the camera CAM after further processing. For example, in order to better adapt the camera parameters to the lighting conditions of the scene, the real-time lighting conditions of the scene may be taken into account when determining the camera parameters. Specifically, the camera CAM is controlled to capture a sample image in real time before capturing the image to be recognized, and the camera parameter adjustment module 212 generates a second version of the camera parameters based on the sample image. This can be achieved, for example, by typical image processing methods. Specifically, the camera parameter adjustment module 212 processes the sample image to obtain various quality indicators, such as brightness, contrast, hue, saturation, sharpness, etc., and generates the second version of the camera parameters based on these quality indicators, aiming to improve them. Then, the camera parameter adjustment module 212 computes a weighted average of the first and second versions of the camera parameters to obtain weighted camera parameters. The weighted camera parameters are sent to the camera CAM via the I/O interface of the image recognition device 210. This mode of operation may be referred to as a semi-automatic mode, because the camera parameters used by the camera CAM are not derived purely from the lighting conditions predicted by the illumination prediction module 211.
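The weighted averaging of the two parameter versions can be sketched as below. The parameter names and the default weight are illustrative assumptions; the embodiment does not fix particular values.

```python
def weight_camera_params(first, second, w_predicted=0.5):
    # first:  parameters derived from the predicted lighting conditions
    # second: parameters derived from the real-time sample image
    # Returns the per-parameter weighted average used in the
    # semi-automatic mode.
    assert 0.0 <= w_predicted <= 1.0
    assert first.keys() == second.keys()
    return {k: w_predicted * first[k] + (1.0 - w_predicted) * second[k]
            for k in first}
```

A weight closer to 1.0 trusts the learned lighting prediction more; closer to 0.0 trusts the real-time sample image more.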

在步骤140,从相机CAM所拍摄的待识别图像中识别感兴趣对象。在各实施例中,感兴趣对象的识别可以由图像识别模块213实现。In step 140, the object of interest is identified from the to-be-identified image captured by the camera CAM. In various embodiments, the identification of the object of interest may be implemented by the image recognition module 213 .

所述感兴趣对象可以随应用场景变化。例如,在交通违法监控的应用中,感兴趣对象可以为机动车的车牌。又例如,在安全检查的应用中,感兴趣对象可以是受检者(例如,人类或动物)的面部。与人类面部类似,动物的面部也包含许多特征,例如,以面部器官的形状和几何关系为基础的几何特征等。因此,人脸检测和识别的技术也适用于动物,使得在某些应用场景下,根据本发明实施例的技术可以适用于对动物的面部进行检测和识别。The object of interest may vary with application scenarios. For example, in the application of traffic violation monitoring, the object of interest may be the license plate of a motor vehicle. As another example, in the application of security inspection, the object of interest may be the face of a subject (eg, a human or an animal). Similar to human faces, animal faces also contain many features, such as geometric features based on the shape and geometric relationships of facial organs. Therefore, the technology of face detection and recognition is also applicable to animals, so that in some application scenarios, the technology according to the embodiments of the present invention can be applied to detect and recognize the faces of animals.

图5A和5B示出了图像识别模块213所执行的操作的示例性和示意性图示。如图5A所示,利用包括多个卷积层、池化层和全连接层的深度神经网络从来自相机CAM(图2)的输入图像中检测感兴趣对象的区域(下称“感兴趣区域(ROI)”)。在该示例中,该深度神经网络从输入图像中抽取高层次的特征来检测图像中多个可能的ROI区域,并且输出多个可能的ROI区域中得分最高的ROI区域的分数。例如,可以使用浅层的卷积神经网络(CNN)快速产生多个ROI窗口,然后经过非极大值抑制得到多个ROI窗口中最优的ROI窗口。该最优的ROI区域作为检测结果被提供给图5B所示的另一深度神经网络。与图5A类似,图5B的深度神经网络也包括多个卷积层、池化层和全连接层。该深度神经网络从所检测到的ROI区域提取图像特征以供识别。FIGS. 5A and 5B show exemplary and schematic illustrations of the operations performed by the image recognition module 213. As shown in FIG. 5A, a deep neural network including multiple convolutional layers, pooling layers, and fully connected layers is used to detect the region of the object of interest (hereinafter the "region of interest (ROI)") in the input image from the camera CAM (FIG. 2). In this example, the deep neural network extracts high-level features from the input image to detect multiple candidate ROI regions in the image, and outputs the score of the highest-scoring ROI region among them. For example, a shallow convolutional neural network (CNN) can be used to quickly generate multiple ROI windows, and non-maximum suppression then yields the best ROI window among them. The best ROI region is provided as the detection result to another deep neural network, shown in FIG. 5B. Like that of FIG. 5A, the deep neural network of FIG. 5B also includes multiple convolutional layers, pooling layers, and fully connected layers. It extracts image features from the detected ROI region for recognition.
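The window-selection step above relies on non-maximum suppression. Below is a minimal sketch of NMS over candidate ROI windows, assuming axis-aligned (x1, y1, x2, y2) boxes and an illustrative IoU threshold of 0.5; the real detector's box format and threshold may differ.

```python
def iou(a, b):
    # Intersection-over-union for boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    # Keep the highest-scoring window, drop windows that overlap it too
    # much, then repeat on the remainder.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep
```

The index returned first by `nms` corresponds to the "best ROI window" handed to the network of FIG. 5B.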

在感兴趣对象为机动车牌的实施例中,提取的图像特征可以利用典型的图像识别技术被识别为一串字符(例如,汉字、字母和数字)的组合。该识别结果(以及可选地,待识别图像本身)可以经由例如图像识别装置210的I/O接口被发送至数据库DB(图2)以供记录和查询。In embodiments where the object of interest is a motor vehicle license plate, the extracted image features may be recognized as a combination of a string of characters (eg, Chinese characters, letters, and numbers) using typical image recognition techniques. The recognition result (and optionally, the image itself to be recognized) can be sent to the database DB ( FIG. 2 ) via eg the I/O interface of the image recognition device 210 for recording and querying.

在感兴趣对象为受检者的面部的实施例中,取决于不同的应用场景,提取的图像特征可以被用于不同的目的。在下文中,以感兴趣对象为人类面部为例进一步描述图像识别模块213的操作。In embodiments where the object of interest is the subject's face, the extracted image features may be used for different purposes, depending on different application scenarios. In the following, the operation of the image recognition module 213 is further described by taking the object of interest as a human face as an example.

在一些实施例中,提取的面部特征可以被提供给公安系统,并且与身份证信息库中的面部特征进行比对,从而识别该受检者的身份。In some embodiments, the extracted facial features can be provided to the public security system and compared with the facial features in the ID card information database to identify the subject's identity.

在一些实施例中,提取的面部特征可以与特定信息库(例如在逃嫌疑人数据库)中的面部特征进行比对,从而确定该受检者是否为在逃嫌疑人。In some embodiments, the extracted facial features may be compared with facial features in a specific information database (eg, a database of suspects at large) to determine whether the subject is a suspect at large.

在一些实施例中,图5A的深度神经网络还从已登记图像中检测面部区域,所述已登记图像是对应于受检者的身份的电子照片,例如从受检者所持身份证中读取的电子照片、由相机CAM(图2)拍摄的受检者所持证件上的纸件照片的电子副本、存储于数据库DB(图2)中的以受检者的身份事先注册的电子照片,等等。从待识别图像中提取的面部区域和从已登记图像中提取的面部区域两者被处理成具有相同尺寸的两个图像。然后,这两个图像被依次输入到图5B的深度神经网络(其可以是深度比较浅的卷积网络)中以便提取面部特征。在该深度神经网络中使用几个卷积层可以是足够的,因为这两个提取出的面部图像的内容比较简单。这样能够减少卷积操作,提高运行速率,降低对于硬件资源的要求。In some embodiments, the deep neural network of FIG. 5A also detects a facial region from a registered image, which is an electronic photo corresponding to the subject's identity, such as an electronic photo read from the ID card held by the subject, an electronic copy, captured by the camera CAM (FIG. 2), of the paper photo on a document held by the subject, an electronic photo pre-registered under the subject's identity and stored in the database DB (FIG. 2), and so on. The facial region extracted from the image to be recognized and the facial region extracted from the registered image are both processed into two images of the same size. These two images are then input in turn into the deep neural network of FIG. 5B (which may be a relatively shallow convolutional network) to extract facial features. A few convolutional layers may be sufficient in this deep neural network, because the content of the two extracted facial images is relatively simple. This reduces convolution operations, increases the running speed, and lowers the hardware resource requirements.

将从待识别图像中提取的面部特征与从所述已登记图像中提取的面部特征进行比较,以确定所述受检者是否与所述身份相符。例如,如果来自待识别图像的面部特征与来自已登记图像的面部特征之间的相似度小于某一阈值,则表明受检者与其所持证件展示的(或事先在数据库中注册的)身份不一致。否则,表明受检者正在使用真实的身份。The facial features extracted from the image to be recognized are compared with the facial features extracted from the registered image to determine whether the subject matches the identity. For example, if the similarity between the facial features from the image to be recognized and the facial features from the registered image is less than a certain threshold, it indicates that the subject is inconsistent with the identity displayed (or previously registered in the database) on the document held by the subject. Otherwise, it is indicated that the subject is using a real identity.
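The comparison step above can be sketched with cosine similarity, one common choice for comparing feature vectors; the metric and the threshold value are assumptions here, since the embodiment only specifies "similarity below a certain threshold".

```python
import math

def cosine_similarity(u, v):
    # Similarity of two feature vectors in [-1, 1]; assumes neither is zero.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def identity_matches(live_feat, registered_feat, threshold=0.8):
    # Below the threshold, the face in the image to be recognized and the
    # registered photo are treated as different identities (alert case).
    return cosine_similarity(live_feat, registered_feat) >= threshold
```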

响应于受检者与身份不符的异常事件,可以发起向系统平台报告警报消息。这可以由图像识别装置210中的告警模块214实现。In response to an abnormal event in which the subject does not match the identity, an alert message can be initiated to report to the system platform. This can be accomplished by the alerting module 214 in the image recognition device 210 .

返回参考图2,告警模块214通过适当的I/O接口(未示出)与系统平台(在该示例中,服务器SRV和数据库DB)进行交互。告警模块214可以将警报消息AM发送给系统平台,所述警报消息指示受检者与身份不符。警报消息AM还可以包含由图像识别模块213从待识别图像(由相机CAM拍摄)中提取的面部特征和从已登记图像(例如,从数据库DB中读取)中提取的面部特征。替换地或附加地,告警模块214可以在图像识别装置210的现场发出声音、文字提示等形式的警报。Referring back to Figure 2, the alerting module 214 interacts with the system platform (in this example, the server SRV and the database DB) through appropriate I/O interfaces (not shown). The alert module 214 may send an alert message AM to the system platform, the alert message indicating that the subject does not match the identity. The alert message AM may also contain facial features extracted by the image recognition module 213 from the image to be recognized (taken by the camera CAM) and facial features extracted from the registered image (eg, read from the database DB). Alternatively or additionally, the alarm module 214 may issue an alarm in the form of a sound, a text prompt, or the like at the scene of the image recognition device 210 .

系统平台可以利用不同的策略来响应警报消息AM。在一些实施例中,系统平台可以将警报消息AM中包含的两种面部特征(一个从待识别图像中提取,另一个从已登记图像提取)与公安系统的特征库比对,得到被冒名顶替者m的身份信息和冒用者n的身份信息。然后,系统平台可以向被冒名顶替者m发送消息(例如,短消息、微信消息),以提醒其注销或冻结证件。系统平台还可以利用冒用者n的身份信息在在逃嫌疑人的数据库中进行查找,确认冒用者n是否为在逃嫌疑人。如果冒用者n被确认为在逃嫌疑人,则系统平台可以将这一结果报告给公安系统,和/或采取其他措施。例如,系统平台调用现场周围一定地理范围内的监控摄像头,对各个监控摄像头获取的图像中的所有面部进行识别,并且将得到的面部特征与嫌疑人n的面部特征进行比对,由此实现对嫌疑人n的追踪。The system platform can use different strategies to respond to the alert message AM. In some embodiments, the system platform can compare the two sets of facial features contained in the alert message AM (one extracted from the image to be recognized, the other from the registered image) against the feature database of the public security system, obtaining the identity information of the impersonated person m and the identity information of the impostor n. The system platform can then send a message (e.g., a short message or a WeChat message) to the impersonated person m, reminding him or her to cancel or freeze the credentials. The system platform can also use the identity information of the impostor n to search the database of fugitive suspects to confirm whether the impostor n is a fugitive suspect. If the impostor n is confirmed as a fugitive suspect, the system platform can report this result to the public security system and/or take other measures. For example, the system platform can invoke surveillance cameras within a certain geographical range around the scene, recognize all faces in the images acquired by each surveillance camera, and compare the resulting facial features with those of suspect n, thereby tracking suspect n.

将理解的是,受检者与身份不符的异常事件或受检者的身份识别失败的其他异常事件也可能与来自相机CAM的待识别图像的质量有关。因此,响应于这些异常事件,可以触发图像识别装置210进入上面所述的半自动模式,其中摄像参数调节模块212从第一和第二版本的摄像参数导出加权的摄像参数,以将现场的实时光照状况考虑在内。这有助于进一步改善相机CAM后续拍摄的待识别图像的质量。It will be appreciated that an abnormal event in which the subject does not match the identity, or another abnormal event in which the subject's identification fails, may also be related to the quality of the image to be recognized from the camera CAM. Therefore, in response to such abnormal events, the image recognition device 210 can be triggered to enter the semi-automatic mode described above, in which the camera parameter adjustment module 212 derives weighted camera parameters from the first and second versions of the camera parameters so as to take the real-time lighting conditions of the scene into account. This helps further improve the quality of the images to be recognized that are subsequently captured by the camera CAM.

在某些情况下,相机CAM的摄像参数的调节(无论是否考虑现场的实时光照状况)可能仍然不足以得到具有期望质量的待识别图像。因此,可以采取附加的措施以得到进一步改善的图像质量。为此目的,图像识别装置210在图2中被示出为还包括深度估计模块215和(一个或多个)可调节光源ALS。In some cases, the adjustment of the imaging parameters of the camera CAM (regardless of whether the real-time lighting conditions of the scene are considered) may still not be sufficient to obtain the image to be recognized with the desired quality. Therefore, additional measures can be taken to obtain further improved image quality. For this purpose, the image recognition device 210 is shown in Figure 2 as further comprising a depth estimation module 215 and an adjustable light source(s) ALS.

在一些实施例中,控制相机CAM在拍摄待识别图像之前实时拍摄样本图像,并且深度估计模块215利用深度神经网络对来自相机CAM的样本图像进行深度估计。深度估计模块215所使用的样本图像可以与上面描述的摄像参数调节模块212所使用的样本图像相同。替换地,深度估计模块215所使用的样本图像可以独立于摄像参数调节模块212所使用的样本图像,例如在摄像参数调节模块212所使用的样本图像被拍摄的时刻之前或之后的时间段中被拍摄。根据所估计的深度,深度估计模块215确定样本图像中的前景(面部)区域和背景区域,并且根据前景区域和背景区域各自的亮度,调节(一个或多个)可调节光源ALS的照明强度。In some embodiments, the camera CAM is controlled to capture a sample image in real time before capturing the image to be recognized, and the depth estimation module 215 uses a deep neural network to perform depth estimation on the sample image from the camera CAM. The sample image used by the depth estimation module 215 may be the same as the sample image used by the camera parameter adjustment module 212 described above. Alternatively, the sample image used by the depth estimation module 215 may be independent of that used by the camera parameter adjustment module 212, for example captured in a time period before or after the moment at which the sample image used by the camera parameter adjustment module 212 was captured. Based on the estimated depth, the depth estimation module 215 determines the foreground (face) region and the background region in the sample image, and adjusts the illumination intensity of the adjustable light source(s) ALS according to the respective brightness of the foreground region and the background region.

图6示出了深度估计模块215所使用的深度神经网络的操作的示例性和示意性图示。该深度神经网络是一种经修改的VGG16网络。下面结合图6描述深度估计的操作。FIG. 6 shows an exemplary and schematic illustration of the operation of the deep neural network used by the depth estimation module 215 . The deep neural network is a modified VGG16 network. The operation of depth estimation is described below with reference to FIG. 6 .

首先,相机CAM拍摄的样本图像被尺寸调整以具有第一尺寸,例如224×224。样本图像包括红(R)、绿(G)、蓝(B)三个颜色通道,因此经尺寸调整的图像被表示为224×224×3。First, the sample image captured by the camera CAM is resized to have a first size, eg, 224×224. The sample image includes three color channels, red (R), green (G), and blue (B), so the resized image is represented as 224×224×3.

然后,经尺寸调整的样本图像被输入到经修改的VGG16网络,使得经修改的VGG16网络执行以下操作。Then, the resized sample images are input to the modified VGG16 network such that the modified VGG16 network performs the following operations.

在第一阶段中,对224×224×3的输入图像进行多次卷积,得到14×14×512的特征层。在第二阶段中,将14×14×512的特征层进行上采样(upscale=4),形成56×56×512的特征层。然后利用一个(3, 3, 512, 64)的卷积核将56×56×512的特征层卷积成56×56×64的卷积层。再次进行upscale=4的上采样,得到224×224×64的卷积层。然后利用一个(3, 3, 64, 1)的卷积核将224×224×64的卷积层卷积成224×224×1的特征层,该特征层以灰度值表示深度信息,并且因此可以被称为深度图(depth map)。In the first stage, multiple convolutions are performed on the 224×224×3 input image, yielding a 14×14×512 feature layer. In the second stage, the 14×14×512 feature layer is upsampled (upscale=4) to form a 56×56×512 feature layer. A (3, 3, 512, 64) convolution kernel is then used to convolve the 56×56×512 feature layer into a 56×56×64 convolutional layer. Upsampling with upscale=4 is performed again to obtain a 224×224×64 convolutional layer. Finally, a (3, 3, 64, 1) convolution kernel convolves the 224×224×64 convolutional layer into a 224×224×1 feature layer, which represents depth information as gray values and may therefore be called a depth map.
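The two-stage trace above can be checked by following the shapes. A minimal sketch, assuming the upsampling and the padded convolutions behave exactly as stated; it tracks (H, W, C) shapes only and is not the network itself.

```python
def depth_net_shapes():
    # Shape trace of the modified VGG16 depth-estimation head.
    shapes = [("input", (224, 224, 3)),
              ("VGG16 conv stages", (14, 14, 512))]
    h, w, c = 14, 14, 512
    h, w = h * 4, w * 4                        # upsampling, upscale=4
    shapes.append(("upsample x4", (h, w, c)))
    c = 64                                     # (3, 3, 512, 64) kernel
    shapes.append(("conv (3,3,512,64)", (h, w, c)))
    h, w = h * 4, w * 4                        # upsampling, upscale=4
    shapes.append(("upsample x4", (h, w, c)))
    c = 1                                      # (3, 3, 64, 1) kernel
    shapes.append(("conv (3,3,64,1) -> depth map", (h, w, c)))
    return shapes
```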

根据所估计的深度,深度估计模块215执行所谓“测光”操作,以确定所述样本图像中的前景区域和背景区域、前景区域的亮度、和背景区域的亮度。Based on the estimated depth, the depth estimation module 215 performs a so-called "photometric" operation to determine the foreground and background regions, the brightness of the foreground regions, and the brightness of the background regions in the sample image.

图7示出了深度估计模块215、相机CAM和可调节光源ALS1和ALS2的示例配置。在该示例配置中,两个可调节光源ALS1和ALS2分别被布置在相机CAM的左侧和右侧,并且深度估计模块215根据前景区域的亮度和背景区域的亮度,选择性地调节可调节光源ALS1和ALS2的照明强度。Figure 7 shows an example configuration of depth estimation module 215, camera CAM and adjustable light sources ALS1 and ALS2. In this example configuration, two adjustable light sources ALS1 and ALS2 are arranged to the left and right of the camera CAM, respectively, and the depth estimation module 215 selectively adjusts the adjustable light sources according to the brightness of the foreground area and the brightness of the background area Illumination intensity for ALS1 and ALS2.

图8示出了深度估计模块215的操作的一般流程。FIG. 8 shows a general flow of the operation of the depth estimation module 215.

在步骤S1,深度估计模块215读取相机CAM拍摄的样本图像。In step S1, the depth estimation module 215 reads the sample image captured by the camera CAM.

在步骤S2,深度估计模块215执行如上所述的深度估计。At step S2, the depth estimation module 215 performs depth estimation as described above.

在步骤S3,深度估计模块215根据估计的深度确定样本图像中的前景区域和背景区域。例如,深度小于一阈值的区域被确定为前景区域,并且深度大于该阈值的区域被确定为背景区域。深度估计模块215进一步确定前景区域的亮度和背景区域的亮度。这可以包括各种情况。例如,当前景区域的平均亮度大于第一阈值时,确定前景区域的亮度为强;反之,确定前景区域的亮度为弱。进一步地,还可以确定前景区域的亮度是局部弱(例如,前景区域中各个像素的亮度的总体方差大于第二阈值)还是整体弱(例如,前景区域中各个像素的亮度的总体方差小于第二阈值)。当背景区域的平均亮度大于第三阈值时,确定背景区域的亮度为强;反之,确定背景区域的亮度为弱。第三阈值可以等于或不等于第一阈值。In step S3, the depth estimation module 215 determines the foreground area and the background area in the sample image according to the estimated depth. For example, an area with a depth smaller than a threshold is determined as a foreground area, and an area with a depth greater than the threshold is determined as a background area. The depth estimation module 215 further determines the brightness of the foreground regions and the brightness of the background regions. This can include a variety of situations. For example, when the average brightness of the foreground area is greater than the first threshold, the brightness of the foreground area is determined to be strong; otherwise, the brightness of the foreground area is determined to be weak. Further, it can also be determined whether the brightness of the foreground region is locally weak (for example, the overall variance of the brightness of each pixel in the foreground region is greater than the second threshold) or overall weak (for example, the overall variance of the brightness of each pixel in the foreground region is smaller than the second threshold). threshold). When the average brightness of the background region is greater than the third threshold, the brightness of the background region is determined to be strong; otherwise, the brightness of the background region is determined to be weak. The third threshold may or may not be equal to the first threshold.
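Step S3 can be sketched as follows. The depth threshold, the brightness thresholds, and the use of population variance are illustrative assumptions consistent with the description above; the actual threshold values are configuration choices.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def split_by_depth(pixels, depth_thresh):
    # pixels: list of (depth, brightness); nearer than the threshold is
    # treated as foreground, farther as background.
    fg = [b for d, b in pixels if d < depth_thresh]
    bg = [b for d, b in pixels if d >= depth_thresh]
    return fg, bg

def classify_foreground(fg, strong_thresh, variance_thresh):
    # "strong" if the average brightness exceeds the first threshold;
    # otherwise weak, split into locally weak (high per-pixel variance)
    # and uniformly weak (low per-pixel variance).
    if mean(fg) > strong_thresh:
        return "strong"
    return "locally weak" if variance(fg) > variance_thresh else "uniformly weak"
```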

在步骤S4,根据前景区域的亮度与背景区域的亮度,深度估计模块215选择性地调节可调节光源ALS1和ALS2的照明强度。具体地,在两个可调节光源ALS1和ALS2被分别布置于相机CAM两侧的情况下,调节策略可以包括:两侧减光、暗侧补光和/或亮侧减光、两侧补光和不调节,以使得前景区域的亮度与背景区域的亮度之差处于预定范围之内。该预定范围可以是一个经验值,并且可以人为设定。更一般地,如果前景区域的亮度与背景区域的亮度之差处于预定范围之内(即,前景区域的亮度与背景区域的亮度接近),则可调节光源ALS1和ALS2无需开启或调节。如果前景区域的亮度与背景区域的亮度之差超出预定范围的下限(即,背景区域的亮度明显大于前景区域的亮度),则可以调整可调节光源ALS1和ALS2的照明强度。如果前景区域的亮度与背景区域的亮度之差超出预定范围的上限(即,背景区域的亮度明显小于前景区域的亮度),则可以调整可调节光源ALS1和ALS2的照明强度。在最后一种情况下,很容易出现“阴阳脸(一半亮一半暗)”的现象,因此可以适当地调节可调节光源ALS1和ALS2中的对应一个,以使得前景区域中较暗的部分变亮。At step S4, according to the brightness of the foreground region and the brightness of the background region, the depth estimation module 215 selectively adjusts the illumination intensity of the adjustable light sources ALS1 and ALS2. Specifically, with the two adjustable light sources ALS1 and ALS2 arranged on either side of the camera CAM, the adjustment strategies may include: dimming on both sides, fill light on the dark side and/or dimming on the bright side, fill light on both sides, or no adjustment, so that the difference between the brightness of the foreground region and that of the background region falls within a predetermined range. The predetermined range may be an empirical value and may be set manually. More generally, if the difference between the brightness of the foreground region and that of the background region is within the predetermined range (i.e., the two are close), the adjustable light sources ALS1 and ALS2 need not be turned on or adjusted. If the difference falls below the lower limit of the predetermined range (i.e., the background region is significantly brighter than the foreground region), the illumination intensity of the adjustable light sources ALS1 and ALS2 can be adjusted. If the difference exceeds the upper limit of the predetermined range (i.e., the background region is significantly darker than the foreground region), the illumination intensity of the adjustable light sources ALS1 and ALS2 can likewise be adjusted. In the last case, a "yin-yang face" (half bright, half dark) easily appears, so the corresponding one of the adjustable light sources ALS1 and ALS2 can be adjusted appropriately to brighten the darker part of the foreground region.
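The selection among the step-S4 strategies can be sketched as a decision on the brightness difference. The strategy labels and the signed-difference convention here are illustrative assumptions; the embodiment lists several concrete dimming/fill-light combinations rather than these exact labels.

```python
def choose_adjustment(fg_brightness, bg_brightness, lo, hi):
    # lo..hi is the predetermined acceptable range for
    # (foreground brightness - background brightness).
    diff = fg_brightness - bg_brightness
    if lo <= diff <= hi:
        # Foreground and background brightness are close enough.
        return "no adjustment"
    if diff < lo:
        # Background clearly brighter than the foreground: raise the
        # fill light on the foreground and/or dim the bright side.
        return "fill foreground"
    # Background clearly darker: risk of a half-lit ("yin-yang") face,
    # so adjust the light source on the darker side of the face.
    return "fill darker side of face"
```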

在执行完照明强度的调节之后,重复步骤S1、S2和S3,直到当前样本图像中前景区域的亮度与背景区域的亮度处于所述预定范围之内。由此,能够大幅度提高待识别图像的质量,从而提高面部检测和识别的精度。After the adjustment of the illumination intensity is performed, steps S1 , S2 and S3 are repeated until the brightness of the foreground area and the brightness of the background area in the current sample image are within the predetermined range. As a result, the quality of the image to be recognized can be greatly improved, thereby improving the accuracy of face detection and recognition.

另外,当环境光照均匀变化时(如,对于晴天户外),深度估计模块215可以设定锁定时间段,在该锁定时间段期间在拍摄待识别图像之前不进行可调节光源的调节,以便改善系统的响应性。当环境光照容易突变时(如,对于演唱会等存在变化的干扰光源的场景),深度估计模块215可以实时地进行深度估计、测光和可调节光源的调节,以保证面部识别的正确率。为此目的,相机CAM或者图像识别装置210可以包括光传感器SNS以感测环境光照。深度估计模块215可以根据来自光传感器SNS的检测结果确定环境光照是均匀变化还是容易突变,并且执行相应的补光策略。均匀变化的环境光照可以由光传感器SNS的检测结果在预定时间段内变化小于一阈值来指示。In addition, when the ambient lighting varies uniformly (eg, for sunny outdoors), the depth estimation module 215 may set a lock time period during which adjustment of the adjustable light source is not performed before the image to be recognized is captured, in order to improve the system responsiveness. When the ambient light is prone to sudden changes (eg, for a scene with changing interfering light sources such as a concert), the depth estimation module 215 can perform depth estimation, light metering and adjustment of the adjustable light source in real time to ensure the correctness of facial recognition. For this purpose, the camera CAM or the image recognition device 210 may include a light sensor SNS to sense ambient lighting. The depth estimation module 215 can determine whether the ambient light changes uniformly or is prone to sudden changes according to the detection result from the light sensor SNS, and executes a corresponding lighting strategy. The uniformly varying ambient light may be indicated by the detection result of the light sensor SNS varying less than a threshold within a predetermined period of time.
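The choice between the locked and real-time fill-light behaviors can be sketched from the light-sensor readings. The spread-over-window criterion below is an assumption matching "the detection result varying less than a threshold within a predetermined period"; the actual criterion and threshold are implementation choices.

```python
def select_fill_mode(sensor_readings, change_thresh):
    # If ambient light varied less than the threshold over the recent
    # window (uniform change, e.g. sunny outdoors), lock the light
    # sources for responsiveness; otherwise (abrupt changes, e.g. a
    # concert) re-run depth estimation, metering and adjustment in
    # real time.
    spread = max(sensor_readings) - min(sensor_readings)
    return "locked" if spread < change_thresh else "real-time"
```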

图9示出了其中可以应用根据本发明实施例的技术的应用场景。该应用场景例如为用于露天会场的安全检查,其中与会者事先已经向会议举办方登记了个人照片(即,“注册照”,存储在例如数据库DB中),并且在露天会场的入口处接受安全检查。如图9所示的客户端包括多个功能块,其分别执行上面关于图1-8描述的各个模块的功能。客户端可以体现为嵌入式设备,例如与相机相集成或分离的嵌入式视频监控设备。FIG. 9 illustrates an application scenario in which techniques according to embodiments of the present invention may be applied. This application scenario is, for example, a security check for an open-air venue, in which a participant has previously registered a personal photo (ie, a "registration photo", stored in, for example, a database DB) with the conference organizer, and is accepted at the entrance of the open-air venue Security check. The client as shown in FIG. 9 includes a plurality of functional blocks, which respectively perform the functions of the various modules described above with respect to FIGS. 1-8 . The client can be embodied as an embedded device, such as an embedded video surveillance device integrated with or separate from the camera.

利用“光照预测”功能块,客户端学习会场在不同天气和时间的光照变化规律,并且预测入场时间段的光照状况。利用“摄像参数调节”功能块,客户端根据预测的光照状况自动地调节相机CAM在该入场时间段使用的摄像参数,以获取质量改善的图像。在入场时间段期间,与会者在会场入口接受安全检查以便进入会场。相机CAM拍摄受检者的样本图像,并且客户端利用“深度估计与测光”功能块评估样本图像中的前景(人脸)的亮度,并据此实时地调节光源ALS1和ALS2以提供补充光照。Using the "light prediction" function block, the client learns the changing law of lighting in the venue in different weather and time, and predicts the lighting conditions during the admission time period. Using the "Camera Parameter Adjustment" function block, the client automatically adjusts the camera parameters used by the camera CAM during the entry time period according to the predicted lighting conditions to obtain images with improved quality. During the entry time period, attendees undergo a security check at the entrance of the venue to gain access to the venue. The camera CAM takes a sample image of the subject, and the client uses the "depth estimation and metering" function block to evaluate the brightness of the foreground (face) in the sample image, and adjust the light sources ALS1 and ALS2 in real time accordingly to provide supplementary lighting .

此后,相机CAM实时拍摄受检者的照片,并将其传送至“图像识别”功能块,在该功能块处,实时拍摄的照片与数据库DB中存储的注册照进行比对。若比对结果超过阈值,则认为受检者是以真实身份进入;反之则认为受检者是以虚假身份进入。在后者的情况下,客户端通过“告警”功能块可以在现场发出警报,表明受检者未能通过安全检查。客户端还可以生成警报消息,并发送至服务器SRV。该服务器SRV进一步在公安数据系统中查找受检者的信息。如果发现该受检者是犯罪嫌疑人,则返回相关信息给公安视频监控系统(例如,天眼系统),对受检者进行追踪监控。After that, the camera CAM takes a photo of the subject in real time and transmits it to the "image recognition" function block, where the photo taken in real time is compared with the registered photos stored in the database DB. If the comparison result exceeds the threshold, it is considered that the subject has entered with a real identity; otherwise, it is considered that the subject has entered with a false identity. In the latter case, the client, via the "alert" function block, can issue an alert on-site that the subject has failed the security check. Clients can also generate alert messages and send them to the server SRV. The server SRV further searches the information of the examinee in the public security data system. If it is found that the subject is a criminal suspect, the relevant information is returned to the public security video surveillance system (for example, the Sky Eye system) to track and monitor the subject.

FIG. 10 illustrates generally at 1000 an example system that includes the camera CAM, the adjustable light sources ALS1 and ALS2, and an example computing device 1010 representing one or more systems and/or devices that may implement the various techniques described herein. The computing device 1010 may be, for example, a server of a service provider, a device associated with a client (e.g., a client device), a system on a chip, and/or any other suitable computing device or computing system. The image recognition apparatus 210 described above with respect to FIG. 2 may take the form of the computing device 1010.

The example computing device 1010 as illustrated includes a processing system 1011, one or more computer-readable media 1012, and one or more I/O interfaces 1013 that are communicatively coupled to each other. Although not shown, the computing device 1010 may also include a system bus or other data and command transfer system that couples its various components to one another. The system bus may include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus employing any of a variety of bus architectures. Various other examples, such as control and data lines, are also contemplated.

The processing system 1011 represents functionality to perform one or more operations using hardware. Accordingly, the processing system 1011 is illustrated as including hardware elements 1014 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as an application-specific integrated circuit or other logic device formed using one or more semiconductors. The hardware elements 1014 are not limited by the materials from which they are formed or the processing mechanisms employed therein. For example, a processor may be composed of semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs)). In such a context, processor-executable instructions may be electronically executable instructions.

The computer-readable media 1012 are illustrated as including memory/storage 1015. The memory/storage 1015 represents memory/storage capacity associated with one or more computer-readable media. The memory/storage 1015 may include volatile media (such as random access memory (RAM)) and/or non-volatile media (such as read-only memory (ROM), flash memory, optical disks, magnetic disks, and so forth). The memory/storage 1015 may include fixed media (e.g., RAM, ROM, a fixed hard drive) as well as removable media (e.g., flash memory, a removable hard drive, an optical disc). The computer-readable media 1012 may be configured in a variety of other ways as described further below.

The one or more input/output (I/O) interfaces 1013 represent functionality that allows commands and data to be transmitted to, and received from, the computing device 1010. The I/O interfaces 1013 may be implemented with any suitable communication interface and communication protocol.

The computing device 1010 also includes an image recognition application 1016. The image recognition application 1016 may be stored in the memory/storage 1015 as computer program instructions. Together with the processing system 1011, the computer-readable media 1012, and the I/O interfaces 1013, the image recognition application 1016 may implement the full functionality of the various modules of the image recognition apparatus 210 described with respect to FIG. 2.

Various techniques may be described herein in the general context of software, hardware elements, or program modules. Generally, such modules include routines, programs, objects, elements, components, data structures, and so forth that perform particular tasks or implement particular abstract data types. The terms "module," "functionality," and "component" as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.

Implementations of the described modules and techniques may be stored on, or transmitted across, some form of computer-readable media. The computer-readable media may include a variety of media that can be accessed by the computing device 1010. By way of example, and not limitation, computer-readable media may include "computer-readable storage media" and "computer-readable signal media."

"Computer-readable storage media" refers to media and/or devices that enable persistent storage of information, and/or tangible storage, in contrast to mere signal transmission, carrier waves, or signals per se. Thus, computer-readable storage media refers to non-signal-bearing media. Computer-readable storage media include hardware such as volatile and non-volatile, removable and non-removable media and/or storage devices implemented in a method or technology suitable for storage of information such as computer-readable instructions, data structures, program modules, logic elements/circuits, or other data. Examples of computer-readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVDs) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage devices, tangible media, or articles of manufacture suitable for storing the desired information and accessible by a computer.

"Computer-readable signal media" refers to signal-bearing media configured to transmit instructions to the hardware of the computing device 1010, such as via a network. Signal media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, data signal, or other transport mechanism. Signal media also include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media, such as a wired network or direct-wired connection, and wireless media, such as acoustic, RF, infrared, and other wireless media.

As previously described, the hardware elements 1014 and the computer-readable media 1012 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware that may be employed in some embodiments to implement at least some aspects of the techniques described herein. Hardware elements may include components of an integrated circuit or system on a chip, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), and other implementations in silicon or other hardware devices. In this context, a hardware element may operate as a processing device that performs program tasks defined by instructions, modules, and/or logic embodied by the hardware element, as well as a hardware device utilized to store instructions for execution, e.g., the computer-readable storage media described previously.

Combinations of the foregoing may also be employed to implement the various techniques and modules described herein. Accordingly, software, hardware, or program modules and other program modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage media and/or by one or more hardware elements 1014. The computing device 1010 may be configured to implement particular instructions and/or functions corresponding to the software and/or hardware modules. Accordingly, implementation of a module that is executable by the computing device 1010 as software may be achieved at least partially in hardware, e.g., through use of computer-readable storage media and/or the hardware elements 1014 of the processing system. The instructions and/or functions may be executable/operable by one or more articles of manufacture (e.g., one or more computing devices 1010 and/or processing systems 1011) to implement the techniques, modules, and examples described herein.

In various implementations, the computing device 1010 may assume a variety of different configurations. Each of these configurations includes devices that may have generally different constructs and capabilities, and thus the computing device 1010 may be configured according to one or more of the different device classes. For instance, the computing device 1010 may be implemented as a computer class of device that includes a personal computer, a desktop computer, a multi-screen computer, a laptop computer, a netbook, and so on. The computing device 1010 may also be implemented as a mobile class of device that includes mobile devices, such as a mobile phone, a portable music player, a portable gaming device, a tablet computer, a multi-screen computer, and so on. The computing device 1010 may also be implemented as a television class of device that includes televisions, set-top boxes, gaming consoles, and so on.

The techniques described herein may be supported by these various configurations of the computing device 1010 and are not limited to the specific examples of the techniques described herein. The computing device 1010 may interact with a "cloud" 1020 serving as a system platform. In some embodiments, the functionality of the computing device 1010 may also be implemented, in whole or in part, on the "cloud" 1020 through use of a distributed system, such as via a platform 1030 as described below.

The cloud 1020 includes and/or is representative of a platform 1030 for resources 1032. The platform 1030 abstracts the underlying functionality of hardware (e.g., servers) and software resources of the cloud 1020. The resources 1032 may include applications and/or data that can be utilized while computer processing is executed on servers that are remote from the computing device 1010. The resources 1032 may also include services provided over the Internet and/or through a subscriber network, such as a cellular or Wi-Fi network.

The platform 1030 may abstract resources and functions to connect the computing device 1010 with other computing devices. The platform 1030 may also serve to abstract the scaling of resources to provide a corresponding level of scale to encountered demand for the resources 1032 that are implemented via the platform 1030. Accordingly, in an interconnected device embodiment, implementation of the functionality described herein may be distributed throughout the system 1000. For example, the functionality may be implemented in part on the computing device 1010 as well as via the platform 1030 that abstracts the functionality of the cloud 1020.

In the discussion herein, a variety of different embodiments are described. It is to be appreciated and understood that each embodiment described herein may be used on its own or in connection with one or more other embodiments described herein.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, the indefinite article "a" or "an" does not exclude a plurality, and "a plurality of" means two or more. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims (15)

1. An image recognition method, comprising:
predicting lighting conditions for a future time period based on a plurality of images respectively captured by a camera at a plurality of consecutive moments;
generating, based at least on the predicted lighting conditions, imaging parameters to be sent to the camera;
initiating sending of the imaging parameters to the camera, so that the camera captures an image to be recognized based on the imaging parameters during the future time period; and
recognizing an object of interest from the image to be recognized.

2. The method of claim 1, wherein the predicting lighting conditions for a future time period comprises:
sequentially extracting respective illumination features of the plurality of images using a residual network;
recursively processing the respective illumination features using a temporal recurrent neural network, wherein the residual network and the temporal recurrent neural network have been trained to learn changes in lighting conditions over time; and
outputting the predicted lighting conditions by the temporal recurrent neural network,
wherein the plurality of moments comprises N moments, N being a positive integer greater than 1, and
wherein both the illumination feature of Xk+1 and the processing result of the temporal recurrent neural network on the illumination feature of Xk are input to the temporal recurrent neural network for recursive processing, where Xk denotes the image captured at the k-th moment of the N moments, Xk+1 denotes the image captured at the (k+1)-th moment of the N moments, and k is an integer with 1≤k<N.

3. The method of claim 2, wherein the temporal recurrent neural network comprises a long short-term memory network.

4. The method of claim 1, wherein the generating imaging parameters to be sent to the camera comprises: mapping the predicted lighting conditions into a first version of imaging parameters using a neural network based on a residual structure, wherein the neural network based on a residual structure has been trained to learn a mapping from lighting conditions to imaging parameters, and wherein the first version of imaging parameters is selectable as the imaging parameters to be sent to the camera.

5. The method of claim 4, wherein the neural network based on a residual structure comprises a plurality of residual structures connected in series, and wherein the mapping the predicted lighting conditions into a first version of imaging parameters comprises: inputting features characterizing the predicted lighting conditions into the neural network based on a residual structure, so that the neural network based on a residual structure performs operations comprising:
reshaping the features characterizing the predicted lighting conditions;
convolving the reshaped features;
applying an activation function to the convolved features;
performing batch normalization on the features to which the activation function has been applied, to obtain a normalized feature map;
processing the normalized feature map using the plurality of residual structures connected in series; and
inputting the processed feature map into a fully connected layer to obtain the first version of imaging parameters.

6. The method of claim 4, further comprising:
controlling the camera to capture a sample image in real time before capturing the image to be recognized;
generating a second version of imaging parameters based on the sample image; and
performing a weighted average of the first version of imaging parameters and the second version of imaging parameters to obtain weighted imaging parameters, the weighted imaging parameters being selectable as the imaging parameters to be sent to the camera, wherein:
in a first operation mode, the first version of imaging parameters is selected as the imaging parameters to be sent to the camera, and
in a second operation mode different from the first operation mode, the weighted imaging parameters are selected as the imaging parameters to be sent to the camera.

7. The method of claim 1, wherein the object of interest comprises a subject's face, and wherein the recognizing an object of interest comprises:
detecting a facial region from the image to be recognized using a second deep neural network; and
extracting facial features from the detected facial region in the image to be recognized using a third deep neural network, for identifying the subject's identity.

8. The method of claim 7, further comprising:
detecting a facial region from a registered image using the second deep neural network, wherein the registered image is an electronic photograph corresponding to the subject's identity;
extracting facial features from the detected facial region in the registered image using the third deep neural network; and
comparing the facial features extracted from the image to be recognized with the facial features extracted from the registered image, to determine whether the subject matches the identity.

9. The method of claim 8, further comprising: in response to an abnormal event in which the subject does not match the identity, initiating reporting of an alarm message to a system platform.

10. The method of claim 1, further comprising:
controlling the camera to capture a sample image in real time before capturing the image to be recognized;
estimating a depth of the sample image using a first deep neural network;
determining, according to the estimated depth, a foreground region and a background region in the sample image, a brightness of the foreground region, and a brightness of the background region;
in response to a difference between the brightness of the foreground region and the brightness of the background region being outside a predetermined range, adjusting an illumination intensity of at least one adjustable light source associated with the camera; and
repeating the capturing in real time, the estimating, and the determining until the brightness of the foreground region and the brightness of the background region in a current sample image are within the predetermined range.

11. The method of claim 10, wherein the estimating a depth of the sample image comprises:
resizing the sample image to have a first size; and
inputting the resized sample image into the first deep neural network, so that the first deep neural network performs operations comprising:
performing a plurality of convolutions on the sample image of the first size; and
repeatedly processing the convolved sample image to obtain a depth map having the first size, wherein the processing comprises upsampling and convolution.

12. An image recognition apparatus, comprising:
a light prediction module configured to predict lighting conditions for a future time period based on a plurality of images respectively captured by a camera at a plurality of consecutive moments;
an imaging parameter adjustment module configured to generate, based at least on the predicted lighting conditions, imaging parameters to be sent to the camera, and to initiate sending of the imaging parameters to the camera, so that the camera captures an image to be recognized based on the imaging parameters during the future time period; and
an image recognition module configured to recognize an object of interest from the image to be recognized.

13. A computing device comprising a memory and a processor, the memory being configured to store thereon computer program instructions that, when executed on the processor, cause the processor to perform the method of any one of claims 1-11.

14. A computer-readable storage medium having stored thereon computer program instructions that, when executed on a processor, cause the processor to perform the method of any one of claims 1-11.

15. An image recognition system, comprising:
a camera; and
the computing device of claim 13.
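The recursion recited in claim 2 — feeding each image's illumination feature together with the previous processing result back into the temporal recurrent network — can be sketched framework-free as follows. The single scalar cell and its weights are illustrative stand-ins; the claimed network is a trained temporal recurrent neural network (e.g., an LSTM per claim 3) operating on residual-network features, not this toy cell.

```python
# Minimal stand-in for the claim-2 recursion: each new illumination
# feature X_{k+1} is combined with the processing result for X_k.
# Weights are hypothetical, untrained values chosen for illustration.
import math

def recurrent_step(feature, state, w_in=0.6, w_rec=0.4):
    """One recursion step: mix the new feature with the previous state."""
    return math.tanh(w_in * feature + w_rec * state)

def predict_lighting(features):
    """Fold per-image illumination features, in capture order, through the
    recurrent cell; the final state stands in for the predicted conditions."""
    state = 0.0                  # initial state before the first image X_1
    for f in features:           # k = 1 .. N
        state = recurrent_step(f, state)
    return state
```

In the claimed pipeline, `features` would be the per-image illumination features extracted by the residual network, and the output would parameterize the predicted lighting conditions for the future time period.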
CN201910242987.0A 2019-03-28 2019-03-28 Image-recognizing method, calculates equipment, system and storage medium at device Pending CN109977876A (en)


Publications (1)

Publication Number Publication Date
CN109977876A true CN109977876A (en) 2019-07-05






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2019-07-05