RU2768545C1

RU2768545C1 - Method and system for recognition of the emotional state of employees

Info

Publication number: RU2768545C1
Application number: RU2021121283A
Authority: RU
Inventors: Дмитрий Владимирович Гордеев; Кирилл Андреевич Кондратьев; Константин Игоревич Островский
Original assignee: Общество С Ограниченной Ответственностью "Инновационный Центр Философия.Ит"
Priority date: 2021-07-19
Filing date: 2021-07-19
Publication date: 2022-03-24

Abstract

FIELD: computer technology. SUBSTANCE: the invention relates to computer technology, in particular to methods and systems for recognizing faces and emotions from a video stream, and can be used to monitor the emotional state of employees in organizations. The technical result is achieved by recognizing faces in frames received from the video stream; classifying emotions using a first neural network; comparing the face image with a database of employees, on the basis of descriptors, to obtain an employee identification coefficient; determining a coefficient of the employees' emotional state by calculating aggregated metrics of the employees' emotions and the employees' performance indicators; and displaying the obtained coefficient on an information media screen. EFFECT: improved accuracy of recognition of the emotional state of employees. 7 cl, 6 dwg

Description

FIELD OF TECHNOLOGY

The present technical solution relates to computer technology, in particular to methods and systems for recognizing faces and emotions from a video stream, and can be used to monitor the emotional state of employees in organizations.

BACKGROUND OF THE INVENTION

The closest analogue is US 10,496,947 B1, published on 03.12.2019, which discloses a method and system of emotion recognition for personnel analytics. The method includes the steps of receiving a video stream comprising a sequence of images; detecting a person in one or more of the images; determining reference points of the person's facial features; aligning a virtual face mesh to the person in one or more of the images based, at least in part, on control points; dynamically determining, from the sequence of images, at least one deformation of the virtual face mesh; and determining that the at least one deformation relates to at least one facial emotion selected from a plurality of reference facial emotions.

The proposed solution differs from the prior art in that it does not require the person to be near the camera for more accurate recognition of the employee's emotional state.

SUMMARY OF THE INVENTION

The technical problem addressed by the claimed solution is the development of a method and system for recognizing the emotional state of employees, as characterized in the independent claims. Additional embodiments of the present invention are presented in the dependent claims.

The technical result is an increase in the accuracy of recognition of the emotional state of employees. An additional technical result is an increase in the performance of the computing system when solving the stated task (i.e., processing yields a result in less time), which reduces the load on the central processor of the computing device by reducing the number of processed requests.

The claimed technical result is achieved by a computer-implemented method for recognizing the emotional state of employees, executed on a computing device that contains a processor and memory storing instructions executable by the processor, the method including the following steps:

at least one frame from a video stream from at least one video camera is transmitted to the recognition subsystem of the computing device;

at least one face is recognized in the frames obtained from the video stream by the face recognition module of the recognition subsystem, using a linear classifier that operates on features extracted from the frame;

the emotions of the at least one face recognized in the previous step are classified by the emotion classification module of the recognition subsystem, using a first neural network;

the face image recognized by the face recognition module is compared with the employee database by the identification module of the recognition subsystem, on the basis of descriptors computed by a second neural network, and the resulting data are transmitted to the server;

the server determines the coefficient of the employees' emotional state by calculating metrics of the employees' emotions, as recognized by the emotion classification module, and the employees' performance indicators, aggregated over set time intervals, and transmits the resulting data to the client server;

the client server displays the obtained coefficient of the employees' emotional state on the information media screen.
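The text does not give a formula for the coefficient of the emotional state, so the following is only a minimal sketch of the aggregation step under stated assumptions: events are hypothetical `(timestamp, employee_id, emotion)` tuples, the aggregated metric is the share of "positive" observations per fixed time interval, and the performance indicators are left out because the text does not say how they are combined.

```python
from collections import defaultdict


def aggregate_positive_share(events, interval_seconds):
    """Group (timestamp, employee_id, emotion) events into fixed time
    intervals and return the share of 'positive' observations per interval.

    The event layout and the 'share of positive' metric are assumptions
    made for illustration; they are not taken from the patent text.
    """
    counts = defaultdict(lambda: [0, 0])  # interval index -> [positive, total]
    for timestamp, _employee_id, emotion in events:
        bucket = int(timestamp // interval_seconds)
        counts[bucket][1] += 1
        if emotion == "positive":
            counts[bucket][0] += 1
    return {bucket: pos / total for bucket, (pos, total) in sorted(counts.items())}


# Hypothetical observations: timestamps are seconds from the start of the shift.
events = [
    (5, "e1", "positive"),
    (20, "e2", "neutral"),
    (65, "e1", "positive"),
    (70, "e2", "positive"),
]
shares = aggregate_positive_share(events, interval_seconds=60)
print(shares)
```

With one-minute intervals the first interval holds one positive and one neutral observation and the second holds two positive observations.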

In a particular embodiment of the proposed method, the features extracted from the frame are a descriptor of an 80×80 frame region computed by the histogram of oriented gradients (HOG) method.

In another particular embodiment of the proposed method, the emotional state of an employee is determined as positive or neutral.

In another particular embodiment of the proposed method, the positive emotional state of an employee is recognized when the employee in the frame is smiling.

In another particular embodiment of the proposed method, when classifying the emotions of at least one face, the first neural network splits the convolution operation into a convolution in width and a convolution in depth.

In another particular embodiment of the proposed method, the Euclidean distance between the descriptors of face images is used as the similarity measure when comparing a recognized face with the employee database.
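The Euclidean-distance comparison can be sketched as a nearest-descriptor search. This is an illustration only: the descriptors here are short hand-written vectors rather than outputs of the second neural network, and the `max_distance` acceptance threshold and the `identify` helper are assumptions that do not appear in the text.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(descriptor, employee_db, max_distance):
    """Return (employee_id, distance) for the closest stored descriptor,
    or (None, distance) if even the closest one is farther than max_distance.

    max_distance is a hypothetical threshold; the text does not specify one.
    """
    best_id, best_dist = None, float("inf")
    for employee_id, reference in employee_db.items():
        dist = euclidean_distance(descriptor, reference)
        if dist < best_dist:
            best_id, best_dist = employee_id, dist
    if best_dist <= max_distance:
        return best_id, best_dist
    return None, best_dist


# Toy 3-element descriptors; real descriptors would be much longer.
db = {"alice": [0.0, 0.0, 1.0], "bob": [1.0, 0.0, 0.0]}
match, dist = identify([0.1, 0.0, 0.9], db, max_distance=0.6)
print(match, round(dist, 4))
```

A smaller distance means greater similarity, so the comparison reduces to finding the minimum distance over the database and checking it against a cut-off.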

The claimed technical result is also achieved by a system for recognizing the emotional state of employees, comprising:

at least one video camera configured to record a video stream and transmit at least one frame from the video stream to a computing device;

a recognition subsystem, including:

a face recognition module configured to recognize at least one face in frames obtained from the video stream and to pass the recognized face to the emotion classification module and to the identification module,

an emotion classification module configured to classify the emotions of the at least one face recognized in the frame in the previous step,

an identification module configured to compare the face image recognized by the face recognition module with the employee database;

a database;

a server configured to record the emotion classification results for the at least one recognized face in the database, determine the coefficient of the employees' emotional state, and transmit that coefficient to the client server;

a client server configured to transmit the coefficient of the employees' emotional state to the information media screen;

an information media screen configured to display the coefficient of the employees' emotional state.

DESCRIPTION OF THE DRAWINGS

The implementation of the invention is described below with reference to the accompanying drawings, which are presented to explain the essence of the invention and in no way limit its scope. The following drawings are attached to the application:

Fig. 1 illustrates a flowchart of the proposed computer-implemented method for recognizing the emotional state of employees.

Fig. 2 illustrates an architectural diagram of an embodiment of the system for recognizing the emotional state of employees.

Fig. 3 illustrates an architectural diagram of an embodiment of the recognition subsystem.

Fig. 4 illustrates a flowchart of the operations of the neural network for emotion type classification.

Fig. 5 illustrates a block diagram of the type 2 convolutional block of the neural network for emotion type classification.

Fig. 6 illustrates the operation of the computing device.

DETAILED DESCRIPTION OF THE INVENTION

The following detailed description sets out numerous implementation details intended to provide a clear understanding of the present invention. However, it will be apparent to a person skilled in the art how the present invention can be used both with and without these implementation details. In other instances, well-known methods, procedures and components are not described in detail so as not to obscure the features of the present invention unnecessarily.

Moreover, it will be clear from the description that the invention is not limited to the implementation given. Numerous possible modifications, changes, variations and substitutions that retain the essence and form of the present invention will be apparent to those skilled in the art.

The terms and definitions below apply in this application unless expressly stated otherwise. References to techniques used in describing this invention refer to well-known methods, including modifications of those methods and their replacement with equivalent methods known to those skilled in the art.

Fig. 1 illustrates a flowchart of the proposed computer-implemented method for recognizing the emotional state of employees.

A video stream from at least one video surveillance camera arrives at the computing device over a network connection. The recommended image quality is at least 720p (frame size 1280×720 pixels). The frame rate can be 20 frames per second or higher, depending on the computing capabilities of the data processing hardware.
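To give a sense of why the frame rate depends on the processing hardware, the raw data volume at the recommended settings can be worked out directly. The sketch assumes uncompressed frames with three 8-bit color channels, which the text does not state.

```python
# Frame parameters from the text: 1280x720 pixels at 20 frames per second.
WIDTH, HEIGHT = 1280, 720
FPS = 20
# Assumption: 3 color channels, 1 byte each, no compression.
CHANNELS = 3

bytes_per_frame = WIDTH * HEIGHT * CHANNELS
bytes_per_second = bytes_per_frame * FPS

print(bytes_per_frame)   # 2764800
print(bytes_per_second)  # 55296000
```

Under these assumptions the recognition subsystem would have to process roughly 55 MB of decoded pixel data per second per camera.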

At least one current frame from the video surveillance camera is extracted from the received video stream as a color image in RGB format (step 105) and is passed to the recognition subsystem (210).

At step 110, at least one face is recognized in the frames obtained from the video stream by the face recognition module (310) of the recognition subsystem (210), using a linear classifier that operates on features extracted from the frame.

Recognition of at least one face consists in selecting, in the frames, rectangular image regions of minimum size that contain a person's entire face. It is performed with a linear support vector machine (SVM) classifier trained on a labeled set of examples and applied to the input image through a sliding window. The input to the classifier model is a vector of numerical features computed from the image. In one embodiment, the numerical features are computed by the histogram of oriented gradients (HOG) method, which comprises the following steps.

1. The color frame image is converted into a grayscale image by known methods.

2. The horizontal and vertical gradients are calculated from I(x, y), the intensity value of the image pixel with coordinates x, y; G denotes the length of the gradient vector and θ the angle of its direction.
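The gradient formulas themselves are not present in this text. As a hedged reconstruction, a common HOG formulation consistent with the symbols I(x, y), G and θ is shown below; the central-difference form and the names G_x, G_y are assumptions, not taken from the source.

```latex
G_x(x, y) = I(x + 1, y) - I(x - 1, y), \qquad
G_y(x, y) = I(x, y + 1) - I(x, y - 1)

G = \sqrt{G_x^2 + G_y^2}, \qquad
\theta = \arctan\frac{G_y}{G_x}
```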

3. An image window of 80×80 pixels is divided into cells of 8×8 pixels.

4. For each cell, a histogram of 9 elements is computed. The elements of the histogram correspond to the direction angles of the gradient vectors, taken at a fixed angular step. The lengths of the gradient vectors, grouped by direction, are summed.

5. The histograms are concatenated within blocks of 2×2 neighboring cells. The 36-element vector obtained for each block is normalized: its elements are divided by the norm of the vector.

6. The vectors of all blocks of the window are concatenated. The resulting vector is the descriptor, a feature vector that describes the fragment of the frame image.

The descriptor is fed to the input of the linear SVM classifier; the output is the probability that a person's face is present in the window under consideration, namely a binary indicator of the presence of a face in the image.

7. Steps 3-7 are repeated for the next window, obtained by a shift of 1 pixel horizontally and vertically.

8. Windows whose probability exceeds an experimentally selected threshold are chosen as the regions containing faces.
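The descriptor computation in the steps above can be sketched in plain Python. The cell size, the 9-element histograms, the 2×2 blocks and the division by the vector norm come from the text; the 20° bin step (unsigned angles over 0 to 180°), the central-difference gradient with clamped borders, the one-cell block stride, the L2 norm, and the absence of bin interpolation are all assumptions made for this illustration.

```python
import math

CELL = 8                 # cell size in pixels (from the text)
BINS = 9                 # histogram elements per cell (from the text)
BIN_STEP = 180.0 / BINS  # assumption: unsigned angles in [0, 180)


def hog_descriptor(window):
    """Compute a HOG-style descriptor for a square grayscale window
    given as a list of rows of intensities (e.g. 80x80)."""
    size = len(window)
    cells = size // CELL
    hist = [[[0.0] * BINS for _ in range(cells)] for _ in range(cells)]
    for y in range(size):
        for x in range(size):
            # Central differences, clamped at the window border (assumption).
            gx = window[y][min(x + 1, size - 1)] - window[y][max(x - 1, 0)]
            gy = window[min(y + 1, size - 1)][x] - window[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            bin_index = int(angle // BIN_STEP) % BINS
            # Gradient lengths are summed per direction bin.
            hist[y // CELL][x // CELL][bin_index] += magnitude
    descriptor = []
    # 2x2 blocks of neighboring cells, moved one cell at a time (assumption).
    for cy in range(cells - 1):
        for cx in range(cells - 1):
            block = (hist[cy][cx] + hist[cy][cx + 1]
                     + hist[cy + 1][cx] + hist[cy + 1][cx + 1])
            norm = math.sqrt(sum(v * v for v in block)) or 1.0
            descriptor.extend(v / norm for v in block)
    return descriptor


# Synthetic 80x80 window with a horizontal intensity ramp.
window = [[float(x) for x in range(80)] for _ in range(80)]
descriptor = hog_descriptor(window)
print(len(descriptor))
```

Under these assumptions an 80×80 window gives 10×10 cells and 9×9 overlapping blocks of 36 values each, so the descriptor has 2916 elements; a different block stride would change that length.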

In one embodiment, the linear SVM classifier is trained on examples from the LFW dataset described in Huang G. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments, 2008.

Alternatively, one of the open implementations of the face recognition algorithm can be used, for example from the Dlib library described in King D. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10 (2009).

The training examples are pairs: a descriptor of an 80×80 image region computed by the HOG method, and a binary indicator of the presence of a face.
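The sliding-window scan with a linear classifier can be sketched as follows. The 80×80 window and the 1-pixel shift come from the text; the `describe` callback, the toy weights and bias, and the threshold value are hypothetical placeholders. A linear SVM yields a signed score rather than a probability, and the text does not say how scores are converted, so the sketch thresholds the raw score.

```python
def linear_score(weights, bias, descriptor):
    """Decision value of a linear classifier: w . x + b."""
    return sum(w * d for w, d in zip(weights, descriptor)) + bias


def sliding_windows(width, height, window=80, step=1):
    """Yield the top-left corner of every window that fits in the frame."""
    for top in range(0, height - window + 1, step):
        for left in range(0, width - window + 1, step):
            yield left, top


def detect(frame_w, frame_h, describe, weights, bias, threshold, step=1):
    """Score each window and keep those above the threshold.

    describe(left, top) is a hypothetical callback returning the
    descriptor of the window at that position.
    """
    hits = []
    for left, top in sliding_windows(frame_w, frame_h, step=step):
        score = linear_score(weights, bias, describe(left, top))
        if score > threshold:
            hits.append((left, top, score))
    return hits


# Toy setup: a 1-element "descriptor" that fires only at position (3, 2).
def toy_describe(left, top):
    return [1.0 if (left, top) == (3, 2) else 0.0]


hits = detect(84, 83, toy_describe, weights=[2.0], bias=-1.0, threshold=0.0)
print(hits)
```

With a 1-pixel shift, a 1280×720 frame contains 1201 × 641 = 769,841 window positions, which is why the cost of computing each descriptor matters.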

The output of the face recognition module is passed to the emotion classification module over a network connection.

After at least one face has been recognized in the frame, at step 115 the emotions of the at least one recognized face are classified by the emotion classification module (315) of the recognition subsystem (210), using a first convolutional neural network.

Two types of emotional state are distinguished: positive, when the person in the frame is smiling, and neutral otherwise. As noted above, a smile is detected by the first neural network, which determines in the face image the location of key points in the area of the eyes, cheeks and lips and the movements of the facial muscles. A smile is accompanied by a characteristic facial expression: the corners of the lips rise, the cheeks rise with them, and the points around the eyes shift. When determining the emotion, the neural network analyzes the face region of the image for these features, including but not limited to the height of the corners of the lips, the rise of the cheeks, and the narrowing of the eyes.

A first neural network of the convolutional type is used to classify the emotions of at least one face recognized in a frame of the video stream. To reduce the number of trainable model parameters, and thereby lower the requirements on the size of the training set and speed up the network in operation, the model architecture may split the convolution operation into a convolution in width and a convolution in depth, for example similarly to the approach described in Chollet F. Xception: Deep Learning with Depthwise Separable Convolutions. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).

Classification of the type of a person's emotional state from a face image using the first convolutional neural network includes the following steps:

1. Conversion of the color RGB frame image into a grayscale image by known methods.

2. Scaling of the image to the target size of 64×64.

3. Normalization of the image using the mean and standard deviation computed over the training set.

4. Transformation of the normalized single-channel 64×64 image by the first convolutional neural network.
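The first three preprocessing steps can be sketched without any image library. The 64×64 target size and the mean/standard-deviation normalization come from the text; the BT.601 luma weights, nearest-neighbor resizing, and the numeric mean and standard deviation in the example are assumptions, since the text only refers to "known methods" and to statistics of an unspecified training set.

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples to intensities.
    BT.601 luma weights are an assumption; the text says 'known methods'."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]


def resize_nearest(img, size=64):
    """Nearest-neighbor resize to size x size (interpolation is an assumption)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def normalize(img, mean, std):
    """Subtract the training-set mean and divide by its standard deviation."""
    return [[(v - mean) / std for v in row] for row in img]


def preprocess(rgb, mean, std):
    return normalize(resize_nearest(to_grayscale(rgb)), mean, std)


# Hypothetical 120x100 face crop of uniform gray, with made-up statistics.
face = [[(128, 128, 128)] * 100 for _ in range(120)]
x = preprocess(face, mean=128.0, std=64.0)
print(len(x), len(x[0]))
```

The result is the single-channel 64×64 array that step 4 passes to the network.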

A diagram of the operations for one embodiment is shown in Fig. 4. The first convolutional neural network receives a single-channel face image as input (405). Convolutional blocks (410), (415), (420) are applied to the image in sequence. The type 1 convolutional block is applied twice and consists of 8 convolutional filters of size 3×3 pixels, applied to the input channels of the feature maps with a stride of 1 pixel. The convolution operation is described in terms of the following quantities: K is the filter size, C is the number of input feature channels, L is the number of output channels, x_ijc is the feature value at position (i, j) in channel c, and each filter l has a weight at position (i, j) for channel c. An activation function is applied to the output feature values.
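The convolution formula is not present in this text. A standard multi-channel convolution consistent with the quantities K, C, L and x_ijc would read as follows; the output symbol y, the offset indices a and b, and the weight notation w^{(l)}_{abc} are assumptions introduced for this sketch.

```latex
y_{ijl} = \sum_{a=1}^{K} \sum_{b=1}^{K} \sum_{c=1}^{C}
          w^{(l)}_{abc} \, x_{i+a-1,\; j+b-1,\; c},
\qquad l = 1, \dots, L
```

The form of the activation function is likewise not recoverable from the text, so it is left unspecified here.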

The type 2 convolutional block is applied 4 times with a different number of filters each time. In one embodiment, 16, 32, 64 and 128 filters are used; however, it will be clear to a person skilled in the art that other filters may be used.

A diagram of the type 2 convolutional block for one embodiment is shown in Fig. 5. Two types of transformation are applied to the input feature maps in parallel.

The type 1 transformation consists of two successive applications of P separable-convolution filters (510), where P is an architectural parameter of the block, and of the activation function (515) described above, followed by pooling (520) over 2×2 regions with a stride of 2 pixels. The separable convolution operation is the sequential application of a convolution with a 3×3 filter within each channel and a convolution with a 1×1 filter across channels:

Here the planar 3×3 filters have their own weights for each filter l, and w'_c denotes the weights of the 1×1 channel convolution filter. Pooling replaces each 2×2 pixel region with the maximum value in that region. Because pooling is applied with a stride of 2, the feature map size is halved.

The type 2 transformation is a convolution with P filters of size 1×1 and a stride of 2 pixels (525); the resulting feature maps are half the size of the input maps. The feature maps produced by the type 1 and type 2 transformations are concatenated along the channel dimension (530). The resulting feature maps (535) contain 2P channels.
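The two downsampling paths and the channel concatenation can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the separable convolutions and the activation of the type 1 path are left out, so `conv_features` stands in for their output, and all function names, the even feature map size, and the sample weights are assumptions made for illustration.

```python
from typing import List

FeatureMap = List[List[float]]   # one channel: rows x columns
FeatureMaps = List[FeatureMap]   # a list of channels


def max_pool_2x2(channel: FeatureMap) -> FeatureMap:
    """Replace each 2x2 region with its maximum value (stride 2)."""
    rows, cols = len(channel), len(channel[0])
    return [
        [max(channel[r][c], channel[r][c + 1],
             channel[r + 1][c], channel[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


def conv_1x1_stride2(maps: FeatureMaps, weights: List[List[float]]) -> FeatureMaps:
    """1x1 convolution with stride 2: every output channel is a weighted sum
    of the input channels, sampled at every second pixel."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [[sum(w * ch[r][c] for w, ch in zip(filter_weights, maps))
          for c in range(0, cols, 2)]
         for r in range(0, rows, 2)]
        for filter_weights in weights
    ]


def type2_block_output(conv_features: FeatureMaps, inputs: FeatureMaps,
                       weights: List[List[float]]) -> FeatureMaps:
    """Concatenate the pooled type 1 path with the strided type 2 path."""
    pooled = [max_pool_2x2(ch) for ch in conv_features]   # P channels
    strided = conv_1x1_stride2(inputs, weights)           # P channels
    return pooled + strided                               # 2P channels


# One 4x4 input channel (C = 1) and P = 2 filters per path.
inputs = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
conv_features = [inputs[0], inputs[0]]   # placeholder for the separable conv output
weights = [[1.0], [0.5]]                 # P filters, each with C weights

output = type2_block_output(conv_features, inputs, weights)
print(len(output), len(output[0]), len(output[0][0]))  # 4 2 2
```

With P = 2 the result has 2P = 4 channels, each half the size of the 4×4 input in both dimensions, which is what allows the two paths to be concatenated.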

The type 3 convolutional block is applied once. It performs a convolution with two filters of size 3×3, pooling that takes the average value of each feature map, and application of the function

to obtain the elements p_i of the probability distribution vector over emotional state types (425). The weights of all convolution filters are trainable parameters of the model; they are fitted using stochastic gradient descent and backpropagation, minimizing a cross-entropy loss function.

5. The most probable emotional state type is selected as the result of the algorithm, based on the probability distribution vector obtained in step 4. This result is represented as the emotion recognition coefficient: a binary indicator of a positive or a neutral emotion.

For example, suppose the probability distribution vector over emotional state types obtained for a recognized face has elements 0.8 and 0.2. The positive emotional state is selected: the binary positive-emotion indicator is set to 1. The emotion recognition confidence coefficient is taken to be the highest probability, 0.8.
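The selection rule in this example can be written as a short sketch. The function name and the class order (index 0 for the positive state, index 1 for the neutral state) are assumptions inferred from the example; the text does not state the order explicitly.

```python
from typing import Sequence, Tuple

# Assumption: element 0 is the "positive" class, element 1 the "neutral" class.
POSITIVE_INDEX = 0


def decide_emotion(probabilities: Sequence[float]) -> Tuple[int, float]:
    """Return (positive_flag, confidence) for a probability vector."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    positive_flag = 1 if best_index == POSITIVE_INDEX else 0
    return positive_flag, probabilities[best_index]


print(decide_emotion([0.8, 0.2]))  # (1, 0.8)
```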

In step 120, the face image recognized by the face recognition module (310) of the recognition subsystem (210) is matched against the employee database (225) by the identification module (320) of the recognition subsystem (210), using descriptors computed by a second neural network.

The result of matching the recognized face image against the images in the employee database is an employee identification coefficient, which serves to confirm that the employee has been recognized correctly. This coefficient is represented as a binary indicator.

Descriptors, that is, representations of an image as a vector of real-valued features, can be used for the matching. To compute a 128-element descriptor, a second neural network of the convolutional type is used as an encoder; it is trained on an open dataset of photographs of well-known people. In one embodiment the ResNet neural network architecture is used, as described in Kaiming He et al., "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

The Euclidean distance between descriptors is used as the similarity measure when comparing images of faces recognized by the face recognition module (310) of the recognition subsystem (210) with the faces stored in the employee database (225). The smallest of the distances computed between the input object (the recognized face) and the face images in the employee database (225) is compared with a threshold value specified as a parameter. If the smallest distance does not exceed the threshold, the current recognized face image is considered identified in the employee image database; otherwise the face is considered unidentified.

In the example below the threshold on the distance between descriptors is set to 0.6; however, a person skilled in the art will understand that this threshold can be raised or lowered. The employee database contains two employee images. For one recognized face image, descriptor distances of 0.2 and 0.5 were obtained; the smallest distance, 0.2 (to the image of employee 1), does not exceed the threshold of 0.6, so the recognized face image is taken to correspond to the image of employee 1.

For another recognized face image, distances of 0.9 and 0.7 were obtained; the smallest distance, 0.7 (to the image of employee 2), is above the threshold of 0.6, so the recognized face is considered unidentified: no sufficiently similar example was found in the employee database.
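The threshold rule from the two examples can be sketched as follows. The function names and the employee identifiers are hypothetical; the distances and the 0.6 threshold are taken from the examples above, and the descriptors themselves (128 elements in the described embodiment) are assumed to be computed elsewhere.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_by_distance(distances: Dict[str, float],
                      threshold: float = 0.6) -> Tuple[Optional[str], float]:
    """Pick the closest database entry; accept it only if the distance
    does not exceed the threshold. Returns (employee_id or None, distance)."""
    best_id = min(distances, key=distances.get)
    best_distance = distances[best_id]
    if best_distance <= threshold:
        return best_id, best_distance
    return None, best_distance


# Distances from the two examples in the text.
print(match_by_distance({"employee_1": 0.2, "employee_2": 0.5}))  # ('employee_1', 0.2)
print(match_by_distance({"employee_1": 0.9, "employee_2": 0.7}))  # (None, 0.7)
```

In practice the `distances` dictionary would be filled by calling `euclidean_distance` on the descriptor of the recognized face and each stored employee descriptor.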

The results are sent to the server in a defined format. They are transmitted as a text message containing a data structure with the following fields: message identifier, camera identifier, timestamp, number of recognized faces, number of positive emotions, and an "employee details" field. The "employee details" field is an array of structures with the fields: employee identifier, binary positive-emotion indicator, employee identification coefficient, and emotion recognition coefficient.

The result is sent over the network to the server, via the network communication means, as text in JSON format. Other text formats, such as XML, or a binary representation of the data for compression purposes may also be used.
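A message with the listed fields might be serialized as in the sketch below. The key names, the identifier formats and the sample values are hypothetical, since the text names the fields but not their JSON keys. The sketch also assumes one details entry per recognized face and that the emotion recognition coefficient carries the confidence value from the earlier example.

```python
import json
from typing import Dict, List


def build_message(message_id: str, camera_id: str, timestamp: str,
                  employee_details: List[Dict]) -> str:
    """Serialize one recognition result as a JSON text message."""
    message = {
        "message_id": message_id,
        "camera_id": camera_id,
        "timestamp": timestamp,
        # Assumption: one details entry per recognized face.
        "faces_recognized": len(employee_details),
        "positive_emotions": sum(d["positive_emotion"] for d in employee_details),
        "employee_details": employee_details,
    }
    return json.dumps(message)


details = [
    {"employee_id": "employee_1", "positive_emotion": 1,
     "identification_coefficient": 1, "emotion_recognition_coefficient": 0.8},
    {"employee_id": "employee_2", "positive_emotion": 0,
     "identification_coefficient": 1, "emotion_recognition_coefficient": 0.7},
]
print(build_message("msg-0001", "camera-01", "2021-07-19T09:00:00Z", details))
```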

Data is transferred either synchronously over HTTP or asynchronously via a message queue server, which may use protocols including, but not limited to, AMQP, Kafka, OpenWire, STOMP and MQTT.

In step 125, the server (220) determines the employee emotional state coefficient by computing metrics, aggregated over set time intervals, of the employees' emotions recognized by the emotion classification module (315) of the recognition subsystem (210) and of the employees' performance indicators, and transmits the resulting data to the client server (230).

The employee performance indicator is work activity during the working day. The work activity indicator is defined as the number of people passing through the camera's field of view per minute, averaged over the last hour.

The level of positive emotional state of employees is calculated as the ratio of the detected cases of positive emotion to the total number of faces detected in the video stream over the last hour. For example, if N = 100 observations were obtained during the time interval T = 1 hour and a positive emotion type was classified in P = 20 of them, the level of positive emotional state for the interval T is r = P/N = 20%. In one embodiment the employee work activity metric is defined as the result of averaging the number of observations n_i in the time intervals t_i over the period T

where M is the number of intervals into which the period T is divided. For example, the number of observations can be averaged by minute over an hour.
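Both metrics can be sketched in a few lines. The function names are hypothetical, and because the averaging formula is not reproduced in the text, the sketch assumes the plain arithmetic mean of the per-interval counts n_i over the M intervals.

```python
from typing import Sequence


def positive_level(positive_count: int, total_count: int) -> float:
    """r = P / N: share of observations classified as positive."""
    if total_count == 0:
        return 0.0  # assumption: no observations means a zero level
    return positive_count / total_count


def work_activity(counts_per_interval: Sequence[int]) -> float:
    """Assumed form of the activity metric: mean of n_i over M intervals."""
    if not counts_per_interval:
        return 0.0
    return sum(counts_per_interval) / len(counts_per_interval)


print(positive_level(20, 100))   # 0.2, i.e. 20% as in the example above
print(work_activity([3, 5, 4]))  # 4.0 observations per interval
```

For the per-minute averaging over an hour mentioned in the text, `counts_per_interval` would hold M = 60 values.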

The client server (230) outputs the resulting employee emotional state coefficient to the information media screen (235) via the network communication means.

The method described above is carried out by the system for recognizing the emotional state of employees shown in FIG. 2, which includes: at least one video camera (205); a recognition subsystem (210) comprising a face recognition module (310), an emotion classification module (315) and an identification module (320); a database (225); a server (220); a client server (230); and an information media screen (235).

The recognition subsystem (210) includes:

a face recognition module (310) configured to recognize at least one face in the frames received from the video stream and to pass the recognized face to the emotion classification module and to the identification module;

an emotion classification module (315) configured to classify the emotions of the at least one face recognized in the frame in the previous step;

an identification module (320) configured to match the image of the face recognized by the face recognition module against the employee database.

The database (225) stores the recognition data and also contains the data used to match the face image recognized by the face recognition module of the recognition subsystem against the images of employees.

The server (220) is configured to write the emotion classification results for the at least one recognized face to the database, to determine the employee emotional state coefficient, and to transmit that coefficient to the client server over HTTP.

The client server (230) is configured to transmit the employee emotional state coefficient to the information media screen.

The information media screen (235) is configured to display the employee emotional state coefficient. The screen shows:

the total number of observations (faces);

the share of positive emotions in the total number of observations;

the level of work activity over the last hour, displayed as a temperature in the range of 0 to 30 degrees;

the level of positive emotional state over the last hour, displayed as cloud cover with three gradations: sunny (high level of positive emotions), mostly cloudy (medium level), and thunderstorm front (low level);

the change in the level of work activity by hour during the day.
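The mapping from metrics to the weather-style display is not specified in the text, so the following is only an illustrative sketch. The linear scaling, the `max_activity` parameter, the 0.2 and 0.5 cut-offs and the function names are all hypothetical assumptions.

```python
def activity_to_temperature(activity: float, max_activity: float) -> float:
    """Scale work activity linearly into the 0-30 degree display range.
    Assumption: max_activity is a configured activity level shown as 30."""
    if max_activity <= 0:
        raise ValueError("max_activity must be positive")
    scaled = 30.0 * activity / max_activity
    return max(0.0, min(30.0, scaled))  # clamp to the displayed range


def positivity_to_sky(level: float, low: float = 0.2, high: float = 0.5) -> str:
    """Map the positive-emotion level to one of the three gradations.
    Assumption: the 'low' and 'high' cut-offs are illustrative only."""
    if level >= high:
        return "sunny"
    if level >= low:
        return "mostly cloudy"
    return "thunderstorm front"


print(activity_to_temperature(5, 10))  # 15.0
print(positivity_to_sky(0.2))          # mostly cloudy
```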

FIG. 6 shows a general diagram of a computing device (600) that provides the data processing required to implement the claimed solution.

In general, the device (600) contains components such as: one or more processors (601), at least one memory (602), a data storage means (603), input/output interfaces (604), I/O means (605), and network communication means (606).

The processor (601) of the device performs the main computing operations required for the operation of the device (600) or of one or more of its components. The processor (601) executes the necessary machine-readable instructions held in the random-access memory (602).

The memory (602) is typically implemented as RAM and holds the program logic that provides the required functionality.

The data storage means (603) can be implemented as HDDs, SSDs, a RAID array, network storage, flash memory, optical storage media (CD, DVD, MD, Blu-ray discs), and the like. The means (603) provides long-term storage of various kinds of information, for example the aforementioned files with user data sets, databases containing records of the time intervals measured for each user, user identifiers, and the like.

The interfaces (604) are standard means for connecting to and working with the server side, for example USB, RS232, RJ45, LPT, COM, HDMI, PS/2, Lightning, FireWire, and the like.

The choice of interfaces (604) depends on the specific implementation of the device (600), which can be a personal computer, a mainframe, a server cluster, a thin client, a smartphone, a laptop, and the like.

A keyboard can be used as the data I/O means (605) in any embodiment of the system implementing the described method. The keyboard hardware can be of any known kind: either a built-in keyboard, as on a laptop or netbook, or a standalone device connected to a desktop computer, server or other computing device. The connection can be wired, in which case the keyboard cable is plugged into a PS/2 or USB port on the system unit of the desktop computer, or wireless, in which case the keyboard exchanges data over a wireless link, for example a radio channel, with a base station that is in turn connected directly to the system unit, for example to one of its USB ports. Besides a keyboard, the data I/O means can also include a joystick, a display (touch display), a projector, a touchpad, a mouse, a trackball, a light pen, speakers, a microphone, and the like.

The network communication means (606) are selected from devices that provide network reception and transmission of data, for example an Ethernet card, a WLAN/Wi-Fi module, a Bluetooth module, a BLE module, an NFC module, IrDA, an RFID module, a GSM modem, and the like. The means (606) enable data exchange over a wired or wireless data link, for example WAN, PAN, LAN, intranet, Internet, WLAN, WMAN or GSM.

The components of the device (600) are connected via a common data bus (610).

These application materials present a preferred embodiment of the claimed technical solution, which should not be construed as limiting other, particular embodiments that do not go beyond the claimed scope of legal protection and are obvious to persons skilled in the relevant art.

Claims

1. A computer-implemented method for recognizing the emotional state of employees, comprising the following steps:

transmitting at least one frame of a video stream from at least one video surveillance camera to a recognition subsystem on a computing device;

recognizing at least one face in the frames received from the video stream by means of a face recognition module using a linear classifier that operates on features extracted from the frame;

classifying the emotions of the at least one face recognized in the previous step by means of an emotion classification module using a first neural network;

matching the image of the face recognized by the face recognition module against an employee database to obtain an employee identification coefficient, by means of an identification module, based on descriptors computed using a second neural network, and transmitting the results to a server;

determining, by means of the server, an employee emotional state coefficient by computing metrics, aggregated over set time intervals, of the employees' emotions recognized by the emotion classification module and of the employees' performance indicators, and transmitting the resulting data to a client server;

displaying, by means of the client server, the resulting employee emotional state coefficient on an information media screen.

2. The method according to claim 1, characterized in that the features extracted from the frame are a descriptor of an 80×80 frame region computed using the histogram of oriented gradients (HOG) method.

3. The method according to claim 1, characterized in that the emotional state of the employee is defined as positive or neutral.

4. The method according to claim 3, characterized in that the positive emotional state of the employee is recognized when the employee in the frame smiles.

5. The method according to claim 1, characterized in that, when classifying the emotions of the at least one face, the convolution operation in the first neural network is split into a convolution over width and a convolution over depth.

6. The method according to claim 1, characterized in that the Euclidean distance between face image descriptors is used as a measure of similarity when comparing a recognized face with a database of employees.

7. A system for recognizing the emotional state of employees, comprising:

at least one video surveillance camera configured to record a video stream and transmit at least one frame of the video stream to a recognition subsystem on a computing device;

a recognition subsystem, including:

- a face recognition module configured to recognize at least one face in the frames received from the video stream and to pass the recognized face to the emotion classification module and to the identification module,

- an emotion classification module configured to classify the emotions of the at least one face recognized in the frame in the previous step,

- an identification module configured to match the image of the face recognized by the face recognition module against an employee database to obtain an employee identification coefficient;

a database;

a server configured to write the recognition results to the database, determine the employee emotional state coefficient, and transmit that coefficient to a client server;

a client server configured to transmit the employee emotional state coefficient to an information media screen;

an information media screen configured to display the employee emotional state coefficient.