CN120014907B - A Dynamic Constraint Method and System for Simulation Training Difficulty Based on Emotion Recognition - Google Patents
A Dynamic Constraint Method and System for Simulation Training Difficulty Based on Emotion Recognition
- Publication number
- CN120014907B (application number CN202510037504.9A)
- Authority
- CN
- China
- Prior art keywords
- training
- difficulty
- training difficulty
- target user
- physiological signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B9/00—Simulators for teaching or training purposes
- G09B9/02—Simulators for teaching or training purposes for teaching control of vehicles or other craft
- G09B9/04—Simulators for teaching or training purposes for teaching control of vehicles or other craft for teaching control of land vehicles
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/02—Detecting, measuring or recording for evaluating the cardiovascular system, e.g. pulse, heart rate, blood pressure or blood flow
- A61B5/0205—Simultaneously evaluating both cardiovascular conditions and different types of body conditions, e.g. heart and respiratory condition
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/02—Detecting, measuring or recording for evaluating the cardiovascular system, e.g. pulse, heart rate, blood pressure or blood flow
- A61B5/021—Measuring pressure in heart or blood vessels
- A61B5/02141—Details of apparatus construction, e.g. pump units or housings therefor, cuff pressurising systems, arrangements of fluid conduits or circuits
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/02—Detecting, measuring or recording for evaluating the cardiovascular system, e.g. pulse, heart rate, blood pressure or blood flow
- A61B5/024—Measuring pulse rate or heart rate
- A61B5/02438—Measuring pulse rate or heart rate with portable devices, e.g. worn by the patient
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/05—Detecting, measuring or recording for diagnosis by means of electric currents or magnetic fields; Measuring using microwaves or radio waves
- A61B5/053—Measuring electrical impedance or conductance of a portion of the body
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/16—Devices for psychotechnics; Testing reaction times ; Devices for evaluating the psychological state
- A61B5/165—Evaluating the state of mind, e.g. depression, anxiety
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/16—Devices for psychotechnics; Testing reaction times ; Devices for evaluating the psychological state
- A61B5/18—Devices for psychotechnics; Testing reaction times ; Devices for evaluating the psychological state for vehicle drivers or machine operators
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/68—Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient
- A61B5/6801—Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient specially adapted to be attached to or worn on the body surface
- A61B5/6802—Sensor mounted on worn items
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9035—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/015—Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/59—Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
- G06V20/597—Recognising the driver's state or behaviour, e.g. attention or drowsiness
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/176—Dynamic expression
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B9/00—Simulators for teaching or training purposes
- G09B9/02—Simulators for teaching or training purposes for teaching control of vehicles or other craft
- G09B9/04—Simulators for teaching or training purposes for teaching control of vehicles or other craft for teaching control of land vehicles
- G09B9/052—Simulators for teaching or training purposes for teaching control of vehicles or other craft for teaching control of land vehicles characterised by provision for recording or measuring trainee's performance
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2123/00—Data types
- G06F2123/02—Data types in the time domain, e.g. time-series data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/01—Indexing scheme relating to G06F3/01
- G06F2203/011—Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns
Abstract
The invention discloses a method and system for dynamically constraining simulation training difficulty based on emotion recognition, in the technical field of human-machine interaction. The method comprises the following steps: when a target user enters a driving simulation training area, physiological signals, voice signals and expression image information of the target user are collected through a wearable sensor, and training difficulty calibration is performed on these signals based on a training difficulty level calibration database to obtain a first matching training difficulty. If the first matching training difficulty is null or non-unique, the signals are processed through an emotion recognition model to obtain a second matching training difficulty, and the difficulty mode of the simulation training is adjusted according to the second matching training difficulty. This solves the technical problem that existing driving simulation training difficulty cannot be adaptively adjusted to user needs, leaving simulation training poorly personalized, and achieves the technical effect of improving the personalization and adaptive adjustment capability of driving simulation training.
Description
Technical Field
The application relates to the technical field of human-machine interaction, and in particular to a method and system for dynamically constraining simulation training difficulty based on emotion recognition.
Background
Existing driving simulation training systems are widely used in driver training, traffic safety education, military training and other fields. Most current systems employ fixed training difficulties and preset task scenarios, without considering the individual needs and skill levels of the driver. Training difficulty is usually adjusted against a unified standard rather than flexibly adapted to each user's actual needs, progress and learning state. This fixed mode can confront some drivers with tasks that are too simple or too complex, impairing the learning effect and training experience. How to flexibly adjust training difficulty according to the driver's skill level, learning progress and actual needs has therefore become a key problem in improving the personalization and effectiveness of simulation training.
In the related art, driving simulation training difficulty cannot be adaptively adjusted to user needs, which leads to the technical problem of poorly personalized simulation training.
Disclosure of Invention
The application provides a simulation training difficulty dynamic constraint method based on emotion recognition, which is used to solve the technical problem that the existing driving simulation training difficulty cannot be adaptively adjusted to user needs, resulting in poorly personalized simulation training.
The application provides a simulation training difficulty dynamic constraint method based on emotion recognition, which comprises the following steps:
When a target user enters the driving simulation training area and starts training, physiological signal time sequence information and voice signal time sequence information of the target user are acquired through a wearable sensor, and expression image time sequence information of the target user is acquired through a camera in the simulation training area. A training difficulty level calibration database is obtained, and training difficulty level calibration is performed on the physiological, voice and expression image time sequence information based on this database to obtain a first matching training difficulty. When the first matching training difficulty is null or non-unique, the three kinds of time sequence information are processed through an emotion recognition model to obtain a second matching training difficulty, and the simulation training difficulty mode is adjusted according to the second matching training difficulty.
The application provides a simulation training difficulty dynamic constraint system based on emotion recognition, which comprises the following steps:
The system comprises: an expression image time sequence information acquisition module, configured to acquire physiological signal time sequence information and voice signal time sequence information of a target user through a wearable sensor when the target user enters the driving simulation training area and starts training, and to acquire expression image time sequence information of the target user through a camera in the simulation training area; a calibration database acquisition module, configured to obtain a training difficulty level calibration database; a training difficulty calibration module, configured to perform training difficulty calibration on the three kinds of time sequence information based on the calibration database to obtain a first matching training difficulty; a second matching training difficulty acquisition module, configured to process the three kinds of time sequence information through an emotion recognition model to obtain a second matching training difficulty when the first matching training difficulty is null or non-unique; and a difficulty mode adjustment module, configured to adjust the simulation training difficulty mode according to the second matching training difficulty.
According to the emotion recognition-based simulation training difficulty dynamic constraint method and system, when a target user enters the driving simulation training area, physiological signals, voice signals and expression image information of the target user are collected through a wearable sensor, and training difficulty calibration is performed on these signals based on a training difficulty level calibration database to obtain a first matching training difficulty. If the first matching training difficulty is null or non-unique, the signals are processed through the emotion recognition model to obtain a second matching training difficulty, and the difficulty mode of the simulation training is adjusted accordingly, thereby achieving the technical effect of improving the personalization and adaptive adjustment capability of driving simulation training.
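The fallback logic described above (use the calibration-database match when it is unique, otherwise fall back to the emotion recognition model's result) can be sketched as follows. This is an illustrative sketch only; the function and parameter names are assumptions, not the patent's own API.

```python
from typing import List


def constrain_difficulty(first_matches: List[int],
                         emotion_model_difficulty: int) -> int:
    """Select the training difficulty per the method's fallback rule.

    first_matches: difficulties returned by the calibration-database
    lookup. A unique match is used directly; if the lookup is empty
    ("null") or non-unique, the second matching difficulty produced by
    the emotion recognition model is used instead.
    """
    if len(first_matches) == 1:
        return first_matches[0]          # unique first match: use it
    return emotion_model_difficulty      # null or non-unique: fall back
```

For example, a database lookup returning exactly one candidate difficulty wins, while an empty or ambiguous lookup defers to the model's output.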
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. These drawings are merely examples of the present invention, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic flow chart of a simulation training difficulty dynamic constraint method based on emotion recognition according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a dynamic constraint system for simulating training difficulty based on emotion recognition according to an embodiment of the present application.
Reference numerals: expression image time sequence information acquisition module 10; calibration database acquisition module 20; training difficulty calibration module 30; second matching training difficulty acquisition module 40; difficulty mode adjustment module 50.
Detailed Description
The foregoing is only an overview of the technical solution of the present application. In order that the technical means of the present application may be more clearly understood and implemented in accordance with the description, and in order that the above and other objects, features and advantages of the present application may be more readily apparent, specific embodiments are set forth below.
The present application is described in further detail below with reference to the accompanying drawings, so that its objects, technical solutions and advantages are more apparent. The described embodiments should not be construed as limiting the present application; all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the present application.
In the following description, reference to "some embodiments" describes a subset of all possible embodiments; "some embodiments" may refer to the same subset or different subsets of all possible embodiments, and these may be combined with one another where no conflict arises. The term "first/second" merely distinguishes similar objects and does not imply a particular ordering. The terms "comprises", "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article or apparatus that comprises a list of steps or elements is not necessarily limited to those expressly listed, but may include other steps or modules not expressly listed or inherent to it. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains. The terminology used herein is for the purpose of describing embodiments of the application only.
The embodiment of the application provides a simulation training difficulty dynamic constraint method based on emotion recognition, as shown in fig. 1, comprising the following steps:
Step S100: when a target user enters the driving simulation training area and starts training, physiological signal time sequence information and voice signal time sequence information of the target user are acquired through a wearable sensor, and expression image time sequence information of the target user is acquired through a camera in the simulation training area. Specifically, when the target user steps into the driving simulation training area and starts training, the data acquisition system enters its working state. Wearable sensors pre-arranged at specific body sites fit closely against the skin and, using biosensing technology, collect physiological signal time sequence information including heart rate, blood pressure and skin conductivity. These signals fluctuate with the driving scenario and emotional changes, for example heart rate, blood pressure and skin conductivity all rising under tension, and their full course is recorded as curves. Meanwhile, the sensor's built-in microphone converts voice signals produced by the user's driving situation and emotions into digital signals recorded in time sequence; acoustic features such as intonation and speech rate reflect emotion and psychological activity. The camera in the training area captures the user's facial expressions at a fixed frame rate; expression changes, from initial tension at the start of training, to panic in an emergency, to pleasure at completing a task, form expression image time sequence information in temporal order, providing a comprehensive and solid data foundation for evaluating the training state and calibrating the training difficulty.
Step S200: obtaining a training difficulty level calibration database. Specifically, when building the training difficulty level calibration database, the data sources are first determined and multi-channel information is integrated, such as past driving training history records covering various simulated road conditions and weather, containing the target user's basic information, the physiological and voice signal time sequence information collected by the wearable sensor, the expression image time sequence information collected by the camera, and the recorded training difficulty and results. The collected data are preprocessed: abnormal points in the physiological signals are cleaned and the signals standardized; voice semantics are recognized and acoustic features quantified; expression image categories are recognized and their frequency and duration counted; and the data are arranged logically, indexed hierarchically by task number and user number. A database architecture is then designed, defining the structure and fields of each table, such as a training task table and a user information table; finally the data are entered accurately according to this architecture, ensuring completeness and consistency, to construct the database used for subsequent training difficulty calibration.
In one possible implementation, obtaining the training difficulty level calibration database, step S200, further includes step S210: obtaining a simulated training task number and a simulated training difficulty mode number. Specifically, the simulated training task number corresponding to the current driving simulation training task is obtained. This number identifies a particular simulated driving training scenario and process and is uniquely determined among the many different training tasks; for example, different driving route plans, traffic scene settings or training target settings correspond to different task numbers. At the same time, a simulated training difficulty mode number is obtained, representing the difficulty level mode adopted by the current training: a primary difficulty mode number may correspond to simpler road conditions, fewer traffic rule restrictions and lower driving operation requirements; an intermediate difficulty mode number may involve more complex road condition combinations, more traffic rule restrictions and medium operational complexity; and an advanced difficulty mode number denotes near-realistic, highly challenging road conditions, strict traffic rules and difficult driving operations, such as expressway driving under severe weather conditions.
Step S220: associating and storing the second matching training difficulty, the simulated training task number, the simulated training difficulty mode number, and the target user's physiological, voice and expression image time sequence information in the training difficulty level calibration database. Specifically, after the simulated training task number and difficulty mode number are acquired, the second matching training difficulty is stored in the calibration database in association with these numbers and the three kinds of time sequence information. During storage, the task number and difficulty mode number serve as the key index, and all related data under the same task number and difficulty mode number are grouped together. For example, for a record with task number "T001" and difficulty mode number "D02", the physiological signal time sequence information generated by the target user during that training (such as heart rate curves over time, blood pressure fluctuation data and skin conductivity sequences), the voice signal time sequence information (such as intonation changes, speech rate and utterances at different driving stages), the expression image time sequence information (such as a panicked expression in an emergency or a relaxed expression while driving smoothly), and the second matching training difficulty determined by the emotion recognition model are all stored under the corresponding "T001-D02" data set.
This association storage enables the database to conveniently retrieve, by task number and difficulty mode number, all relevant data under specific training conditions. It provides a rich, well-ordered data foundation for subsequent training difficulty calibration, training effect evaluation and training strategy optimization, helps continuously refine the content and accuracy of the training difficulty level calibration database, and thereby improves the performance and adaptability of the entire driving simulation training system.
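The keyed association storage described in step S220 can be sketched as a store indexed by the (task number, difficulty mode number) pair. This is a minimal illustrative sketch; the class, method and field names are assumptions, not the patent's schema.

```python
from collections import defaultdict


class CalibrationDB:
    """Minimal sketch of the training difficulty level calibration store.

    Records are grouped under a (task_number, difficulty_mode_number)
    key, e.g. ("T001", "D02"), so that all signal time series and the
    second matching difficulty recorded under one training condition
    can be retrieved together.
    """

    def __init__(self):
        self._records = defaultdict(list)

    def store(self, task_no, mode_no, physio, voice, expression, difficulty):
        # Associate all data for one training run under the key index.
        self._records[(task_no, mode_no)].append({
            "physio": physio,          # physiological signal time series
            "voice": voice,            # voice signal time series
            "expression": expression,  # expression image time series
            "difficulty": difficulty,  # second matching training difficulty
        })

    def query(self, task_no, mode_no):
        # Retrieve every record stored under this training condition.
        return self._records[(task_no, mode_no)]
```

A production system would use a relational schema (training task table, user information table) as the description suggests; the dictionary merely illustrates the key-index grouping.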
Step S300: performing training difficulty calibration on the target user's physiological, voice and expression image time sequence information based on the training difficulty level calibration database to obtain a first matching training difficulty. Specifically, using the target user's current simulated training task number and difficulty mode number as retrieval conditions, the corresponding first recorded physiological, voice and expression image time sequence information and the first recorded training difficulty are extracted from the database. A first similarity coefficient between the target user's physiological signal time sequence information and the first recorded physiological signal time sequence information is calculated, comprehensively analysing the similarity of the change trends of multidimensional indices such as heart rate, blood pressure and skin conductivity. A second similarity coefficient between the target user's voice signal time sequence information and the first recorded voice signal time sequence information is calculated, analysing acoustic features such as intonation, speech rate and emotional vocabulary. A third similarity coefficient between the first recorded expression image time sequence information and the target user's expression image time sequence information is calculated, comparing facial muscle movement, expression duration and so on by means of image recognition and expression analysis algorithms.
When the first, second and third similarity coefficients are each greater than or equal to their respective similarity thresholds, the first recorded training difficulty is determined to be the target user's first matching training difficulty. Past data and experience can thus provide a reasonable initial difficulty setting for the training, improving its pertinence and effectiveness and supporting personalized arrangement.
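The threshold rule above (all three similarity coefficients must meet their thresholds before the recorded difficulty is accepted as the first match) can be sketched as follows. The threshold values and function signature are illustrative assumptions; the patent does not specify concrete numbers.

```python
def first_match(sim_physio, sim_voice, sim_expr,
                thresholds=(0.8, 0.8, 0.8),
                recorded_difficulty=None):
    """Return the first recorded training difficulty as the first
    matching difficulty only when the physiological, voice and
    expression similarity coefficients each reach their threshold;
    otherwise return None, i.e. a "null" first match that triggers
    the emotion-recognition fallback."""
    coeffs = (sim_physio, sim_voice, sim_expr)
    if all(c >= t for c, t in zip(coeffs, thresholds)):
        return recorded_difficulty
    return None
```

A `None` result here is exactly the "null" case that hands control to the second matching difficulty from the emotion recognition model.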
In one possible implementation, performing training difficulty calibration on the three kinds of time sequence information based on the calibration database to obtain the first matching training difficulty, step S300, further includes step S310: inputting the simulated training task number and simulated training difficulty mode number into the training difficulty level calibration database, and extracting the first recorded physiological signal time sequence information, the first recorded voice signal time sequence information, the first recorded expression image time sequence information and the first recorded training difficulty. Specifically, the task number and difficulty mode number corresponding to the target user's current simulated training are input into the calibration database, which performs retrieval and extraction in the data store according to these two key numbers.
The first recorded physiological signal time sequence information matched to the task number and difficulty mode number is extracted, comprising detailed time-varying data of physiological signals such as heart rate, blood pressure and skin conductivity generated by a user who previously took part in the same training. The first recorded voice signal time sequence information is extracted, comprising information reflecting the user's verbal characteristics such as intonation change sequences, speech rate curves and speech content. The first recorded expression image time sequence information is extracted, namely the user's facial expression changes at different training stages, such as the temporal distribution and duration of smiles and frowns recorded by the image acquisition device. Finally, the corresponding first recorded training difficulty is extracted, a difficulty value determined by comprehensive evaluation under the same training task and difficulty mode.
Step S320: calculating a first similarity coefficient between the first recorded physiological signal time sequence information and the target user physiological signal time sequence information. Specifically, after the above types of first recorded information have been extracted, the first similarity coefficient is calculated. Since the physiological signal contains several different indices, a corresponding weight is assigned to each index. For example, indices that are sensitive to emotional response and important in driving simulation training, such as heart rate and skin conductivity, are given relatively high weights, while indices that reflect emotion only indirectly, such as body temperature, are given relatively low weights. Once the weights are determined, a similarity is computed for each physiological index. Taking heart rate as an example, the heart rate change curve in the first recorded physiological signal time sequence information is compared with that in the target user physiological signal time sequence information in terms of mean value, fluctuation amplitude, time of peak occurrence and change trend; for blood pressure, the closeness of the values and the consistency of the fluctuation period are examined; for skin conductivity, the focus is on the similarity of its response during transitions between different training scenes.
The per-index similarities are then combined according to their weights, finally yielding a first similarity coefficient that accurately reflects the degree of similarity between the first recorded physiological signal time sequence information and the target user physiological signal time sequence information.
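The patent does not fix a concrete formula for the weighted multi-index comparison. A minimal Python sketch of one plausible reading, where each curve is compared on mean value, fluctuation amplitude and peak position, and the per-index scores are combined by the assigned weights (all function names, index names and the three-feature choice are illustrative assumptions):

```python
from statistics import mean, pstdev

def curve_similarity(a, b):
    """Similarity of two signal curves, combining closeness of mean value,
    fluctuation amplitude (population std dev) and peak position."""
    mean_sim = 1 - abs(mean(a) - mean(b)) / max(abs(mean(a)), abs(mean(b)), 1e-9)
    amp_sim = 1 - abs(pstdev(a) - pstdev(b)) / max(pstdev(a), pstdev(b), 1e-9)
    peak_sim = 1 - abs(a.index(max(a)) - b.index(max(b))) / max(len(a), 1)
    return (mean_sim + amp_sim + peak_sim) / 3

def physiological_similarity(record, target, weights):
    """Weighted first similarity coefficient over several physiological
    indices, e.g. weights = {"heart_rate": 0.6, "skin_conductance": 0.4}."""
    total = sum(weights.values())
    return sum(w * curve_similarity(record[k], target[k])
               for k, w in weights.items()) / total
```

Identical recorded and target curves yield a coefficient of 1.0; diverging means, amplitudes or peak times pull it down, with heart rate and skin conductivity dominating through their higher weights.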
Step S330: calculating a second similarity coefficient between the target user voice signal time sequence information and the first recorded voice signal time sequence information. Specifically, the second similarity coefficient, that is, the similarity between the target user voice signal time sequence information and the first recorded voice signal time sequence information, is calculated next. The calculation analyzes several key features of the voice signal. For intonation, the rising and falling patterns over the whole training process are compared, for example whether the intonation rises when a complex driving scene is encountered and whether the amplitude and frequency of the rise are similar. For speech speed, it is examined whether the speed changes at comparable training stages, for example whether it accelerates during relatively tense driving operations and whether the degree of acceleration is similar. The voice content is analyzed semantically, and the occurrence frequencies of specific emotion vocabulary or driving-related vocabulary are counted and checked for consistency. Through comprehensive comparison and quantitative analysis of intonation, speech speed and semantics, a second similarity coefficient that accurately represents the degree of similarity of the voice signals is obtained.
Step S340: calculating a third similarity coefficient between the first recorded expression image time sequence information and the target user expression image time sequence information. Specifically, the third similarity coefficient, that is, the similarity between the first recorded expression image time sequence information and the target user expression image time sequence information, is calculated. First, the facial muscle movement features in the expression images are compared in detail by means of image recognition technology and an expression analysis algorithm. For example, when facing an emergency traffic situation, it is observed whether similar brow-furrowing muscle movements occur and whether the depth, duration and position distribution of the frowns are similar; for a smile, the similarity of the mouth-corner lift angle, the degree of eye narrowing and the overall stretch of the facial expression is analyzed; the frequency and timing of expression transitions are also considered, such as the time point of switching from a tense expression to a relaxed one and whether the intermediate expression states during the transition are consistent. Finally, through comprehensive analysis and quantitative evaluation of these multidimensional expression image features, a third similarity coefficient that accurately reflects the degree of similarity between the two sets of expression images is determined.
In step S350, when the first similarity coefficient is greater than or equal to a first similarity threshold, the second similarity coefficient is greater than or equal to a second similarity threshold, and the third similarity coefficient is greater than or equal to a third similarity threshold, the first recorded training difficulty is set as the first matching training difficulty. Specifically, when all three coefficients obtained through the above calculations reach their respective thresholds, this indicates that the physiological state, language expression state and facial expression state of the target user in the simulated training are highly similar to the user states recorded in the database for the same previous training task and difficulty mode. In this case, relying on the reliability of the empirical data, the first recorded training difficulty is directly set as the first matching training difficulty of the target user. For example, if the first recorded training difficulty is "advanced difficulty", the first matching training difficulty of the target user is likewise determined as "advanced difficulty", providing a reasonable and well-defined initial training difficulty setting for the subsequent simulated driving training and helping to improve the pertinence and effectiveness of the whole training process.
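The three-threshold matching of steps S320 to S350, including the "null or non-unique" outcome that triggers step S400, can be sketched in a few lines of Python (the record layout, threshold values and function name are hypothetical; the patent only specifies the comparison logic):

```python
def calibrate_first_match(records, thresholds=(0.8, 0.8, 0.8)):
    """records: iterable of (sim1, sim2, sim3, recorded_difficulty) tuples,
    one per candidate database record retrieved for the task/mode numbers.
    Returns the first matching training difficulty, or None when the match
    set is empty (null) or ambiguous (non-unique), per step S400."""
    t1, t2, t3 = thresholds
    matched = {d for s1, s2, s3, d in records
               if s1 >= t1 and s2 >= t2 and s3 >= t3}
    return matched.pop() if len(matched) == 1 else None
```

A single qualifying record yields its recorded difficulty directly; no qualifying record, or several records with conflicting difficulties, returns None, which is exactly the condition under which the emotion recognition model takes over.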
Step S400: when the first matching training difficulty is null or non-unique, the target user physiological signal time sequence information, the target user voice signal time sequence information and the target user expression image time sequence information are processed through an emotion recognition model to obtain a second matching training difficulty. Specifically, when the first matching training difficulty is null or not unique, processing is performed by means of an emotion recognition model composed of a physiological signal pre-processing network, a voice signal pre-processing network, an expression image pre-processing network and a post emotion recognition fitting network. The target user's physiological signal time sequence information (data covering how heart rate, blood pressure, skin conductivity and the like change with the driving scene and emotion), voice signal time sequence information (including emotion-related features such as intonation, speech speed, loudness and voice content) and expression image time sequence information (facial expression changes reflecting emotion, such as frowning and smiling) are each input into the corresponding pre-processing network.
The physiological signal pre-processing network is trained on historical driving training data under the specific simulated training task and difficulty mode numbers (including first monitoring physiological signal time sequence information and qualified training duration record data), using steps such as similarity calculation, clustering, grouping and mode calculation to determine the physiological signal identification training difficulty; given the target user data, it outputs a physiological signal matching training difficulty. The voice signal pre-processing network, trained on historical voice data in the same way, analyzes the target user's voice features to obtain a voice signal matching training difficulty, and the expression image pre-processing network learns from a large amount of historical expression image data to establish a mapping relation and derives an expression matching training difficulty from the input target user expression images. Finally, the three difficulty values are input into the post emotion recognition fitting network, which comprehensively analyzes the modal information, assigns weights and performs other integration and optimization, and outputs a second matching training difficulty that comprehensively reflects the emotional state of the target user, improving the effectiveness and adaptability of the simulated driving training.
Step S500: adjusting the simulated training difficulty mode according to the second matching training difficulty. Specifically, the operation step sequence corresponding to the simulated training task number is obtained; it covers the series of actions and decisions from starting the vehicle to coping with various traffic conditions, such as the start-up check, pulling away at an intersection, speed limit adjustment and turning operations in an urban driving simulation task. The operation step sequence is divided according to different numbers of steps to generate the simulated training difficulty modes: a first simulated training difficulty mode is obtained from single-step division, for instance considering the basic difficulty of the vehicle start-up step on its own; a second simulated training difficulty mode is formed by two-step division, focusing on the coherence and coordination between consecutive steps; and so on up to a K-th simulated training difficulty mode obtained by K-step division, where K is the total number of operation steps, the difficulty becoming progressively more complex and comprehensive as the number of steps per division grows. These difficulty modes are added to a set, and the mode adapted to the obtained second matching training difficulty is selected from the set. If the second matching training difficulty changes because of the target user's emotion during training, the system dynamically switches to the corresponding new mode, so that the difficulty mode, the user's emotion and the matching training difficulty remain accurately aligned, the best training effect is achieved, and the negative impact of an unsuitable difficulty on training is avoided.
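The step-division scheme can be illustrated with a short Python sketch. One plausible reading, assumed here, is that the k-th difficulty mode groups the operation step sequence into contiguous chunks of k steps; the function name and the example step labels are hypothetical:

```python
def build_difficulty_modes(steps):
    """Generate the 1st..K-th simulated training difficulty modes by
    dividing the operation step sequence into contiguous chunks of
    k = 1, 2, ..., K steps (K = total number of operation steps)."""
    K = len(steps)
    return {k: [steps[i:i + k] for i in range(0, K, k)]
            for k in range(1, K + 1)}
```

Mode 1 treats each operation step in isolation, mode 2 pairs consecutive steps to exercise their coherence, and mode K presents the entire sequence as a single comprehensive unit.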
In one possible implementation, the emotion recognition model includes a physiological signal pre-processing network, a voice signal pre-processing network, an expression image pre-processing network and a post emotion recognition fitting network. Specifically, the emotion recognition model is a comprehensive multi-module architecture composed of these four networks. The four networks cooperate to accurately identify the emotional state of the target user from information sources of different dimensions, and thereby determine a training difficulty matched to that emotional state.
Adjusting the simulated training difficulty mode according to the second matching training difficulty, step S500, further includes step S510: processing the target user physiological signal time sequence information through the physiological signal pre-processing network to obtain the physiological signal matching training difficulty. Specifically, the physiological signal pre-processing network focuses on processing the target user physiological signal time sequence information. This information contains multiple indices reflecting the user's internal emotional changes: heart rate data fluctuates with the user's tension, arousal level or fatigue state during driving simulation training, and tends to rise markedly in complex or dangerous driving scenes; blood pressure also rises in high-pressure situations; skin conductivity is another key index, since changes in sweat gland secretion during emotional fluctuation alter the skin's conductivity. The network is constructed from a large amount of historical driving training data. During construction, historical driving training data under the specific simulated training task number and simulated training difficulty mode number are collected first, comprising first monitoring physiological signal time sequence information and qualified training duration record data, where the latter refers to the duration from the triggering of the first monitoring physiological signal time sequence information until the training is passed. A training difficulty mapping table is set, and the qualified training duration record data are mapped through calculation to obtain the physiological signal identification training difficulty.
Concretely, a plurality of one-to-one corresponding groups of first monitoring physiological signal time sequence information and qualified training duration record data are obtained; pairwise similarity calculation on the first monitoring physiological signal time sequence information yields a first monitoring physiological signal similarity coefficient set; the first monitoring physiological signal time sequence information is clustered according to a set similarity coefficient threshold in combination with this coefficient set, producing multiple clusters; the qualified training duration record data are grouped according to the clustering result to obtain multiple groups; each group is traversed and its mode (most frequent value) computed, yielding several pieces of qualified training duration identification data; finally, a single representative record is randomly extracted from each cluster of first monitoring physiological signal time sequence information, and the qualified training duration identification data are processed according to the training difficulty mapping table to determine the physiological signal identification training difficulty. The physiological signal pre-processing network is then trained with the physiological signal identification training difficulty as supervision and the first monitoring physiological signal time sequence information as input. When the target user's physiological signal time sequence information is input, the network performs deep analysis and feature extraction according to the trained model parameters and algorithms, and finally outputs the physiological signal matching training difficulty.
Step S520: processing the target user voice signal time sequence information through the voice signal pre-processing network to obtain the voice signal matching training difficulty. Specifically, the voice signal pre-processing network mainly processes the target user voice signal time sequence information. During driving simulation training, the user's voice signal contains many features closely related to emotion. For example, intonation changes directly reflect the emotional state: intonation generally rises when the user is excited, tense or agitated. Speech speed is likewise an important emotional indicator: fast speech is often associated with urgency or excitement, while slow speech may suggest relaxation, low spirits or fatigue. Word choice and mode of expression in the voice content should not be neglected either, as certain emotion words or specific driving-related words can further reveal the user's psychological state. The construction of this network parallels that of the physiological signal pre-processing network, training on a large amount of historical voice signal data. During training, features such as intonation, speech speed and voice content are analyzed and extracted from the historical data, and an association model between voice signal features and training difficulty is established.
When the target user voice signal time sequence information is input, the network quickly captures the various features in the voice, for example recognizing keywords and emotion vocabulary in the content through speech recognition technology and analyzing the trend of the intonation and the rhythm of the speech speed, and calculates the voice signal matching training difficulty according to the established association model.
Step S530: processing the target user expression image time sequence information through the expression image pre-processing network to obtain the expression matching training difficulty. Specifically, the expression image pre-processing network is directed at processing the target user expression image time sequence information. The expression image is an intuitive external manifestation of the user's emotion: a frown is often associated with confusion, anxiety or dissatisfaction; a smile most likely indicates a pleasant emotional state, satisfaction with the driving situation or ease in responding; wide-open eyes may indicate surprise or alertness; and the shape and degree of opening of the mouth can reflect different emotions, with a tightly closed mouth suggesting tension or concentration and an open mouth possibly indicating surprise or shouting. The network is built from a large amount of historical expression image data. Deep learning and analysis are performed on features such as facial muscle movement, expression type (happiness, sadness, anger, surprise, etc.), expression duration and the frequency of expression transitions, and a mapping relation between expression image features and training difficulty is established through a machine learning algorithm. When the target user expression image time sequence information is input, the network accurately identifies the type and characteristics of the expression, for example distinguishing momentary surprise from sustained tension, and obtains the expression matching training difficulty from the changes in expression and the established mapping relation.
Step S540: inputting the physiological signal matching training difficulty, the voice signal matching training difficulty and the expression matching training difficulty into the post emotion recognition fitting network, and outputting the second matching training difficulty. Specifically, after the physiological signal pre-processing network, the voice signal pre-processing network and the expression image pre-processing network respectively output their matching training difficulties, the three difficulty values are input into the post emotion recognition fitting network. This network comprehensively considers the three values from the different modalities and integrates and optimizes them with a fitting algorithm; for example, different weights are assigned to the modalities according to differences in the accuracy and importance with which each expresses emotion, followed by weighted summation and similar operations. After this processing, the second matching training difficulty is output. This difficulty value synthesizes the emotion information reflected by the target user's physiological state, language expression state and facial expression state, so the training difficulty suitable for the target user's current emotional state can be determined more accurately, improving the effectiveness and adaptability of the simulated driving training.
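As a minimal stand-in for the post emotion recognition fitting network's weighted-summation example, the fusion step can be sketched as follows, assuming difficulty levels are encoded numerically; the weight split (physiological signals weighted most heavily, matching the emphasis in step S320) and the rounding to a discrete level are illustrative assumptions:

```python
def fuse_difficulties(physio_d, voice_d, expr_d, weights=(0.5, 0.25, 0.25)):
    """Weighted fusion of the three per-modality matching difficulties
    into a single second matching training difficulty level (integer)."""
    w1, w2, w3 = weights
    return round(w1 * physio_d + w2 * voice_d + w3 * expr_d)
```

A real fitting network would learn these weights from data rather than fix them; this sketch only shows the integration direction of the three modal outputs into one difficulty value.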
In a possible implementation, the physiological signal pre-processing network processing the target user physiological signal time sequence information to obtain the physiological signal matching training difficulty, step S510, further includes step S511: collecting historical driving training data under the simulated training task number and simulated training difficulty mode number, where the historical driving training data include first monitoring physiological signal time sequence information and qualified training duration record data, the latter being the duration from the triggering of the first monitoring physiological signal time sequence information until the training is passed. Specifically, for the given simulated training task number and simulated training difficulty mode number, the related historical driving training data are collected. Within these data, attention focuses on the first monitoring physiological signal time sequence information, which covers detailed records of physiological indices such as the heart rate curve, blood pressure fluctuation data and dynamic changes in skin conductivity of drivers during training. The data also include the qualified training duration record data, which explicitly record the duration from the triggering of the first monitoring physiological signal time sequence information until the driver reached the qualification standard under that training task and difficulty mode.
For example, if the simulated training task is driving under complex urban road conditions and the simulated training difficulty mode is medium difficulty, the collected historical data will reflect how different drivers' physiological signals changed in specific situations and how long each driver needed to qualify.
Step S512: setting a training difficulty mapping table and mapping the qualified training duration record data to obtain the physiological signal identification training difficulty. Specifically, a dedicated training difficulty mapping table is set. Based on the analysis and summary of a large amount of historical data, the mapping table establishes a correspondence between qualified training duration record data and training difficulty. The collected qualified training duration record data are input into the table and mapped accordingly, thereby determining the physiological signal identification training difficulty. For example, if a set of historical data shows that after a specific physiological signal was triggered the driver reached the qualification standard quickly, the corresponding physiological signal identification training difficulty is set to a lower level according to the table; conversely, a longer qualification time corresponds to a higher identification training difficulty. The key is to construct reasonable mapping logic between qualified training duration and training difficulty, so as to provide a reliable label for the subsequent network training.
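The mapping table itself is not enumerated in the patent; a minimal Python sketch with hypothetical duration boundaries (15 and 30 minutes) shows the shorter-duration-to-lower-difficulty logic described above:

```python
def map_duration_to_difficulty(qualified_minutes,
                               table=((15, "low"), (30, "medium"),
                                      (float("inf"), "high"))):
    """Training difficulty mapping table: the faster a driver qualified
    after the physiological signal was triggered, the lower the
    identification difficulty assigned to that signal pattern.
    `table` is (upper_bound_minutes, level) pairs in ascending order."""
    for upper, level in table:
        if qualified_minutes <= upper:
            return level
```

Real boundary values would come from the analysis and summary of the historical data; here they merely make the monotone duration-to-difficulty relation concrete.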
Step S513: training the physiological signal pre-processing network with the physiological signal identification training difficulty as supervision and the first monitoring physiological signal time sequence information as input. Specifically, the obtained physiological signal identification training difficulty serves as the supervision information and the first monitoring physiological signal time sequence information as the input data, and training of the physiological signal pre-processing network begins. During training, the network continually learns the inherent correlation between the feature patterns in the first monitoring physiological signal time sequence information and the corresponding physiological signal identification training difficulties; for example, it learns what training difficulty should be predicted for a particular combination of heart rate trend, blood pressure fluctuation pattern and skin conductivity change. Through repeated training on a large amount of historical data, the network continually optimizes its model parameters and algorithm structure, so that it can accurately predict the physiological signal matching training difficulty for new target user physiological signal time sequence information.
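The patent does not fix an architecture for the pre-processing network. As an illustrative stand-in only, a 1-nearest-neighbour predictor captures the supervised signal-to-difficulty relation described in step S513 (the class name, flattened-feature input and distance metric are all assumptions, not the patented method):

```python
class PhysioDifficultyPredictor:
    """Toy stand-in for the physiological signal pre-processing network:
    memorizes (signal features, identification difficulty) pairs and
    predicts the difficulty of the nearest stored signal (1-NN)."""
    def __init__(self):
        self.samples = []

    def fit(self, signals, difficulties):
        # signals: list of numeric feature vectors extracted from the
        # first monitoring physiological signal time sequence information
        self.samples = list(zip(signals, difficulties))
        return self

    def predict(self, signal):
        # L1 distance to each stored sample; return the closest label
        def dist(s):
            return sum(abs(x - y) for x, y in zip(s, signal))
        return min(self.samples, key=lambda p: dist(p[0]))[1]
```

A trained deep network would generalize far beyond memorized samples; the sketch only demonstrates the input/supervision/output contract of the training step.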
Step S514: the voice signal pre-processing network and the expression image pre-processing network are constructed by the same steps as the physiological signal pre-processing network. Specifically, for the voice signal pre-processing network, historical voice signal data under the specific simulated training task number and simulated training difficulty mode number are collected, including intonation change sequences, speech speed change curves and voice content, together with the corresponding qualified training duration record data (here the qualified training duration refers to the duration from the voice signal features being triggered until the training is passed). A training difficulty mapping table for the voice signals is set, the qualified training duration record data are mapped to the voice signal identification training difficulty, and the voice signal pre-processing network is trained with this identification difficulty as supervision and the historical voice signal data as input. For the expression image pre-processing network, historical expression image data under the specific task and difficulty mode, such as the facial muscle movement features and durations of different expressions, are first collected along with qualified training duration record data; an expression image training difficulty mapping table is set and the expression image identification training difficulty is obtained by mapping; the network is then trained with the expression image identification training difficulty as supervision and the historical expression image data as input.
Through these shared construction steps, the three pre-processing networks can each effectively convert one mode of the user's multimodal information into corresponding matching training difficulty information in their own field, laying a solid foundation for the whole emotion recognition model to accurately evaluate the target user's emotional state and determine a suitable training difficulty.
In a possible implementation, setting the training difficulty mapping table and mapping the qualified training duration record data to obtain the physiological signal identification training difficulty, step S512, further includes step S5121: obtaining a plurality of first monitoring physiological signal time sequence information records and a plurality of qualified training duration record data in one-to-one correspondence. Specifically, these corresponding records are obtained from the historical driving training data. The first monitoring physiological signal time sequence information records in detail how different drivers' physiological signals evolved under the specific simulated training task number and simulated training difficulty mode number, such as how heart rate fluctuated as driving scenes switched, how blood pressure varied when complex road conditions were encountered, and the subtle changes in skin conductivity at tense moments. The corresponding qualified training duration record data indicate how long it took each driver to reach the qualification standard once the physiological signal began to show the specific change (i.e. once the first monitoring physiological signal time sequence information was triggered). For example, in a medium-difficulty urban road driving simulation, one driver's first monitoring physiological signal time sequence information may show a characteristic heart rate fluctuation curve while passing several intersections with changing traffic conditions, and the qualified training duration record data may show that 30 minutes elapsed from the first obvious heart rate change to passing the training.
Step S5122: performing pairwise similarity calculation on the plurality of first monitoring physiological signal time sequence information records to obtain a first monitoring physiological signal similarity coefficient set. Specifically, pairwise similarity is calculated over the acquired records. This is not a simple numerical comparison but a comprehensive analysis of the change trends, fluctuation amplitudes and signal features of the physiological indices. For the heart rate signal, not only the closeness of the average values is compared, but also whether the acceleration and deceleration patterns during training are consistent; for the blood pressure signal, the similarity of peaks and troughs and the closeness of the values are analyzed; for the skin conductivity signal, the focus is on the change slopes at different training stages and the coincidence of fluctuation periods. The multidimensional comparison results are integrated into numerical values quantifying the degree of similarity, forming the first monitoring physiological signal similarity coefficient set, in which each coefficient represents the degree of similarity between one pair of first monitoring physiological signal time sequence information records.
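The pairwise construction of the coefficient set can be sketched directly; the toy similarity measure below (inverse mean absolute difference) is an assumption standing in for the multidimensional comparison described above:

```python
from itertools import combinations
from statistics import mean

def simple_similarity(a, b):
    """Toy curve similarity: 1 when identical, decaying with the mean
    absolute difference (placeholder for the multi-index comparison)."""
    d = mean(abs(x - y) for x, y in zip(a, b))
    return 1 / (1 + d)

def similarity_coefficient_set(signals):
    """Pairwise coefficients over all records; keys are index pairs
    (i, j) with i < j, one coefficient per pair of records."""
    return {(i, j): simple_similarity(signals[i], signals[j])
            for i, j in combinations(range(len(signals)), 2)}
```

For n records this produces n·(n-1)/2 coefficients, the set consumed by the clustering of step S5123.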
Step S5123: clustering the plurality of first monitoring physiological signal time sequence information items according to a similarity coefficient threshold, in combination with the first monitoring physiological signal similarity coefficient set, to obtain multiple clusters of first monitoring physiological signal time sequence information. Specifically, the clustering operation uses a preset similarity coefficient threshold together with the constructed similarity coefficient set. When the similarity coefficient between two first monitoring physiological signal time sequence information items is greater than or equal to the threshold, the two are classified into the same cluster. Through this process, the time sequence information is orderly divided into multiple clusters. The items within each cluster have high mutual similarity and represent a group of drivers with similar physiological response characteristics under a specific driving scene and difficulty mode. For example, one cluster may correspond to drivers whose heart rate fluctuates gently, whose blood pressure changes steadily, and whose skin conductivity varies little when facing frequent stops and starts on urban roads with complex traffic lights.
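The clustering rule above — two recordings fall into the same cluster whenever their similarity coefficient meets the threshold — amounts to connected-component grouping, which can be sketched with a union-find structure. The data layout (a dict of pairwise coefficients, as in the previous sketch) is an assumption for illustration:

```python
def threshold_cluster(n, sims, threshold):
    """Group n recordings into clusters; pairs with similarity >= threshold
    are merged into the same cluster (transitively)."""
    parent = list(range(n))

    def find(x):                       # root of x's cluster, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), s in sims.items():     # merge every pair that meets the threshold
        if s >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    clusters = {}
    for i in range(n):                 # collect members under each root
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```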
Step S5124: grouping the plurality of qualified training duration record data according to the multiple clusters of first monitoring physiological signal time sequence information, to obtain multiple groups of qualified training duration record data. Specifically, after clustering is complete, the qualified training duration record data are grouped according to the resulting clusters. Because the qualified training duration record data and the first monitoring physiological signal time sequence information are in one-to-one correspondence, once the time sequence information items are divided into clusters, the corresponding duration record data are naturally divided into matching groups. This grouping associates each group of qualified training duration record data with the first monitoring physiological signal time sequence information of a specific cluster, laying a foundation for subsequent in-depth analysis of training duration differences across physiological signal feature groups.
Step S5125: traversing the multiple groups of qualified training duration record data and performing mode value calculation to obtain multiple qualified training duration identification data. Specifically, each group of qualified training duration record data is traversed and its mode value is calculated. The mode value is the qualified training duration that occurs most frequently within the driver group corresponding to a specific cluster of first monitoring physiological signal time sequence information. Computing the mode for each group yields multiple qualified training duration identification data, which reflect, to a certain extent, the typical training duration characteristics of the corresponding driver group. For example, if one group's qualified training duration record data is [25, 28, 25, 30, 25], the mode value is 25, so 25 minutes is determined as that group's qualified training duration identification data.
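The mode calculation for one group can be sketched directly; `Counter.most_common` returns the most frequent duration, matching the worked example in the text:

```python
from collections import Counter

def qualified_duration_mode(durations):
    """Mode value: the most frequent qualified training duration in one group."""
    return Counter(durations).most_common(1)[0][0]
```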
Step S5126: traversing the multiple clusters of first monitoring physiological signal time sequence information and randomly extracting a single data item from each, to obtain a plurality of first monitoring physiological signal time sequence information items. Specifically, one item is drawn at random from each cluster. These randomly extracted items serve as basic data for processing based on the training difficulty mapping table: they preserve the characteristic representativeness of each cluster while retaining a degree of randomness, providing diversified input samples for the application of the training difficulty mapping table.
Step S5127: based on the training difficulty mapping table, respectively processing the multiple qualified training duration identification data to obtain the physiological signal identification training difficulty. Specifically, each qualified training duration identification data item is processed through a preset training difficulty mapping table, a data conversion rule system built from a large amount of historical data and expert knowledge that maps the size and distribution of qualified training duration identification data to a corresponding physiological signal identification training difficulty. For example, if one qualified training duration identification data item is short, it indicates that drivers in that physiological signal feature group generally reach the training qualification standard quickly, so the corresponding physiological signal identification training difficulty is set to a lower level according to the mapping table; conversely, a longer identification data item is mapped to a higher difficulty. The physiological signal identification training difficulty finally obtained through this mapping provides key supervision information and a difficulty calibration basis for training the subsequent physiological signal pre-processing network.
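A minimal sketch of such a mapping table follows. The duration bands (in minutes) and the difficulty labels are purely illustrative assumptions — the patent does not disclose concrete thresholds:

```python
# Illustrative table: (upper bound in minutes, difficulty level), ordered ascending.
DIFFICULTY_TABLE = [(20, "low"), (40, "medium"), (float("inf"), "high")]

def map_duration_to_difficulty(duration, table=DIFFICULTY_TABLE):
    """Map a qualified training duration identification value to a difficulty level:
    shorter durations (faster qualification) map to lower difficulty."""
    for upper, level in table:
        if duration <= upper:
            return level
    raise ValueError("table must end with an open upper bound")
```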
In one possible implementation, step S500 further includes step S560: adjusting the simulated training difficulty mode according to the second matching training difficulty, which first requires obtaining the operation step sequence of the simulated training task number. Specifically, the operation step sequence corresponding to the simulated training task number is retrieved from the management system or data repository of the simulated driving training tasks. Taking a common urban road driving simulation training task as an example, the operation step sequence covers the whole process from pre-drive preparation to final parking: adjusting the seat to a proper position; fastening the seat belt; checking that the rearview mirrors and the instrument panel indicator lights are normal; starting the vehicle (inserting the key or pressing the start button and observing the vehicle's self-check); depressing the clutch (manual transmission) or engaging the proper gear (automatic transmission) and releasing the handbrake; pulling away according to the traffic light indication; maintaining a proper speed under different road conditions; turning on the turn signal in advance and observing surrounding traffic when turning; confirming a safe distance and using the turn signal when changing lanes; and correctly judging priority of way at intersections and yielding to pedestrians and other vehicles.
Step S570: dividing the operation step sequence by single operation steps to obtain a first simulated training difficulty mode. Specifically, the acquired operation step sequence is divided step by step to construct the first simulated training difficulty mode. For example, for the single step "adjusting the seat to a proper position", this difficulty mode can test how familiar the learner is with the seat adjustment functions, such as fore-aft position, height adjustment, and backrest inclination; some seat adjustment mechanisms may be set to be slightly stiff or less responsive, to see whether the learner can still achieve a comfortable and safe driving posture. For the step "pulling away according to the traffic light indication", the focus is on the learner's reaction speed to the light change, the coordination of clutch and accelerator (manual) or brake and accelerator (automatic) when pulling away, and whether the vehicle starts smoothly without rolling back or lurching. This single-step division allows the difficulty of each operation step to be considered and set independently, helping novice learners initially grasp the key points and standards of each driving action.
Step S580: dividing the operation step sequence by pairs of operation steps to obtain a second simulated training difficulty mode. Specifically, after the single-step division is complete, the operation step sequence is further divided into pairs of consecutive operation steps. Take the combination of "turning on the turn signal in advance and observing surrounding traffic when turning" and "confirming a safe distance and using the turn signal when changing lanes" as an example: the difficulty setting is no longer limited to the basic operation of a single step, but also considers the consistency and coordination between the two steps. The learner must not only complete the turning operation accurately but also change lanes at the proper time, which requires a certain comprehensive judgment capability and operational coordination.
Step S590: continuing until the operation step sequence is divided by K operation steps, to obtain a K-th simulated training difficulty mode, where K equals the total number of operation steps. Specifically, following the same division logic, the number of operation steps per segment is increased successively until the sequence is divided by K steps, yielding the K-th simulated training difficulty mode. As the number of steps per segment grows, the difficulty modes become increasingly complex and comprehensive.
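Steps S570 through S590 can be sketched together: for each k from 1 to K, the operation step sequence is divided into consecutive segments of k steps. The consecutive-chunk interpretation of "dividing by k operation steps" is an assumption; the patent does not fix the segmentation rule:

```python
def build_difficulty_modes(steps):
    """Mode k groups the operation step sequence into consecutive segments of k steps;
    mode K (k == len(steps)) treats the whole sequence as one combined exercise."""
    return {k: [steps[i:i + k] for i in range(0, len(steps), k)]
            for k in range(1, len(steps) + 1)}
```

For a four-step sequence this yields mode 1 (four single-step exercises), mode 2 (two paired exercises), up to mode 4 (the full sequence as one exercise), which together form the simulated training difficulty mode set of step S5100.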
Step S5100: adding the first simulated training difficulty mode, the second simulated training difficulty mode, up through the K-th simulated training difficulty mode into the set of simulated training difficulty modes. Specifically, the generated first through K-th simulated training difficulty modes are added to a simulated training difficulty mode set. During simulated driving training, a suitable difficulty mode can then be selected from this set according to the learner's actual situation, such as learning progress, driving skill level, and current emotional state.
In the above, the simulation training difficulty dynamic constraint method based on emotion recognition according to the embodiment of the present invention is described in detail with reference to fig. 1. Next, a simulated training difficulty dynamic constraint system based on emotion recognition according to an embodiment of the present invention will be described with reference to fig. 2.
The emotion-recognition-based simulated training difficulty dynamic constraint system solves the technical problem that existing driving simulation training difficulty cannot be adaptively adjusted to user needs, which results in poor personalization of simulated training, and achieves the technical effect of improving the personalization and adaptive adjustment capability of driving simulation training. The system comprises an expression image time sequence information acquisition module 10, a calibration database acquisition module 20, a training difficulty calibration module 30, a second matching training difficulty acquisition module 40, and a difficulty mode adjustment module 50.
The expression image time sequence information acquisition module 10 is configured to collect, when the target user enters the driving simulation training area and begins training, the target user physiological signal time sequence information and the target user voice signal time sequence information through wearable sensors, and to collect the target user expression image time sequence information through the training area camera.
The calibration database acquisition module 20 is configured to acquire a training difficulty level calibration database.
The training difficulty calibration module 30 is configured to calibrate the training difficulty for the target user physiological signal timing information, the target user speech signal timing information, and the target user expression image timing information based on the training difficulty level calibration database, so as to obtain a first matching training difficulty.
The second matching training difficulty obtaining module 40 is configured to process, when the first matching training difficulty is null or non-unique, the target user physiological signal time sequence information, the target user voice signal time sequence information and the target user expression image time sequence information through an emotion recognition model, so as to obtain a second matching training difficulty.
The difficulty mode adjustment module 50 is configured to adjust the simulated training difficulty mode according to the second matching training difficulty.
Next, the specific configuration of the calibration database acquisition module 20 will be described in detail. As described above, the calibration database acquisition module 20 further includes a number acquisition unit, configured to acquire a simulated training task number and a simulated training difficulty mode number, and an information storage unit, configured to store the second matching training difficulty, the simulated training task number, the simulated training difficulty mode number, the target user physiological signal time sequence information, the target user voice signal time sequence information, and the target user expression image time sequence information in association with each other in the training difficulty calibration database.
Next, the specific configuration of the training difficulty calibration module 30 will be described in detail. As described above, the training difficulty calibration module 30 further includes a time sequence information extraction unit, configured to input the simulated training task number and the simulated training difficulty mode number into the training difficulty calibration database and extract first recorded physiological signal time sequence information, first recorded voice signal time sequence information, first recorded expression image time sequence information, and a first recorded training difficulty; a first similarity coefficient calculation unit, configured to calculate a first similarity coefficient between the first recorded physiological signal time sequence information and the target user physiological signal time sequence information; a second similarity coefficient calculation unit, configured to calculate a second similarity coefficient between the target user voice signal time sequence information and the first recorded voice signal time sequence information; a third similarity coefficient calculation unit, configured to calculate a third similarity coefficient between the target user expression image time sequence information and the first recorded expression image time sequence information; and a matching unit, configured to set the first recorded training difficulty as the first matching training difficulty when the first, second, or third similarity coefficient is greater than or equal to a first similarity coefficient threshold.
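The calibration lookup performed by these units can be sketched as follows. The default threshold of 0.8 and the any-modality matching rule are illustrative assumptions consistent with the description above:

```python
from typing import Optional, Tuple

def first_matching_difficulty(sims: Tuple[float, float, float],
                              recorded_difficulty: str,
                              threshold: float = 0.8) -> Optional[str]:
    """Return the recorded training difficulty when any modality's similarity
    coefficient (physiological, voice, expression) reaches the threshold;
    None means no match, so the emotion recognition model must be used instead."""
    return recorded_difficulty if any(s >= threshold for s in sims) else None
```

A `None` result corresponds to the "first matching training difficulty is null" branch, which hands the three time sequence information streams to the emotion recognition model of module 50.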
Next, the specific configuration of the difficulty mode adjustment module 50 will be described in detail. As described above, the difficulty mode adjustment module 50 further comprises an emotion recognition model component unit, a physiological signal matching training difficulty acquisition unit, a voice signal matching training difficulty acquisition unit, an expression matching training difficulty acquisition unit, and a second matching training difficulty output unit. The physiological signal matching training difficulty acquisition unit processes the target user physiological signal time sequence information through the physiological signal preprocessing network to obtain a physiological signal matching training difficulty; the voice signal matching training difficulty acquisition unit processes the target user voice signal time sequence information through the voice signal preprocessing network to obtain a voice signal matching training difficulty; the expression matching training difficulty acquisition unit processes the target user expression image time sequence information through the expression image preprocessing network to obtain an expression matching training difficulty; and the second matching training difficulty output unit inputs the physiological signal matching training difficulty, the voice signal matching training difficulty, and the expression matching training difficulty into the second matching training network to obtain the second matching training difficulty.
The physiological signal matching training difficulty acquisition unit further comprises a driving training data collection subunit, a record data mapping subunit, a pre-processing network training subunit, and a processing network construction subunit. The driving training data collection subunit collects historical driving training data under the simulated training task number and simulated training difficulty mode number, the historical driving training data comprising first monitoring physiological signal time sequence information and qualified training duration record data, where the qualified training duration record data is the duration from the triggering of the first monitoring physiological signal time sequence information until the training is qualified. The record data mapping subunit sets a training difficulty mapping table and maps the qualified training duration record data to obtain the physiological signal identification training difficulty. The pre-processing network training subunit trains the physiological signal pre-processing network with the physiological signal identification training difficulty as supervision and the first monitoring physiological signal time sequence information as input. The processing network construction subunit constructs the physiological signal processing network from the trained pre-processing network.
The record data mapping subunit further comprises a training duration record data acquisition micro-unit, a similarity coefficient set acquisition micro-unit, a time sequence information clustering micro-unit, a record data grouping micro-unit, a mode value calculation micro-unit, a random extraction micro-unit, and a physiological signal identification training difficulty acquisition micro-unit. The training duration record data acquisition micro-unit obtains a plurality of first monitoring physiological signal time sequence information items and a plurality of qualified training duration record data items in one-to-one correspondence. The similarity coefficient set acquisition micro-unit performs pairwise similarity calculation on the plurality of first monitoring physiological signal time sequence information items to obtain the first monitoring physiological signal similarity coefficient set. The time sequence information clustering micro-unit clusters the plurality of first monitoring physiological signal time sequence information items according to the similarity coefficient threshold, in combination with the similarity coefficient set, to obtain multiple clusters of first monitoring physiological signal time sequence information. The record data grouping micro-unit groups the plurality of qualified training duration record data according to the clusters to obtain multiple groups of qualified training duration record data. The mode value calculation micro-unit traverses the groups and performs mode value calculation to obtain multiple qualified training duration identification data. The random extraction micro-unit traverses the clusters and randomly extracts a single item from each. The physiological signal identification training difficulty acquisition micro-unit respectively processes the multiple qualified training duration identification data based on the training difficulty mapping table to obtain the physiological signal identification training difficulty.
The difficulty mode adjustment module 50 further comprises an operation step sequence acquisition unit, an operation step division unit, a second simulated training difficulty mode acquisition unit, a traversal division unit, and a training difficulty mode addition unit. The operation step sequence acquisition unit acquires the operation step sequence of the simulated training task number; the operation step division unit divides the operation step sequence by single operation steps to obtain the first simulated training difficulty mode; the second simulated training difficulty mode acquisition unit divides the operation step sequence by pairs of operation steps to obtain the second simulated training difficulty mode; the traversal division unit continues until the operation step sequence is divided by K operation steps to obtain the K-th simulated training difficulty mode, where K equals the total number of operation steps; and the training difficulty mode addition unit adds the first through K-th simulated training difficulty modes into the set of simulated training difficulty modes.
The emotion recognition-based simulation training difficulty dynamic constraint system provided by the embodiment of the invention can execute the emotion recognition-based simulation training difficulty dynamic constraint method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Although the present application makes various references to certain modules of the system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or a server. The units and modules are divided purely by functional logic; the division is not limited to the above as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for convenience of mutual distinction and are not intended to limit the scope of protection of the present application.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application. In some cases, the acts or steps recited in the present application may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Claims (8)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510037504.9A CN120014907B (en) | 2025-01-09 | 2025-01-09 | A Dynamic Constraint Method and System for Simulation Training Difficulty Based on Emotion Recognition |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510037504.9A CN120014907B (en) | 2025-01-09 | 2025-01-09 | A Dynamic Constraint Method and System for Simulation Training Difficulty Based on Emotion Recognition |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN120014907A CN120014907A (en) | 2025-05-16 |
| CN120014907B true CN120014907B (en) | 2025-11-18 |
Family
ID=95666681
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510037504.9A Active CN120014907B (en) | 2025-01-09 | 2025-01-09 | A Dynamic Constraint Method and System for Simulation Training Difficulty Based on Emotion Recognition |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120014907B (en) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106730812A (en) * | 2016-11-30 | 2017-05-31 | 吉林大学 | A kind of game regulator control system estimated based on multi-physiological-parameter mood amount and regulation and control method |
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080268943A1 (en) * | 2007-04-26 | 2008-10-30 | Sony Computer Entertainment America Inc. | Method and apparatus for adjustment of game parameters based on measurement of user performance |
| KR102266310B1 (en) * | 2021-03-19 | 2021-06-17 | (주)휴먼아이티솔루션 | Method, apparatus and coumputer-readable medium of artificial intelligence based instructional content recommendation for dementia care enhancing cognitive ability |
| KR102735349B1 (en) * | 2022-02-25 | 2024-11-29 | 광주과학기술원 | Apparatus and method for adjusting game difficulty based on user satisfaction |
| CN116301309B (en) * | 2022-11-08 | 2025-11-21 | 山东大学 | Relationship enhancement method and system based on EEG multi-person cooperative regulation and control |
| CN116603229A (en) * | 2023-04-21 | 2023-08-18 | 辽宁大学 | Dynamic electronic game difficulty adjusting method based on electroencephalogram signals |
| CN116943226B (en) * | 2023-09-20 | 2024-01-05 | 小舟科技有限公司 | Game difficulty adjusting method, system, equipment and medium based on emotion recognition |
| CN118072953A (en) * | 2024-01-31 | 2024-05-24 | 首都医科大学附属北京儿童医院 | A cognitive-emotional interaction method and system for ADHD comorbid emotional disorders |
| CN118846337A (en) * | 2024-06-27 | 2024-10-29 | 北京津发科技股份有限公司 | Virtual reality cognitive training method, device and system based on multimodal data |
- 2025
- 2025-01-09 CN CN202510037504.9A patent CN120014907B/en active Active
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106730812A (en) * | 2016-11-30 | 2017-05-31 | 吉林大学 | A kind of game regulator control system estimated based on multi-physiological-parameter mood amount and regulation and control method |
Also Published As
| Publication number | Publication date |
|---|---|
| CN120014907A (en) | 2025-05-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112750465B (en) | Cloud language ability evaluation system and wearable recording terminal | |
| CN103996155A (en) | Intelligent interaction and psychological comfort robot service system | |
| CN114582372A (en) | Multi-mode driver emotional feature recognition method and system | |
| CN118675552B (en) | A speech emotion classification method based on context information enhancement and cross attention | |
| Chebbi et al. | On the use of pitch-based features for fear emotion detection from speech | |
| CN120015053B (en) | AI-driven personalized voice training and pronunciation correction system | |
| Gismelbari et al. | Speech emotion recognition using deep learning | |
| Shin et al. | Speaker-invariant psychological stress detection using attention-based network | |
| Aluru et al. | Parkinson’s disease detection using machine learning techniques | |
| KR102620129B1 (en) | Ai based early stroke prediction diagnosis system and method | |
| Nfissi et al. | Unlocking the emotional states of high-risk suicide callers through speech analysis | |
| CN120014907B (en) | A Dynamic Constraint Method and System for Simulation Training Difficulty Based on Emotion Recognition | |
| CN121211307A (en) | An emotion recognition method and device | |
| CN119851879B (en) | Method and system for evaluating psychological intervention effect of premature infant mother | |
| CN120372286A (en) | Emotion recognition model training system for multiple modes | |
| CN118338492B (en) | Intelligent lighting control system, method and storage medium based on environmental atmosphere adjustment | |
| CN120220656A (en) | Speech recognition method, device, equipment and computer-readable storage medium | |
| CN116702086A (en) | A method and system for detecting learning concentration | |
| Udurume et al. | Real-time multimodal emotion recognition based on multithreaded weighted average fusion | |
| Hong et al. | Depression classification algorithm based on voice signals using MFCC and CNN autoencoders | |
| CN114512227A (en) | Delirium phenomenon determination device | |
| CN116798454B (en) | A speech recognition-based method for assessing cognitive impairment | |
| Varshney et al. | Multi-Model Emotion Detection using Machine Learning Techniques and Data Analysis | |
| Vardhan et al. | Emotion Recognition by Facial Expressions and Speech Using Deep Learning | |
| CN117349792B (en) | Emotion recognition method based on facial features and voice features |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |