CN119524382A - A virtual simulation interactive image recognition game monitoring method - Google Patents
A virtual simulation interactive image recognition game monitoring method
- Publication number
- CN119524382A (application number CN202411649023.5A)
- Authority
- CN
- China
- Prior art keywords
- player
- game
- video data
- image
- resolution video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/20—Input arrangements for video game devices
- A63F13/21—Input arrangements for video game devices characterised by their sensors, purposes or types
- A63F13/213—Input arrangements for video game devices characterised by their sensors, purposes or types comprising photodetecting means, e.g. cameras, photodiodes or infrared cells
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/40—Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment
- A63F13/42—Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/55—Controlling game characters or game objects based on the game progress
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/10—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
- A63F2300/1087—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/30—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by output arrangements for receiving control signals generated by the game device
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/60—Methods for processing data by generating or executing the game program
- A63F2300/6045—Methods for processing data by generating or executing the game program for mapping control signals received from the input arrangement into game commands
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Human Computer Interaction (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a virtual simulation interactive image recognition game monitoring method comprising the following steps: S1, capturing the natural behaviors of a player in an interactive game through high-resolution camera equipment and generating video data representing the player's in-game action moments, so as to provide higher-quality input data; S2, analyzing and processing the player's natural behaviors in real time, using the high-resolution video data collected in step S1, through frame processing, binarization and a deep-learning target detection algorithm; and S3, recognizing and responding to the player's natural behaviors and mapping them onto the virtual characters in the game. By combining a Kinect somatosensory camera, the invention accurately captures and analyzes the player's natural behaviors in real time and identifies the shadow regions in the images, so that the shadow regions in the high-resolution video data images can be accurately analyzed and recognized, providing richer information for subsequent image processing and analysis.
Description
Technical Field
The invention relates to the technical field of virtual simulation, and in particular to a virtual simulation interactive image recognition game monitoring method.
Background
Most existing motion capture and analysis systems use ordinary cameras to capture human motion; the acquired images are blurry, and phenomena such as ghosting, shadows and missing foot detail cannot be handled, which easily affects subsequent analysis and judgment. In motion capture technology, the "shadow" phenomenon is a key technical challenge: it refers to data regions that cannot be captured because of sensor field-of-view limits or other technical constraints. This phenomenon is particularly prominent in game training, because it directly affects the completeness of motion data and the accuracy of motion recognition.
For example, in a virtual reality football game, if the motion capture system cannot accurately capture every fine movement of the player, such as a quick turn or a subtle foot adjustment, the virtual character in the game may fail to reproduce the motion, significantly degrading the player's in-game experience.
Such data incompleteness not only reduces the accuracy of motion recognition, which impairs the effectiveness of game training, but can also increase the cost of the training system, because additional sensors or more advanced capture techniques are needed to compensate for the missing data. Training effectiveness is further reduced because the player cannot learn all necessary motion patterns from complete motion data.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a virtual simulation interactive image recognition game monitoring method which solves the problems described in the background art.
To achieve the above purpose, the invention is realized by the following technical scheme. The virtual simulation interactive image recognition game monitoring method comprises the following steps:
S1, constructing a player action capturing strategy
Capturing the natural behaviors of a player in an interactive game through a high-resolution camera device, and generating video data representing the player's in-game action moments, so as to provide higher-quality input data;
S2, data optimization
Utilizing the high-resolution video data collected in step S1, and analyzing and processing the player's natural behaviors in real time through frame processing, binarization and a deep-learning target detection algorithm, so as to extract effective video data segments and prepare for the next step of behavior recognition analysis;
S3, recognizing and responding to the natural behaviors of the player, and mapping the behaviors to the virtual characters in the game.
Compared with the prior art, the invention has the beneficial effects that:
To solve the problems that existing motion capture and analysis systems mostly use ordinary cameras to capture human motion, that the acquired images are blurry, that phenomena such as ghosting, shadows and missing foot detail cannot be handled, and that subsequent analysis and judgment are easily affected, the invention combines a Kinect somatosensory camera. First, it accurately captures and analyzes the player's natural behaviors in real time; second, by identifying shadow regions in the images and extracting and analyzing the player's actions with deep-learning target detection, it can more accurately analyze and recognize the shadowed parts of the high-resolution video data images. This provides richer information for subsequent image processing and analysis, supplies rich and correct data support for game design, and enhances the interactivity between the player and the virtual environment.
Drawings
The disclosure of the present invention is described with reference to the accompanying drawings. It should be understood that the drawings are for purposes of illustration only and are not intended to limit the scope of the present invention; like reference numerals designate like parts throughout. Wherein:
FIG. 1 is a schematic diagram of a virtual simulation interactive image recognition game monitoring method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a player motion capture process according to an embodiment of the invention;
FIG. 3 is a schematic flow chart of optimizing and processing high-resolution video data acquired by a high-resolution camera device according to an embodiment of the present invention;
FIG. 4 is a flow chart of a virtual simulation system for performing behavior recognition and response to actions of a player according to an embodiment of the present invention;
FIG. 5 is a flow chart illustrating the extraction and analysis of shadow portions in an acquired high resolution video data image according to an embodiment of the present invention;
FIG. 6 is a flowchart of combining the player actions captured by the Kinect somatosensory camera with the high resolution video data image analysis results according to an embodiment of the present invention.
Detailed Description
It is to be understood that, according to the technical solution of the present invention, those skilled in the art may propose various alternative structural modes and implementation modes without changing the true spirit of the present invention. Accordingly, the following detailed description and drawings are merely illustrative of the invention and are not intended to be exhaustive or to limit the invention to the precise form disclosed.
The present invention will be described in further detail below with reference to the drawings, but is not limited thereto.
To aid understanding of the technical concept and implementation principle of the invention, the proposed virtual simulation interactive image recognition game monitoring method mainly aims to solve the problems that existing motion capture and analysis systems mostly use ordinary cameras to capture human motion, that the acquired images are blurry, that phenomena such as ghosting, shadows and missing foot detail cannot be handled, and that subsequent analysis and judgment are easily affected. Therefore, the invention combines a Kinect somatosensory camera: first, it accurately captures and analyzes the player's natural behaviors in real time; second, by identifying shadow regions in the images and extracting and analyzing the player's actions with deep-learning target detection, it can more accurately analyze and recognize the shadow regions of the high-resolution video data images, provide richer information for subsequent image processing and analysis, supply rich and correct data support for game design, and enhance the interactivity between the player and the virtual environment.
In specific implementation, as shown in fig. 1, the proposed virtual simulation interactive image recognition game monitoring method includes the following steps:
S1, constructing a player action capturing strategy
The natural behavior of the player in the interactive game is captured by the high-resolution camera device, and video data representing the player's in-game action moments is generated so as to provide higher-quality input data. It will be appreciated that, in order to capture and simulate the player's movements more accurately, it is important to pay attention to several key body parts, such as the shoulders, hands, feet and knees, which are particularly important in a game. These parts are therefore identified with special markers, for example when a standard jumping motion is performed, to ensure that their movements are captured accurately and that the character in the game mimics the player's actual movements. By capturing the player's natural movements in an interactive game, high-quality video data is generated that serves as the basis for subsequent analysis and processing, and the high-resolution imaging device ensures that the captured movements are rich and accurate, providing the game with real player action input.
As shown in fig. 2-3, in one embodiment of the present invention, capturing a player's natural behavior in an interactive game is accomplished by marking key action parts in the course of the player's game action, which is specifically:
S1-1, taking the player's upright standing posture as a reference, the body angle parameters and position parameters of the upright-standing player are defined. The body angle parameters comprise θ_tor, the inclination angle of the trunk when the player stands upright, used to indicate whether the player's body is standing straight; θ_nec, the inclination angle between the neck and the trunk when the player stands upright, used to indicate whether the player's head is aligned with the trunk; and θ_kne, the bending angle of the knees when the player stands upright, used to indicate whether the player's knees are straight while standing. The position parameter is the position coordinate A_st of the player's virtual character in the three-dimensional space of the game, used to describe whether the player is standing within the standing area captured by the preset high-resolution imaging device.
S1-2, hand and finger movements of the player are captured using a glove with built-in sensors, including at least the bending and rotation angles of the wrist, the overall posture angle of the hand and the bending angle of each finger. Simultaneously, inertial sensors (accelerometer, gyroscope or magnetometer) attached to the player's body capture the player's head rotation α_hea, waist bending β_wai and leg movement γ_hip, realizing overall capture of the player's whole-body movements. The head rotation at least comprises yaw α_yaw, pitch α_pit and roll α_roo; the waist movement at least comprises the bending angle β_ben and rotation angle β_rot of the waist; and the leg movement at least comprises the rotation angle γ_hiprot of the hip joint, the bending angle γ_kneben of the knee and the rotation angle γ_ankrot of the ankle.
Based on the above technical concept, it should be noted that in the motion recognition process, the player's body angle parameters, including the inclination or bending angles of the trunk, neck and knees, are first captured by sensors to determine the player's basic standing posture. The player's specific position in three-dimensional space is then determined from the position parameter to ensure that the player is within the effective capture range of the image capture device. Furthermore, the player's limb motions, including up-and-down movement of the shoulders, opening and closing of the hands, lifting of the feet and bending of the knees, can be captured in detail from the action coordinate points of the shoulders, hands and feet, and the changes of these coordinate points reflect the dynamic process of the player executing a specific motion. By comprehensively analyzing these parameters, the system can accurately identify and respond to the player's natural behaviors, realizing real-time mapping and simulation of actions in the game, better understanding and simulating the player's actions, and providing a more realistic and interactive game experience.
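As an illustration of how these parameters can be grouped for downstream processing, the following sketch defines a simple container for the angle and position parameters described above; the field names and the Python dataclass representation are assumptions for illustration and are not prescribed by the method.

```python
from dataclasses import dataclass

@dataclass
class PlayerPose:
    """Per-frame pose parameters captured for one player (illustrative layout)."""
    theta_tor: float      # trunk inclination angle when standing upright
    theta_nec: float      # neck-trunk inclination angle
    theta_kne: float      # knee bending angle
    a_st: tuple           # (x, y, z) position of the player's virtual character
    alpha_yaw: float      # head yaw
    alpha_pit: float      # head pitch
    alpha_roo: float      # head roll
    beta_ben: float       # waist bending angle
    beta_rot: float       # waist rotation angle
    gamma_hiprot: float   # hip joint rotation angle
    gamma_kneben: float   # knee bending angle during movement
    gamma_ankrot: float   # ankle rotation angle

# Example: a pose sampled while the player stands roughly upright.
pose = PlayerPose(2.0, 3.5, 1.0, (0.1, 0.0, 1.7),
                  0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0)
```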
In one embodiment of the invention, by capturing the player's physical actions, gestures and facial expressions during a game, rich data can be collected that fully records the player's interaction and experience. After the player's action information is marked, the acquired action information can be edited and enhanced by a video editing server; the edited and enhanced video provides higher-quality input data, which helps to improve the accuracy of image recognition and analysis. In other words, high-quality video input data improves the performance of image processing algorithms, because it provides more context for image processing tasks such as target tracking or behavior analysis. It will be appreciated that complete motion information in the video data captured at the player's in-game moments is the basis for advanced image processing and analysis; without it, the subsequent analysis would be incomplete because of shadow edges or shadow areas in the images.
Therefore, the invention further proposes that the specific process of generating the instant video data during the game action of the player is as follows:
S1-3, adopting a Kinect somatosensory camera as the high-resolution camera equipment, a plurality of Kinect somatosensory camera nodes are set up and installed based on the standing area captured by the preset high-resolution camera equipment; it is confirmed that each node position coordinate P_i is accurately set and connected with the game server, and at the same time a relevance parameter L_i and a connection state parameter C_i of each Kinect somatosensory camera node with respect to the standing area are established, so as to ensure that the player's action data can be accurately captured and transmitted, wherein:

P_i = (x_i, y_i, z_i) represents the position coordinate of the i-th Kinect somatosensory camera node in the game virtual scene;

L_i = {P_i ∈ A_st | i = 1, 2, ..., N} indicates whether the i-th Kinect somatosensory camera node P_i is located within the region of the position coordinate A_st of the player in the three-dimensional space of the virtual character in the game; the connection state parameter C_i is the connection state of the i-th Kinect somatosensory camera node and the game server, described in binary form, where 1 indicates that the connection is successful and 0 indicates that the connection has failed;
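A minimal sketch of how these per-node parameters might be evaluated is shown below; the standing-area bounds, node list and function names are illustrative assumptions, not part of the claimed method.

```python
# Illustrative check of Kinect node placement (L_i) and connection state (C_i).
nodes = [
    {"id": 1, "pos": (0.5, 0.2, 2.0), "connected": True},
    {"id": 2, "pos": (3.0, 0.2, 2.0), "connected": False},
]
# Assumed axis-aligned bounds of the standing area A_st.
a_st_min, a_st_max = (0.0, 0.0, 0.0), (2.0, 2.0, 3.0)

def inside_standing_area(pos, lo, hi):
    """L_i: 1 if the node coordinate lies inside the standing area, else 0."""
    return int(all(l <= p <= h for p, l, h in zip(pos, lo, hi)))

for node in nodes:
    L_i = inside_standing_area(node["pos"], a_st_min, a_st_max)
    C_i = 1 if node["connected"] else 0  # binary connection state to the game server
    print(f"node {node['id']}: L_i={L_i}, C_i={C_i}")
```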
S1-4, when a player enters the game virtual scene, the virtual simulation system recognizes the player through a worn virtual identity identifier; the signal sent by the identifier is transmitted to the game server through the in-game communication protocol, and acquisition of action video data during the player's game actions begins. After the game server receives the player's virtual identity signal, the Kinect somatosensory camera corresponding to the player's position is activated; it starts to capture the actions of the player's virtual character and transmits the action data to the game server as a video stream. The game server performs preliminary processing on the received video stream, including format conversion and coding optimization, to match the in-game video playback standard, and the processed video data is stored in the shared directory of the game server in MP4 format, ensuring the compatibility and accessibility of the video data;
S1-5, the game server calls a built-in video editing module to perform clipping, special-effect addition or scene-transition editing on the collected video stream, adding decorative captions, background pictures and the game logo of the virtual scene in the process, so as to enhance the watchability of the video and the immersion of the game;
S1-6, the edited video data is stored in a dedicated storage area of the game server using distributed storage technology, ensuring that each video file is associated with the player's virtual identity and a game timestamp;
S1-7, the game server assigns a unique identifier to each piece of video data based on the combination of the game timestamp and the player's virtual identity, to provide an index path for subsequent real-time analysis of the player's natural behavior, which operates as follows:
The marking formula in the indexing process of the video data is M_tag = g(S_id, T_timestamp) (1), where M_tag is the unique marking identifier allocated to the video data and g is the marking function, which generates the identifier by combining the player's virtual identity signal S_id and the timestamp T_timestamp;
On the basis of the marked video data, a multidimensional index structure is built to complete the construction of the index path. The index formula is I_index = h(C_class, M_tag, P_location) (2), where I_index is the built index and h is the index function, which creates a retrieval path jointly from the classification C_class of the video data, the mark M_tag and the storage position P_location of the video data in the server shared directory;
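A minimal sketch of how the marking function g and index function h from formulas (1) and (2) could be realized is given below; the hashing scheme and dictionary-based index are assumptions for illustration only.

```python
import hashlib

def g(s_id: str, t_timestamp: str) -> str:
    """Marking function (1): derive a unique tag M_tag from identity and timestamp."""
    return hashlib.sha1(f"{s_id}|{t_timestamp}".encode()).hexdigest()[:16]

def h(c_class: str, m_tag: str, p_location: str) -> dict:
    """Index function (2): build a retrieval entry I_index from class, tag and storage path."""
    return {"class": c_class, "tag": m_tag, "path": p_location}

# Example usage with hypothetical values.
m_tag = g("player_042", "2024-11-19T10:32:05Z")
i_index = h("jump_action", m_tag, "/shared/videos/player_042/clip_0001.mp4")
print(i_index)
```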
S1-8, by checking whether the position coordinate A_st of the player in the three-dimensional space of the virtual character in the game exceeds the predefined game area boundary, a judgment condition L_lea for the player leaving the game area is constructed, so that the virtual simulation system resets all video data states related to the player and prepares to capture new game actions. The logic formula is as follows:
L_lea = 1, if x_pla < x_min or x_pla > x_max, or y_pla < y_min or y_pla > y_max, or z_pla < z_min or z_pla > z_max; L_lea = 0 otherwise (3).
In the formula, if the player position coordinate x_pla, y_pla or z_pla exceeds the minimum value x_min, y_min, z_min or the maximum value x_max, y_max, z_max of the game area, the judgment condition L_lea = 1, indicating that the player has left the game area or completed the game action. At this time, the server automatically resets all video data states related to the player, clearing temporarily cached data, freeing storage space and preparing to capture new game actions; otherwise the judgment condition L_lea = 0, indicating that the player is still in the game area.
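The following sketch illustrates one way the leave-area condition L_lea of formula (3) could be evaluated; the bounds and reset behaviour are placeholder assumptions.

```python
def leave_condition(pos, bounds):
    """Return L_lea = 1 if the player position is outside the game area, else 0."""
    (x, y, z) = pos
    (x_min, y_min, z_min), (x_max, y_max, z_max) = bounds
    outside = (x < x_min or x > x_max or
               y < y_min or y > y_max or
               z < z_min or z > z_max)
    return int(outside)

bounds = ((0.0, 0.0, 0.0), (5.0, 5.0, 3.0))   # hypothetical game-area limits
if leave_condition((6.2, 1.0, 1.7), bounds):
    # Placeholder for clearing cached video data and resetting capture state.
    print("player left the area: reset video data states")
```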
S2, data optimization
Utilizing the high-resolution video data collected in step S1, the player's natural behaviors are analyzed and processed in real time through frame processing, binarization and a deep-learning target detection algorithm, so as to extract effective video data segments and prepare for the next step of behavior recognition analysis.
As shown in fig. 4, in an embodiment of the present invention, when analyzing and processing natural behaviors of a player in real time, a specific process of frame processing is:
S2-11, the high-resolution video data is opened using the VideoCapture function of the video processing library OpenCV, the returned high-resolution video capture object is stored in a variable cap, a variable N = 5 is defined (N indicates that 1 frame is extracted every 5 frames, setting the frame extraction interval), and a variable frame_count is created and initialized to 0 to track the number of frames read;
S2-12, a loop is entered; in each iteration, the cap.read() function is called to attempt to read a frame from the variable cap, returning two values, a Boolean value ret and a frame. If ret is True, the frame variable contains valid image data that can be used for subsequent processing or display, and the frame_count variable is incremented by 1;
S2-13, if frame_count is a multiple of N, the interval point for extracting the next frame is considered to be reached; the current frame is displayed using the cv2.imshow function and the loop waits for user input. If the user presses the q key, the loop is exited. When the loop ends, the cap.release() function releases the high-resolution video capture object and the cv2.destroyAllWindows() function closes all open windows, reducing the amount of computation required to process the video.
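A runnable sketch of the frame-extraction loop described in S2-11 to S2-13 is shown below; the input file name is a placeholder.

```python
import cv2

cap = cv2.VideoCapture("player_action.mp4")  # placeholder path to the high-resolution video
N = 5                                        # extract 1 frame every 5 frames
frame_count = 0

while True:
    ret, frame = cap.read()
    if not ret:                 # end of stream or read failure
        break
    frame_count += 1
    if frame_count % N == 0:    # reached the extraction interval point
        cv2.imshow("sampled frame", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```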
After frame processing of the high-resolution video data with the video processing library OpenCV, it is also necessary to remove image noise from the high-resolution video data using Gaussian filtering or median filtering, determine a threshold value using the Otsu method, and convert each image of the high-resolution video data into a binary image.
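The denoising and binarization step can be sketched with OpenCV as follows; the kernel size and file paths are assumed values.

```python
import cv2

frame = cv2.imread("sampled_frame.png")              # placeholder frame from the previous step
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

denoised = cv2.GaussianBlur(gray, (5, 5), 0)         # Gaussian filtering (median alternative: cv2.medianBlur)
# Otsu's method chooses the threshold automatically; the binary image separates foreground and background.
_, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite("binary_frame.png", binary)
```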
In an embodiment of the present invention, when analyzing and processing natural behaviors of a player in real time, a specific process of deep learning target detection is adopted:
S2-21, a training data set containing the feature vector F is constructed to describe the game actions of the player's virtual character in the game:
F = [θ_tor, θ_nec, θ_kne, A_st(x, y, z), α_yaw, α_pit, α_roo, β_ben, β_rot, γ_hiprot, γ_kneben, γ_ankrot] (4);
S2-22, the mean μ_f and standard deviation σ_f of each feature f in the training data set are calculated to complete the standardization of the feature vector F. The mean μ_f is calculated as μ_f = (1/N) Σ_{i=1}^{N} f_i (5), where N is the number of samples in the training data set and f_i is the value of feature f in the i-th sample; the standard deviation σ_f is calculated as σ_f = sqrt((1/N) Σ_{i=1}^{N} (f_i − μ_f)²) (6);
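A short sketch of this z-score standardization, assuming the feature vectors are stacked into a NumPy array, is given below.

```python
import numpy as np

# Rows are samples, columns are the 12 features of the feature vector F (formula 4).
F_train = np.random.rand(200, 12)                 # placeholder training data

mu = F_train.mean(axis=0)                         # per-feature mean, formula (5)
sigma = F_train.std(axis=0)                       # per-feature standard deviation, formula (6)
F_standardized = (F_train - mu) / (sigma + 1e-8)  # small constant avoids division by zero
```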
S2-23, a convolutional neural network (CNN) architecture comprising several convolutional layers, pooling layers, fully connected layers and a classification layer is constructed to extract features from the input high-resolution video data and classify them; at the same time, the normalized feature vector F is input into the CNN for forward propagation and output computation, wherein,
In the convolution layer, the convolution operation on the input image data I is (I * K)(x, y) = Σ_m Σ_n I(m, n) K(x − m, y − n) (7), where I is the input image data, representing the high-resolution video of the player's actions after frame processing and binarization, * denotes the convolution operation, K is the convolution kernel, m and n are the sliding positions of the convolution kernel K on the input image data I, x and y are the coordinates of the position of the convolution kernel K on the input image data I, I(m, n) is the pixel value of the input image data I at position (m, n), and K(x − m, y − n) is the weight of the convolution kernel K at the offset position (x − m, y − n) relative to its center position (x, y);
In the pooling layer, the pooling operation is performed using the formula P(i, j) = max_{m,n} I(i·S − m, j·S − n) (8) to reduce the amount of computation and prevent overfitting, where P is the pooled feature map, I is the input feature map and S is the stride;
in the fully connected layer, the features are mapped toward the final classification result using the ReLU activation function, with Z = W × A + b (9), where Z is the output, W is the weight matrix, A is the input and b is the bias;
In the classification layer, a Softmax function is used, and the output of the convolutional neural network architecture CNN is converted into a probability distribution through the formula Softmax(Z)_i = e^{Z_i} / Σ_j e^{Z_j} (10), where Z is the output of the fully connected layer, Softmax(Z)_i is the probability of the i-th class, and Σ_j e^{Z_j} is the sum of all class scores after exponential conversion, used for normalization to ensure that the probabilities of all classes sum to 1;
S2-24, head rotation, waist bending and leg movement actions in the captured high-resolution video data are labeled, the behavior recognition model is trained with the labeled data set, and the parameters of the behavior recognition model are adjusted with the back-propagation algorithm θ ← θ − η·∇_θ J(θ), where θ are the recognition model parameters, η is the learning rate and J(θ) is the loss function; the trained model is then applied to the captured high-resolution video data to recognize the player's behavior.
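The patent does not name a deep-learning framework; as an assumption for illustration, the following sketch shows a small CNN of the kind described in S2-23 and one gradient-descent training step of the kind described in S2-24, written with PyTorch. Layer sizes and the number of behavior classes are placeholder values.

```python
import torch
import torch.nn as nn

class BehaviorCNN(nn.Module):
    """Convolution, pooling, fully connected and classification layers (S2-23)."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),   # convolution, formula (7)
            nn.MaxPool2d(2),                                         # max pooling, formula (8)
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128), nn.ReLU(),                 # Z = W × A + b with ReLU, formula (9)
            nn.Linear(128, num_classes),                             # softmax applied inside the loss, formula (10)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = BehaviorCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # update rule θ ← θ − η·∇J(θ)
criterion = nn.CrossEntropyLoss()

# One training step on a placeholder batch of 8 binarized 64×64 frames.
frames = torch.rand(8, 1, 64, 64)
labels = torch.randint(0, 5, (8,))
loss = criterion(model(frames), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```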
Based on the above technical concept, the specific process by which the trained behavior recognition model recognizes the player's behavior is as follows: the preprocessed high-resolution video data is input into the trained behavior recognition model, where the input data at least comprises frames or segments of the player's hand and finger motion, head rotation α_hea, waist bending β_wai and leg movement γ_hip in the game; the behavior recognition model then analyzes each frame or segment and outputs behavior class probabilities; the averaged class probability P̄ = (1/T) Σ_{t=1}^{T} P_t (11) is then used for time-series analysis of the prediction results output by the behavior recognition model that characterize the player's in-game actions, to determine the final behavior label, where P_t is the behavior class probability of the player in the game at time t and T is the considered time window size; finally, the recognition results, comprising the behavior label, the timestamp and the related video frames, are stored in association with the high-resolution video data, realizing accurate recognition of the player's action behavior.
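A minimal sketch of the temporal averaging of formula (11) over a sliding window is shown below, assuming per-frame class probabilities are already available; the probability values are placeholders.

```python
import numpy as np

# Per-frame class probabilities P_t over a window of T frames (placeholder values).
T = 6
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
])

p_bar = probs[:T].mean(axis=0)        # formula (11): average probability over the window
behavior_label = int(p_bar.argmax())  # final behavior label for the window
print(p_bar, behavior_label)
```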
As shown in figs. 5 to 6, in an embodiment of the present invention, because a Kinect somatosensory camera is used as the high-resolution imaging device to capture human motion and obtain high-resolution video data, a specific light pattern (usually an infrared pattern) is essentially projected into the game scene where the player is located; the Kinect somatosensory camera captures the reflections of this pattern on the player's surface, but because different positions on an object's surface lie at different depths, the distortions of the light pattern also differ. That is, different surfaces may have different light reflection characteristics, which affects the sharpness of shadow edges and the internal brightness of the high-resolution video data image; for example, shadow edges produced by smooth surfaces may be clearer, while rough surfaces blur shadow edges because of diffuse reflection. To identify the shadow parts in the image (the shadow phenomenon is particularly prominent in game training because it directly affects the completeness of motion data and the accuracy of motion recognition), it is first necessary to construct a multispectral reflection model that decomposes the reflected light in the image into a direct reflection component and a scattered reflection component, and then identify the shadow regions by computing a separation matrix of the shadow components. The concrete implementation is as follows:
When analyzing and processing the player's natural behavior in real time, before the deep-learning target detection algorithm is used to accurately recognize the player's game action behavior, the high-resolution video data images that have been converted into binary images are divided and segmented into regions, so as to eliminate and extract the shadow parts in the high-resolution video data images and provide accurate visual information for the subsequent virtual character animation. The process is as follows:
First, the high-resolution video data image is converted from the RGB color space to the YCrCb color space to extract the shadow part, using the conversion formulas Y = 0.299R + 0.587G + 0.114B, Cr = 0.713(R − Y), Cb = 0.564(B − Y), where Y is the luminance of the high-resolution video data image and Cr and Cb are its chrominance components, used to describe the color information of the image. Each pixel of the high-resolution video data image is traversed, the formulas are applied to the RGB value of each pixel to calculate the corresponding YCrCb value, and the converted YCrCb image is compared with the original RGB image to ensure that the conversion is correct;
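This color-space conversion can be done directly with OpenCV, as in the sketch below; the file path is a placeholder, and note that OpenCV's 8-bit conversion additionally offsets Cr and Cb by 128, whereas the per-pixel formulas in the text produce zero-centered chrominance values.

```python
import cv2
import numpy as np

bgr = cv2.imread("frame.png")                       # OpenCV loads images in BGR order
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)      # built-in conversion (Cr/Cb offset by 128 for 8-bit)

# Manual per-pixel version of the formulas in the text (floating point, illustrative).
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).astype(np.float32)
R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
Y = 0.299 * R + 0.587 * G + 0.114 * B
Cr = 0.713 * (R - Y)
Cb = 0.564 * (B - Y)
```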
Second, the dark shadow region, including the shadow part in the high-resolution video data image, is identified by reflection component analysis. A multispectral reflection model R(λ, φ) = R_d(λ, φ) + R_S(λ, φ) (12) is constructed, decomposing the reflected light in the high-resolution video data image into a direct reflection component R_d(λ, φ) and a scattered reflection component R_S(λ, φ), where the direct reflection component R_d(λ, φ) is the reflected light directly related to the light source, reflected from the player's body straight to the Kinect somatosensory camera, and the scattered reflection component R_S(λ, φ) is the reflected light related to ambient or scattered light, comprising at least the light reflected onto the player's body by the surrounding environment;
A shadow separation matrix S_dow is constructed for separating and identifying the shadow region in the high-resolution video data image, where A is a design matrix containing feature vectors related to the direct reflection component R_d(λ, φ) and the scattered reflection component R_S(λ, φ), including the posture and surface characteristics of the player's body; W is a weight matrix adjusting the influence of different factors on the shadow part, including the light source intensity, the distance between the player's body and the Kinect somatosensory camera, and the intensity of the ambient light; L is a luminance matrix representing the brightness value of each pixel obtained from the multispectral reflection model, including the spectral response of the Kinect somatosensory camera; and E is an ambient light matrix representing the influence of ambient light on the brightness of each pixel, including the lighting conditions of the player's surroundings;
The shadow separation matrix S_dow is solved using a weighted linear least-squares method, and the shadow region in the high-resolution video data image is extracted;
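The patent states only that S_dow is obtained from A, W, L and E by weighted linear least squares and does not give the exact model; the sketch below assumes, purely for illustration, a linear model in which the ambient-corrected luminance L − E is explained by the design matrix A, so that S_dow is the weighted least-squares solution.

```python
import numpy as np

# Assumed shapes: one row per pixel, one column per feature (posture/surface descriptors).
num_pixels, num_features = 1000, 4
A = np.random.rand(num_pixels, num_features)    # design matrix (assumed content)
L = np.random.rand(num_pixels)                  # per-pixel luminance from the reflection model
E = 0.1 * np.random.rand(num_pixels)            # ambient-light contribution per pixel
w = np.random.rand(num_pixels)                  # per-pixel weights (diagonal of W)

# Weighted linear least squares: S_dow = (A^T W A)^(-1) A^T W (L - E)
AtW = A.T * w
S_dow = np.linalg.solve(AtW @ A, AtW @ (L - E))

# Pixels whose residual luminance is poorly explained could be flagged as shadow (illustrative rule).
residual = (L - E) - A @ S_dow
shadow_mask = residual < -0.2
```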
Again, the change in the direct reflection component R_d(λ, φ) is analyzed as time-series data to identify the trend of change in the player's body posture: ΔR_d(t) = R_d(t) − R_d(t − 1) (14), where ΔR_d(t) is the amount of change in the direct reflection component R_d(λ, φ) caused by the change of the player's body posture at time t and R_d(t) is the direct reflection component at time t. At the same time, the change in the scattered reflection component R_S(λ, φ) is analyzed to detect the degree of the player's interaction with the environment: if the scattered reflection component R_S(λ, φ) increases significantly in a certain area, it indicates that the player has entered a new lighting environment or is interacting with objects in the environment, and image processing techniques are used to measure the increase in illumination intensity of the area where the player is located, so as to determine whether the player has entered a brighter or darker environment;
Finally, the player actions captured by the Kinect somatosensory camera are combined with the analysis results of the high-resolution video data images: image processing techniques align the two spatially, so that corresponding pixels represent the same physical position, and the player actions and the image analysis results are integrated into a unified data structure for complete data fusion.
S3, the player's natural behaviors are recognized and responded to, and the behaviors are mapped onto the virtual characters in the game. When a player behavior is recognized, it is mapped onto the virtual character in the game using an inverse kinematics algorithm: the target pose of the player's game action, comprising the character's joint angles and body-part positions, is calculated by the inverse kinematics algorithm, and finally the animation of the virtual character is updated according to the calculated target pose, so that the character reflects the player's natural behaviors in the game environment.
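The patent does not specify which inverse kinematics formulation is used; as an assumption for illustration, the sketch below solves the classic two-link analytic IK problem for a single limb, producing the joint angles that place the end effector (e.g., a hand or foot) at a target position.

```python
import math

def two_link_ik(target_x, target_y, l1=0.35, l2=0.35):
    """Analytic 2-link inverse kinematics: return (hip/shoulder angle, knee/elbow angle)."""
    d2 = target_x ** 2 + target_y ** 2
    # Clamp to handle targets slightly outside the reachable range.
    cos_knee = max(-1.0, min(1.0, (d2 - l1 ** 2 - l2 ** 2) / (2 * l1 * l2)))
    knee = math.acos(cos_knee)
    hip = math.atan2(target_y, target_x) - math.atan2(l2 * math.sin(knee),
                                                      l1 + l2 * math.cos(knee))
    return hip, knee

# Example: place the virtual character's foot at a captured target position.
hip_angle, knee_angle = two_link_ik(0.4, -0.5)
print(math.degrees(hip_angle), math.degrees(knee_angle))
```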
The technical scope of the present invention is not limited to the above description, and those skilled in the art may make various changes and modifications to the above-described embodiments without departing from the technical spirit of the present invention, and these changes and modifications should be included in the scope of the present invention.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411649023.5A CN119524382A (en) | 2024-11-19 | 2024-11-19 | A virtual simulation interactive image recognition game monitoring method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411649023.5A CN119524382A (en) | 2024-11-19 | 2024-11-19 | A virtual simulation interactive image recognition game monitoring method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN119524382A true CN119524382A (en) | 2025-02-28 |
Family
ID=94692935
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411649023.5A Pending CN119524382A (en) | 2024-11-19 | 2024-11-19 | A virtual simulation interactive image recognition game monitoring method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119524382A (en) |
- 2024-11-19: CN application CN202411649023.5A filed; publication CN119524382A; status active (Pending)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP4002198A1 (en) | Posture acquisition method and device, and key point coordinate positioning model training method and device | |
Liu et al. | Learning deep models for face anti-spoofing: Binary or auxiliary supervision | |
US7899206B2 (en) | Device, system and method for determining compliance with a positioning instruction by a figure in an image | |
US9002054B2 (en) | Device, system and method for determining compliance with an instruction by a figure in an image | |
WO2021042547A1 (en) | Behavior identification method, device and computer-readable storage medium | |
CN108197589B (en) | Semantic understanding method, apparatus, equipment and the storage medium of dynamic human body posture | |
CN110472554A (en) | Table tennis action identification method and system based on posture segmentation and crucial point feature | |
US9183431B2 (en) | Apparatus and method for providing activity recognition based application service | |
CN104615234B (en) | Message processing device and information processing method | |
CN109886153B (en) | A real-time face detection method based on deep convolutional neural network | |
CN114399838B (en) | Multi-person behavior recognition method and system based on posture estimation and binary classification | |
CN107767335A (en) | A kind of image interfusion method and system based on face recognition features' point location | |
CN105825168B (en) | A face detection and tracking method of golden snub-nosed monkey based on S-TLD | |
CN109670517A (en) | Object detection method, device, electronic equipment and target detection model | |
CN110046574A (en) | Safety cap based on deep learning wears recognition methods and equipment | |
CN109325408A (en) | A gesture judgment method and storage medium | |
CN111860451A (en) | A game interaction method based on facial expression recognition | |
CN111291612A (en) | Pedestrian re-identification method and device based on multi-person multi-camera tracking | |
CN119524382A (en) | A virtual simulation interactive image recognition game monitoring method | |
CN115830517A (en) | Examination room abnormal frame extraction method and system based on video | |
Zhou | Computational Analysis of Table Tennis Games from Real-Time Videos Using Deep Learning | |
CN108108010A (en) | A kind of brand-new static gesture detection and identifying system | |
CN108052913A (en) | A kind of art work image identification and comparison method | |
Tavari et al. | A review of literature on hand gesture recognition for Indian Sign Language | |
Shen et al. | A method of billiard objects detection based on Snooker game video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||