
CN119524382A - A virtual simulation interactive image recognition game monitoring method - Google Patents

A virtual simulation interactive image recognition game monitoring method

Info

Publication number
CN119524382A
CN119524382A (application number CN202411649023.5A)
Authority
CN
China
Prior art keywords
player
game
video data
image
resolution video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411649023.5A
Other languages
Chinese (zh)
Inventor
周守波 (Zhou Shoubo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Bodong Information Technology Co ltd
Original Assignee
Anhui Bodong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Bodong Information Technology Co ltd filed Critical Anhui Bodong Information Technology Co ltd
Priority to CN202411649023.5A
Publication of CN119524382A
Legal status: Pending

Classifications

    • A63F13/213 Input arrangements for video game devices characterised by their sensors, purposes or types comprising photodetecting means, e.g. cameras, photodiodes or infrared cells
    • A63F13/42 Processing input control signals of video game devices by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle
    • A63F13/55 Controlling game characters or game objects based on the game progress
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/764 Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/82 Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • A63F2300/1087 Input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera
    • A63F2300/30 Output arrangements for receiving control signals generated by the game device
    • A63F2300/6045 Methods for processing data by generating or executing the game program for mapping control signals received from the input arrangement into game commands

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a virtual simulation interactive image recognition game monitoring method comprising the following steps: S1, capturing the natural behavior of a player in an interactive game through high-resolution camera equipment and generating video data that represents the moments of game action, so as to provide higher-quality input data; S2, using the high-resolution video data collected in step S1 to analyze and process the player's natural behavior in real time through frame processing, binarization, and a deep-learning object detection algorithm; and S3, recognizing and responding to the player's natural behavior and mapping it onto the virtual character in the game. By combining a Kinect somatosensory camera, the invention accurately captures and analyzes the player's natural behavior in real time and then identifies shadow regions in the images, so that shadow regions in the high-resolution video data can be accurately analyzed and recognized, providing richer information for subsequent image processing and analysis.

Description

Virtual simulation interactive image recognition game monitoring method
Technical Field
The invention relates to the technical field of virtual simulation, and in particular to a virtual simulation interactive image recognition game monitoring method.
Background
Most existing motion capture analysis systems use ordinary cameras to capture human motion; the acquired images are blurry, and phenomena such as ghosting, shadows, and missing body parts (e.g., feet) cannot be handled, which easily degrades subsequent analysis and judgment. In motion capture technology, the "shadow" phenomenon is a key technical challenge: it refers to data regions that cannot be captured because of the sensor's limited field of view or other technical limitations. The phenomenon is particularly prominent in game training, because it directly affects the integrity of the motion data and the accuracy of motion recognition.
For example, in a virtual-reality football game, if the motion capture system cannot accurately capture every small movement of the player, such as a quick turn or a subtle foot adjustment, the virtual character in the game may fail to reproduce the motion, greatly degrading the player's in-game experience.
Such incomplete data not only reduces the accuracy of motion recognition, impairing the effectiveness of game training; it also increases the cost of the training system, because additional sensors or more advanced capture techniques are needed to compensate for the missing data, and it reduces training effectiveness, because the player cannot learn all necessary motion patterns from incomplete motion data.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a virtual simulation interactive image recognition game monitoring method that solves the problems identified in the background section.
To achieve the above purpose, the invention is realized by the following technical scheme: the virtual simulation interactive image recognition game monitoring method comprises the following steps:
S1. Construct a player motion capture strategy
Capture the natural behavior of the player in the interactive game through high-resolution camera equipment, and generate video data representing the moments of game action in the player's interactive game, so as to provide higher-quality input data;
s2, data optimization
Using the high-resolution video data collected in step S1, analyze and process the player's natural behavior in real time through frame processing, binarization, and a deep-learning object detection algorithm, so as to extract valid video data segments and prepare for the next step of behavior recognition analysis;
s3, recognizing and responding to the natural behaviors of the player, and mapping the behaviors to the virtual roles in the game.
Compared with the prior art, the invention has the beneficial effects that:
To solve the problems that existing motion capture analysis systems mostly use ordinary cameras to capture human motion, that the acquired images are blurry, and that phenomena such as ghosting, shadows, and missing body parts cannot be handled, easily degrading subsequent analysis and judgment, the invention combines a Kinect somatosensory camera. First, it achieves accurate capture and real-time analysis of the player's natural behavior; second, by identifying shadow regions in the images and using deep-learning object detection to extract and analyze the player's motions, it can more accurately analyze and recognize the shadow portions of the images in the high-resolution video data. This provides richer information for subsequent image processing and analysis, supplies rich and correct data support for game design, and enhances the interactivity between the player and the virtual environment.
Drawings
The disclosure of the present invention is described with reference to the accompanying drawings. It should be understood that the drawings are for purposes of illustration only and are not intended to limit the scope of the invention; like reference numerals designate like parts. Wherein:
FIG. 1 is a schematic diagram of a virtual simulation interactive image recognition game monitoring method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a player motion capture process according to an embodiment of the invention;
FIG. 3 is a schematic flow chart of optimizing and processing high-resolution video data acquired by a high-resolution camera device according to an embodiment of the present invention;
FIG. 4 is a flow chart of a virtual simulation system for performing behavior recognition and response to actions of a player according to an embodiment of the present invention;
FIG. 5 is a flow chart illustrating the extraction and analysis of shadow portions in an acquired high resolution video data image according to an embodiment of the present invention;
FIG. 6 is a flowchart of combining the player actions captured by the Kinect somatosensory camera with the high resolution video data image analysis results according to an embodiment of the present invention.
Detailed Description
It is to be understood that, within the technical solution of the present invention, those skilled in the art may propose various alternative structures and implementations without departing from the true spirit of the invention. Accordingly, the following detailed description and drawings are merely illustrative of the invention and are not intended to be exhaustive or to limit the invention to the precise form disclosed.
The present invention will be described in further detail below with reference to the drawings, but is not limited thereto.
To aid understanding of the technical concept and implementation principle of the invention: the proposed virtual simulation interactive image recognition game monitoring method mainly aims to solve the problems that existing motion capture analysis systems mostly use ordinary cameras to capture human motion, that the acquired images are blurry, and that phenomena such as ghosting, shadows, and missing body parts cannot be handled, easily degrading subsequent analysis and judgment. The invention therefore combines a Kinect somatosensory camera: first, it achieves accurate capture and real-time analysis of the player's natural behavior; second, it extracts and analyzes the player's motions by identifying shadow regions in the images and applying deep-learning object detection, so that the shadow regions of the images in the high-resolution video data can be analyzed and recognized more accurately, providing richer information for subsequent image processing and analysis, rich and correct data support for game design, and enhanced interactivity between the player and the virtual environment.
In specific implementation, as shown in fig. 1, the proposed virtual simulation interactive image recognition game monitoring method includes the following steps:
S1. Construct a player motion capture strategy
Capture the natural behavior of the player in the interactive game through high-resolution camera equipment, and generate video data representing the moments of game action, so as to provide higher-quality input data. It will be appreciated that, to capture and simulate the player's movements more accurately, it is generally important to attend to several critical body parts, such as the shoulders, hands, feet, and knees, which matter particularly in a game; they are therefore identified with special markers, as when performing a standard jumping motion, to ensure that the movements of these parts are captured accurately and that the in-game character mimics the player's actual movements. By capturing the player's natural movements in the interactive game, high-quality video data are generated that serve as the basis for subsequent analysis and processing, and the high-resolution imaging device ensures that the captured movements are rich and accurate, providing the game with real player action input.
As shown in figs. 2-3, in one embodiment of the present invention, capturing the player's natural behavior in an interactive game is accomplished by marking key action body parts during the player's game actions, specifically:
S1-1. Taking the player's upright standing posture as a reference, define the body angle parameters and position parameters of the player standing upright. The body angle parameters comprise: θ_tor, the inclination angle of the trunk when the player stands upright, indicating whether the player's body is upright; θ_nec, the inclination angle between the neck and the trunk, indicating whether the player's head stays in line with the trunk; and θ_kne, the bending angle of the knees, indicating whether the player's body is fully extended. The position parameter is the coordinate A_st of the player's virtual character in the game's three-dimensional space, describing whether the player stands within the standing area captured by the preset high-resolution imaging device.
S1-2. Use gloves with built-in sensors to capture the player's hand and finger movements, which at least include the bending and rotation angles of the wrist, the overall posture angle of the hand, and the bending angle of each finger. Synchronously attach inertial sensors (accelerometer, gyroscope, or magnetometer) to the player's body parts to capture head rotation α_hea, waist bending β_wai, and leg movement γ_hip, achieving comprehensive capture of the player's whole-body motion. The head rotation at least includes yaw α_yaw, pitch α_pit, and roll α_roo; the waist bending at least includes the waist's bending β_ben and rotation angle β_rot; the leg movement at least includes the hip-joint rotation angle γ_hiprot, the knee bending angle γ_kneben, and the ankle rotation angle γ_ankrot.
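For illustration only, the captured parameter set can be grouped into a single record; the following Python sketch is not part of the patent text, and all field names are hypothetical transcriptions of the symbols above.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class PlayerPose:
    """One sample of the captured body parameters (names are illustrative)."""
    theta_tor: float                   # trunk inclination angle
    theta_nec: float                   # neck-trunk inclination angle
    theta_kne: float                   # knee bending angle
    a_st: Tuple[float, float, float]   # position A_st = (x, y, z) in the scene
    alpha_yaw: float                   # head yaw
    alpha_pit: float                   # head pitch
    alpha_roo: float                   # head roll
    beta_ben: float                    # waist bending
    beta_rot: float                    # waist rotation
    gamma_hiprot: float                # hip-joint rotation
    gamma_kneben: float                # knee bending (leg movement)
    gamma_ankrot: float                # ankle rotation
```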
Based on the above technical concept, it should be noted that in the motion recognition process, the player's body angle parameters, including the inclination or bending angles of the trunk, neck, and knees, are first captured by sensors to determine the player's basic standing posture. The player's specific position in three-dimensional space is then determined from the position parameter, to ensure the player is within the effective capture range of the imaging device. Furthermore, the player's limb motions, including the up-and-down movement of the shoulders, the opening and closing of the hands, the lifting of the feet, and the bending of the knees, can be captured in detail through the action coordinate points of the shoulders, hands, and feet, whose changes reflect the dynamic process of the player executing a specific motion. By comprehensively analyzing these parameters, the system can accurately recognize and respond to the player's natural behavior, achieving real-time mapping and simulation of in-game actions; the system can thus better understand and simulate the player's movements and provide a more realistic and interactive game experience.
In one embodiment of the invention, rich data can be collected by capturing the player's physical actions, gestures, and facial expressions during the game, fully recording the player's interaction and experience. After the player's action information is marked, the acquired information can be edited and enhanced by a video editing server; the edited and enhanced video provides higher-quality input data, which helps improve the accuracy of image recognition and analysis. That is, high-quality video input improves the performance of image processing algorithms because it provides more context for tasks such as target tracking or behavior analysis. It will be appreciated that complete motion information in the captured video of the player's in-game moments is the basis for advanced image processing and analysis; without it, subsequent analysis would be incomplete because of shadow edges or shadow regions in the images.
Therefore, the invention further specifies the process of generating instantaneous video data during the player's game actions as follows:
S1-3. Adopt Kinect somatosensory cameras as the high-resolution camera equipment and, based on the standing area captured by the preset high-resolution camera equipment, install several Kinect somatosensory camera nodes; ensure that each node's position coordinate P_i is set precisely and connected to the game server; at the same time, establish for each Kinect camera node an association parameter L_i with the standing area and a connection state parameter C_i, to ensure that the player's motion data can be captured and transmitted accurately, where
P_i = (x_i, y_i, z_i) denotes the position coordinate of the i-th Kinect somatosensory camera node in the game's virtual scene,
L_i = {P_i ∈ A_st | i = 1, 2, ..., N} indicates whether the i-th Kinect somatosensory camera node P_i lies within the region of the coordinate A_st of the player's virtual character in three-dimensional space; the connection state parameter C_i is the connection state between the i-th Kinect camera node and the game server, described in binary, where 1 indicates a successful connection and 0 a failed connection;
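A minimal sketch of how the association parameter L_i and connection state C_i could be computed, assuming the standing area A_st is modeled as an axis-aligned box; the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]

def in_standing_area(p: Point, a_min: Point, a_max: Point) -> bool:
    """L_i: True when node position P_i lies inside the box modeling A_st."""
    return all(lo <= v <= hi for v, lo, hi in zip(p, a_min, a_max))

def node_status(positions: List[Point], connected: List[bool],
                a_min: Point, a_max: Point) -> List[Tuple[bool, int]]:
    """Returns (L_i, C_i) per Kinect node; C_i is 1 on success, 0 on failure."""
    return [(in_standing_area(p, a_min, a_max), int(c))
            for p, c in zip(positions, connected)]
```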
S1-4. When the player enters the game's virtual scene, the virtual simulation system identifies the player through a worn virtual identity identifier, whose signal is sent to the game server through the in-game communication protocol, and action video data are collected during the player's game actions. After receiving the player's virtual identity signal, the game server activates the Kinect somatosensory camera corresponding to the player's position; the camera starts capturing the actions of the player's virtual character and transmits the action data to the game server as a video stream. The server performs preliminary processing on the received stream, including format conversion and encoding optimization, to adapt to the in-game video playback standard, and stores the processed video data in MP4 format in the server's shared directory, ensuring the compatibility and accessibility of the video data;
S1-5. The game server invokes a built-in video editing module to perform clipping, special-effect addition, or scene-transition editing on the collected video stream, adding highlight clips, background images, and the game logo from the virtual scene in the process, so as to enhance the watchability of the video and the immersion of the game;
S1-6. Store the edited video data in a dedicated storage area of the game server using distributed storage technology, ensuring that each video file is associated with the player's virtual identity and a game timestamp;
S1-7. The game server assigns each video data item a unique identifier based on the combination of the game timestamp and the player's virtual identity, providing an index path for subsequent real-time analysis of the player's natural behavior, as follows:
The tagging formula used while indexing the video data is M_tag = g(S_id, T_timestamp) (1), where M_tag is the unique tag identifier assigned to the video data and g is a tagging function generated by combining the player's virtual identity signal S_id with the timestamp T_timestamp;
On the basis of the tagged video data, a multidimensional index structure is built to complete the construction of the index path. The index formula is I_index = h(C_class, M_tag, P_location) (2), where I_index is the established index and h is an index function whose retrieval path is created jointly from the video data's classification C_class, its tag M_tag, and its storage location P_location in the server's shared directory;
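A sketch of the tag function g (formula 1) and index function h (formula 2); the patent does not specify their internals, so the hashing and path scheme below are assumptions.

```python
import hashlib

def g(s_id: str, t_timestamp: int) -> str:
    """M_tag: a unique tag from the identity signal S_id and a timestamp."""
    return hashlib.sha1(f"{s_id}:{t_timestamp}".encode()).hexdigest()[:16]

def h(c_class: str, m_tag: str, p_location: str) -> str:
    """I_index: retrieval path combining class, tag, and storage location."""
    return f"{c_class}/{m_tag}/{p_location}"

m_tag = g("player-007", 1700000000)                    # hypothetical values
print(h("jump", m_tag, "shared/videos/jump_001.mp4"))
```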
S1-8. By checking whether the coordinate A_st of the player's virtual character in three-dimensional space exceeds the predefined game-area boundary, construct the judgment condition L_lea for the player leaving the game area, so that the virtual simulation system resets all video data states related to that player and prepares to capture new game actions. The logic formula is:
L_lea = 1, if (x_pla < x_min or x_pla > x_max) or (y_pla < y_min or y_pla > y_max) or (z_pla < z_min or z_pla > z_max); L_lea = 0, otherwise (3)
In the formula, if the player's position coordinate x_pla, y_pla, or z_pla falls outside the game area's minimum x_min, y_min, z_min or maximum x_max, y_max, z_max, the judgment condition L_lea = 1 indicates that the player has left the game area or completed the game action; in that case, once the player completes the game action or leaves the game area, the server automatically resets all video data states related to that player, clearing temporarily cached data, freeing storage space, and preparing to capture new game actions. Otherwise, L_lea = 0 indicates that the player is still within the game area.
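A sketch of the leave-area condition L_lea (formula 3); the game-area bounds below are illustrative values.

```python
def l_lea(pos, bounds_min, bounds_max) -> int:
    """Returns 1 when the player's (x, y, z) leaves the game area, else 0."""
    outside = any(v < lo or v > hi
                  for v, lo, hi in zip(pos, bounds_min, bounds_max))
    return 1 if outside else 0

assert l_lea((0.5, 1.0, 2.0), (0, 0, 0), (5, 3, 4)) == 0  # still inside
assert l_lea((6.0, 1.0, 2.0), (0, 0, 0), (5, 3, 4)) == 1  # left the area
```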
S2. Data optimization
Using the high-resolution video data collected in step S1, analyze and process the player's natural behavior in real time through frame processing, binarization, and a deep-learning object detection algorithm, so as to extract valid video data segments and prepare for the next step of behavior recognition analysis.
As shown in fig. 4, in an embodiment of the present invention, when analyzing and processing the player's natural behavior in real time, the specific process of frame processing is:
S2-11. Open the high-resolution video data using the VideoCapture function of the video processing library OpenCV and store the returned capture object in the variable cap. Define a variable N = 5, meaning one frame is extracted every 5 frames, to set the frame extraction interval, and create a variable frame_count initialized to 0 to track the number of frames read;
S2-12. Enter a loop in which each iteration attempts to read one frame from the variable cap, calling the cap.read() function, which returns two values, a boolean ret and a frame. If ret is True, the frame is displayed: the frame variable contains valid image data that can be used for subsequent processing or display, and the frame_count variable is incremented by 1. If ret is False, all frames of the high-resolution video data have been read and the loop terminates;
S2-13. If frame_count is a multiple of N, the interval point for extracting the next frame has been reached; the current frame is displayed with the cv2.imshow function and the program waits for user input, exiting the loop if the user presses the q key. When the loop ends, the capture object is released with the cap.release() function and all open windows are closed with the cv2.destroyAllWindows() function, reducing the computation required to process the video.
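A runnable sketch of the frame-sampling loop described in S2-11 to S2-13, assuming a hypothetical input file "gameplay.mp4".

```python
import cv2

cap = cv2.VideoCapture("gameplay.mp4")
N = 5                                  # extract 1 frame out of every 5
frame_count = 0

while True:
    ret, frame = cap.read()
    if not ret:                        # all frames have been read
        break
    frame_count += 1
    if frame_count % N == 0:           # reached the extraction interval
        cv2.imshow("sampled frame", frame)
        if cv2.waitKey(0) & 0xFF == ord("q"):   # wait for input; q exits
            break

cap.release()
cv2.destroyAllWindows()
```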
After frame processing of the high-resolution video data with the video processing library OpenCV, it is also necessary to remove image noise using Gaussian or median filtering, determine a threshold with Otsu's method, and convert the high-resolution video data image into a binary image.
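A sketch of the denoising and Otsu binarization step; the kernel size is an illustrative choice.

```python
import cv2

def binarize(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)    # or cv2.medianBlur(gray, 5)
    # Otsu's method picks the threshold automatically (the 0 is ignored).
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```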
In an embodiment of the present invention, when analyzing and processing the player's natural behavior in real time, the specific process of the deep-learning object detection employed is:
S2-21. Construct a training data set containing the feature vector F to describe the game actions of the player's virtual character:
F = [θ_tor, θ_nec, θ_kne, A_st(x, y, z), α_yaw, α_pit, α_roo, β_ben, β_rot, γ_hiprot, γ_kneben, γ_ankrot] (4);
S2-22. For each feature f in the training data set, compute its mean μ_f and standard deviation σ_f to complete the standardization of the feature vector F. The mean is μ_f = (1/N) Σ_{i=1}^{N} f_i (5), where N is the number of samples in the training data set and f_i is the value of feature f in the i-th sample; the standard deviation is σ_f = sqrt((1/N) Σ_{i=1}^{N} (f_i − μ_f)²) (6);
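A sketch of the z-score standardization of S2-22 (formulas 5 and 6) using NumPy, where X holds one feature vector F per row.

```python
import numpy as np

def standardize(X: np.ndarray) -> np.ndarray:
    mu = X.mean(axis=0)                # mu_f per feature, formula (5)
    sigma = X.std(axis=0)              # sigma_f per feature, formula (6)
    return (X - mu) / np.where(sigma == 0, 1.0, sigma)  # guard zero variance
```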
S2-23. Construct a convolutional neural network architecture (CNN) comprising several convolutional layers, pooling layers, a fully connected layer, and a classification layer, to extract features from the input high-resolution video data and classify them; at the same time, input the standardized feature vector F into the CNN for forward propagation and output computation, where,
In the convolutional layer, a convolution kernel is applied to the input image data I: (I * K)(x, y) = Σ_m Σ_n I(m, n) K(x − m, y − n) (7), where I is the input image data, i.e., the frame-processed, binarized high-resolution video of the player's actions; K is the convolution kernel; * is the convolution operation; m and n are the sliding positions of the kernel K over the input image I; x and y are the coordinates of the kernel's position on I; I(m, n) is the pixel value of I at position (m, n); and K(x − m, y − n) is the kernel weight at offset (x − m, y − n) relative to its center position (x, y);
In the pooling layer, the operation P(i, j) = max_{m,n} I(i·S − m, j·S − n) (8) is used to reduce computation and prevent overfitting, where P is the pooled feature map, I is the input feature map, and S is the stride;
In the fully connected layer, the ReLU activation function maps the features to the final classification result: Z = W·A + b, where Z is the output, W the weight matrix, A the input, and b the bias;
In the classification layer, the Softmax function converts the output of the CNN into a probability distribution: Softmax(Z)_i = e^{Z_i} / Σ_j e^{Z_j} (9), where Z is the output of the fully connected layer, Softmax(Z)_i is the probability of the i-th class, and Σ_j e^{Z_j} is the sum of all exponentiated class scores, used for normalization to ensure that the probabilities of all classes sum to 1;
S2-24. Annotate the head rotation, waist bending, and leg movement actions in the captured high-resolution video data, train a behavior recognition model with the annotated data set, and adjust the model parameters with the back-propagation update θ ← θ − η·∇_θ J(θ) (10), where θ is the recognition model parameter, η the learning rate, and J(θ) the loss function; the trained model is then applied to the captured high-resolution video data to recognize the player's behavior.
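A minimal PyTorch sketch of the CNN pipeline of S2-23 and S2-24 (convolution, max pooling, a ReLU fully connected layer, a softmax classifier, and one back-propagation step); the layer sizes, input resolution, and number of behavior classes are assumptions.

```python
import torch
import torch.nn as nn

class BehaviorCNN(nn.Module):
    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # convolution (7)
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),        # max pooling (8)
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128),                 # Z = W*A + b
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        # Returns logits; the softmax of formula (9) is applied by the loss.
        return self.classifier(self.features(x))

model = BehaviorCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # update rule (10)
criterion = nn.CrossEntropyLoss()      # combines log-softmax and NLL loss

x = torch.randn(8, 1, 64, 64)          # a batch of binarized 64x64 frames
loss = criterion(model(x), torch.randint(0, 5, (8,)))
loss.backward()
optimizer.step()
```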
Based on the above technical concept, the specific process by which the trained behavior recognition model recognizes the player's behavior is as follows. First, the preprocessed high-resolution video data are input into the trained model; the input includes at least frames or clips of the player's hand and finger movements, head rotation α_hea, waist bending β_wai, and leg movement γ_hip in the game. Next, the model analyzes each frame or clip and outputs behavior-class probabilities. Then the class probabilities are averaged, P̄ = (1/T) Σ_{t=1}^{T} P_t (11), to perform a time-series analysis of the model's predictions of the player's in-game actions and determine the final behavior label, where P_t is the probability of the player's behavior class at time t and T is the size of the time window considered. Finally, the recognition results, including the behavior label, the timestamp, and the related video frames, are stored in association with the high-resolution video data, achieving accurate recognition of the player's actions.
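A sketch of the temporal averaging of formula (11): per-frame class probabilities over a window of T frames are averaged and the final behavior label is the most probable class; the class names are illustrative.

```python
import numpy as np

probs = np.array([                     # P_t for T = 3 frames, 3 classes
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
])
mean_p = probs.mean(axis=0)            # formula (11)
labels = ["jump", "turn", "kick"]
print(labels[int(mean_p.argmax())])    # -> "jump"
```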
As shown in figs. 5 to 6, in an embodiment of the present invention, because a Kinect somatosensory camera is used as the high-resolution imaging device to capture human motion and obtain the high-resolution video data, it essentially projects a specific light pattern (usually infrared) into the game scene where the player is located and captures the reflections of that pattern on the player's surface; because different positions on the object's surface lie at different depths, the distortions of the light pattern also differ. Surfaces with different reflection characteristics thus affect the sharpness of shadow edges and the internal brightness of the high-resolution video image: shadow edges produced by smooth surfaces tend to be sharper, while rough surfaces blur shadow edges through diffuse reflection. To identify the shadow portion of the image (the shadow phenomenon is particularly prominent in game training because it directly affects the integrity of motion data and the accuracy of motion recognition), it is first necessary to construct a multispectral reflection model that decomposes the reflected light in the image into a direct reflection component and a scattered reflection component, and then to identify shadow regions by computing a separation matrix of the shadow components. The concrete implementation is as follows:
When analyzing and processing the player's natural behavior in real time, and before the deep-learning object detection algorithm is used to accurately recognize the player's game actions, the high-resolution video data image already converted into a binary image must be divided and segmented by region, to eliminate and extract the shadow portion of the image and provide accurate visual information for the subsequent virtual-character animation, as follows:
First, convert the high-resolution video data image from the RGB color space to the YCrCb color space to extract the shadow portion, using the conversion Y = 0.299R + 0.587G + 0.114B, Cr = 0.713(R − Y), Cb = 0.564(B − Y), where Y is the luminance of the high-resolution video data image and Cr and Cb are its chrominance, describing the image's color information. Traverse each pixel of the image, apply the formula to its RGB value to compute the corresponding YCrCb value, and compare the converted YCrCb image with the original RGB image to ensure the conversion is correct;
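A sketch of the color-space conversion; OpenCV's built-in YCrCb conversion uses the same coefficients as the formulas above (plus a 128 offset on the chrominance channels for 8-bit images), and the file name is hypothetical.

```python
import cv2

img = cv2.imread("frame.png")              # OpenCV loads images as BGR
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
Y, Cr, Cb = cv2.split(ycrcb)               # luminance and chrominance planes
```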
Second, identify the dark regions of the high-resolution video data image, including the shadow portion, by reflection component analysis. This works by constructing a multispectral reflection model, R(λ, φ) = R_d(λ, φ) + R_S(λ, φ) (12), which decomposes the reflected light in the image into a direct reflection component R_d(λ, φ) and a scattered reflection component R_S(λ, φ), where the direct component is the reflected light directly related to the light source, reflected from the player's body to the Kinect somatosensory camera, and the scattered component is the reflected light related to ambient or scattered light, comprising at least the light reflected onto the player's body by the surrounding environment;
Construct a shadow separation matrix S_dow to separate and identify shadow regions in the high-resolution video data image, modeling the pixel luminance as L = A·S_dow + E (13), where A is a design matrix containing feature vectors related to the direct reflection component R_d(λ, φ) and the scattered reflection component R_S(λ, φ), including the player's body posture and surface characteristics; W is a weight matrix that adjusts the influence of different factors on the shadow portion, including the light-source intensity, the distance between the player's body and the Kinect somatosensory camera, and the ambient-light intensity; L is a luminance matrix giving the brightness value of each pixel obtained from the multispectral reflection model, including the spectral response of the Kinect somatosensory camera; and E is an ambient-light matrix representing the influence of ambient light on the brightness of each pixel, including the lighting conditions of the player's surroundings;
Solve the shadow separation matrix S_dow with the weighted linear least-squares method and extract the shadow regions in the high-resolution video data image;
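A sketch of a weighted linear least-squares solve for the separation matrix, under the assumption that the model is L = A·S_dow + E with weight matrix W; the patent text does not spell out the closed form.

```python
import numpy as np

def solve_s_dow(A: np.ndarray, W: np.ndarray,
                L: np.ndarray, E: np.ndarray) -> np.ndarray:
    """S_dow = (A^T W A)^{-1} A^T W (L - E), with W a diagonal weight matrix."""
    AtW = A.T @ W
    return np.linalg.solve(AtW @ A, AtW @ (L - E))
```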
Third, analyze the change in the direct reflection component R_d(λ, φ) as time-series data to identify trends in the player's body posture: ΔR_d(t) = R_d(t) − R_d(t−1) (14), where ΔR_d(t) is the change at time t in the direct reflection component associated with the player's posture change and R_d(t) is the direct reflection component at time t. At the same time, analyze the change in the scattered reflection component R_S(λ, φ) to detect the degree of the player's interaction with the environment: if R_S(λ, φ) increases significantly in a certain area, the player has entered a new lighting environment or is interacting with objects in the environment, and image processing techniques are used to measure the increase in illumination intensity of the player's area to determine whether the player has entered a brighter or darker environment;
Finally, combine the player's actions captured by the Kinect somatosensory camera with the analysis results of the high-resolution video data image: align the two spatially through image processing so that corresponding pixels represent the same physical position, and integrate them into a unified data structure for complete data fusion.
S3. Recognize and respond to the player's natural behavior and map it onto the virtual character in the game. When the player's behavior is recognized, it is mapped onto the virtual character using an inverse kinematics algorithm: the target pose of the player's in-game character, including the character's joint angles and body-part positions, is computed by inverse kinematics, and finally the virtual character's animation is updated according to the computed target pose so that the character reflects the player's natural behavior in the game environment.
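As a minimal illustration of the inverse kinematics step, the sketch below solves a planar two-link chain (e.g., thigh and shin) analytically, turning a captured end-effector position into hip and knee angles; the link lengths are assumptions, and a production system would use a full-body IK solver.

```python
import math

def two_link_ik(x: float, y: float, l1: float = 0.45, l2: float = 0.45):
    """Return (hip, knee) angles in radians placing the end effector at
    (x, y), or None when the target is out of reach."""
    d2 = x * x + y * y
    c2 = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)   # law of cosines
    if abs(c2) > 1:
        return None                                  # unreachable target
    knee = math.acos(c2)
    hip = math.atan2(y, x) - math.atan2(l2 * math.sin(knee),
                                        l1 + l2 * math.cos(knee))
    return hip, knee

print(two_link_ik(0.6, -0.3))          # e.g., foot 0.6 m forward, 0.3 m down
```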
The technical scope of the present invention is not limited to the above description. Those skilled in the art may make various changes and modifications to the above embodiments without departing from the technical spirit of the invention, and such changes and modifications fall within the scope of the invention.

Claims (9)

1.一种虚拟仿真互动式图像识别游戏监测方法,其特征在于:包括步骤:1. A virtual simulation interactive image recognition game monitoring method, characterized in that it includes the following steps: S1、构建玩家动作捕捉策略S1. Build a player motion capture strategy 通过高分辨率摄像设备捕捉玩家在互动游戏中的自然行为,生成玩家互动游戏中表征游戏动作瞬间的视频数据,以提供更高质量的输入数据;Capturing the natural behavior of players in interactive games through high-resolution camera equipment, generating video data representing the game action moments in the player interactive game, so as to provide higher quality input data; S2、数据优化S2. Data Optimization 利用S1步骤采集的高分辨率视频数据,通过帧处理、二值化和深度学习目标检测算法,对玩家互动游戏中发生的自然行为进行实时分析和处理,以提取有效视频数据片段并准备进行下一步的行为识别分析;Using the high-resolution video data collected in step S1, the natural behaviors occurring in the player interactive game are analyzed and processed in real time through frame processing, binarization, and deep learning target detection algorithms to extract valid video data segments and prepare for the next step of behavior recognition analysis; S3、识别并响应玩家的自然行为,将此行为动作映射至游戏中的虚拟角色上。S3. Identify and respond to the player’s natural behavior, and map this behavior to the virtual character in the game. 2.根据权利要求1所述的虚拟仿真互动式图像识别游戏监测方法,其特征在于:2. The virtual simulation interactive image recognition game monitoring method according to claim 1, characterized in that: S1步骤中,捕捉玩家在互动游戏中自然行为的具体过程为:In step S1, the specific process of capturing the natural behavior of players in interactive games is as follows: S1-1、以玩家直行站立的动作为基准,自定义玩家直行站立时表征身体角度参数和位置参数,其中,S1-1. Based on the player's standing upright action, customize the body angle parameters and position parameters when the player stands upright, where: 身体角度参数包括:θtor为玩家直行站立时躯干的倾斜角度,用于表示玩家身体是否直立、θnec为玩家直行站立时颈部与躯干的倾斜角度,表示玩家头部是否与躯干保持直线、θkne为玩家直行站立时膝盖的弯曲角度,表示玩家身体是否完全伸直;The body angle parameters include: θ tor is the tilt angle of the player's trunk when standing upright, which is used to indicate whether the player's body is upright; θ nec is the tilt angle of the player's neck and trunk when standing upright, which indicates whether the player's head is in a straight line with the trunk; θ kne is the bending angle of the player's knees when standing upright, which indicates whether the player's body is fully straight; 位置参数为玩家在游戏中的虚拟角色的三维空间中的位置坐标Ast,用于描述玩家是否在预定的高分辨率摄像设备所捕捉地站立区域内;The position parameter is the position coordinate Ast of the player's virtual character in the game in the three-dimensional space, which is used to describe whether the player is in the standing area captured by the predetermined high-resolution camera device; S1-2、使用内置传感器的手套,捕捉玩家手部和手指动作,所述手部和手指动作至少包括手腕的弯曲和旋转角度、手部的整体姿态角度以及每个手指的弯曲角度;S1-2, using gloves with built-in sensors to capture the player's hand and finger movements, wherein the hand and finger movements at least include the bending and rotation angles of the wrist, the overall posture angle of the hand, and the bending angle of each finger; 同步使用加速度计、陀螺仪或磁力计惯性传感器,贴附于玩家身体部位,捕捉玩家头部转动αhea、腰部弯曲βwai以及腿部移动γhip动作,实现对玩家全身动作的全面捕捉;所述头部转动动作至少包括偏航αyaw、俯仰αpit和翻滚αroo,所述腰部弯曲动作至少包括腰部的弯曲βben和旋转角度βrot,所述腿部移动动作至少包括髋关节的旋转角度γhiprot、膝盖的弯曲角度γkneben、脚踝的旋转角度γankrotAn accelerometer, a gyroscope or a magnetometer inertial sensor is synchronously used and attached to the player's body parts to capture the player's head rotation α hea , waist bending β wai and leg movement γ hip movements, so as to achieve comprehensive capture of the player's whole body movements; the head rotation movement at least includes yaw α yaw , pitch α pit and roll α roo , the waist bending movement at least includes waist bending β ben and rotation 
angle β rot , and the leg movement movement at least includes hip joint rotation angle γ hiprot , knee bending angle γ kneben , ankle rotation angle γ ankrot . 3.根据权利要求1所述的虚拟仿真互动式图像识别游戏监测方法,其特征在于:3. The virtual simulation interactive image recognition game monitoring method according to claim 1, characterized in that: 生成玩家游戏动作过程中的瞬间视频数据的具体过程为:The specific process of generating instantaneous video data of the player's game action is as follows: S1-3、采用Kinect体感摄像头作为高分辨率摄像设备,并基于预定的高分辨率摄像设备所捕捉地站立区域,设定安装多个Kinect体感摄像头节点,确定每个节点位置坐标Pi均被精确设定,且都与游戏服务器相连,同时,建立每个Kinect体感摄像头节点与站立区域的关联参数Li以及连接状态参数Ci,以确保玩家的动作数据能够被准确捕捉和传输,其中,S1-3, using Kinect motion sensing camera as a high-resolution camera device, and based on the predetermined standing area captured by the high-resolution camera device, setting and installing multiple Kinect motion sensing camera nodes, ensuring that the position coordinates P i of each node are accurately set and connected to the game server, and at the same time, establishing the association parameter L i and the connection state parameter C i of each Kinect motion sensing camera node and the standing area, so as to ensure that the player's action data can be accurately captured and transmitted, wherein, Pi=xi,yi,zi,表示第i个Kinect体感摄像头节点在游戏虚拟场景中的位置坐标,P i = x i , y i , z i , represents the position coordinates of the i-th Kinect camera node in the game virtual scene. Li={Pi∈Ast|i=1,2,...,N},表示第i个Kinect体感摄像头节点Pi是否位于玩家在游戏中的虚拟角色的三维空间中的位置坐标Ast区域内,连接状态参数Ci为第i个Kinect体感摄像头节点与游戏服务器的连接状态,采用二进制描述,1表示连接成功,0表示连接失败;L i ={P i ∈A st |i=1,2,...,N}, indicating whether the ith Kinect camera node P i is located in the position coordinate region A st of the player's virtual character in the three-dimensional space in the game. The connection status parameter C i is the connection status between the ith Kinect camera node and the game server, which is described in binary, 1 indicates a successful connection, and 0 indicates a failed connection. S1-4、待玩家进入游戏虚拟场景时,虚拟仿真系统通过玩家佩戴的虚拟身份标识器进行识别,所述虚拟身份标识器发出的信号通过游戏内的通信协议发送至游戏服务器,进行玩家游戏动作过程中的动作视频数据采集:游戏服务器接收到玩家的虚拟身份标识信号后,激活与玩家位置相对应的Kinect体感摄像头,Kinect体感摄像头开始捕捉玩家的虚拟角色动作,并将动作数据以视频流的形式传输到游戏服务器,游戏服务器对收到的视频流进行初步处理,包括格式转换和编码优化,以适应游戏内的视频播放标准,并将处理后的视频数据以MP4格式存储在游戏服务器的共享目录中,确保视频数据的兼容性和可访问性;S1-4. When the player enters the virtual scene of the game, the virtual simulation system identifies the player through the virtual identity identifier worn by the player. The signal emitted by the virtual identity identifier is sent to the game server through the communication protocol in the game to collect the action video data of the player in the game process: after the game server receives the virtual identity identification signal of the player, it activates the Kinect somatosensory camera corresponding to the player's position. The Kinect somatosensory camera starts to capture the player's virtual character action and transmits the action data to the game server in the form of a video stream. 
The game server performs preliminary processing on the received video stream, including format conversion and encoding optimization, to adapt to the video playback standard in the game, and stores the processed video data in the shared directory of the game server in MP4 format to ensure the compatibility and accessibility of the video data; S1-5、游戏服务器调用内置的视频编辑模块,对采集到的视频流进行剪辑、特效添加或场景转换编辑操作,并在此过程中添加虚拟场景中的花絮、背景图和游戏logo,以增强视频的观赏性和游戏的沉浸感;S1-5, the game server calls the built-in video editing module to edit the collected video stream, add special effects or scene conversion editing operations, and add highlights, background images and game logos in the virtual scene in the process to enhance the viewing experience of the video and the immersion of the game; S1-6、将编辑完成的视频数据采用分布式存储技术,存储于游戏服务器的专用存储区域,且保证每个视频文件都与玩家的虚拟身份标识和游戏时间戳相关联;S1-6. The edited video data is stored in a dedicated storage area of the game server using distributed storage technology, and each video file is ensured to be associated with the player's virtual identity and game timestamp; S1-7、游戏服务器基于游戏时间戳和玩家的虚拟身份标识的组合,为每个视频数据分配一唯一的标识符,以为后续对玩家的自然行为的实时分析提供索引路径,其操作如下:S1-7, the game server assigns a unique identifier to each video data based on the combination of the game timestamp and the player's virtual identity, so as to provide an index path for the subsequent real-time analysis of the player's natural behavior, and the operation is as follows: 构建视频数据的索引过程中的标记公式:[M_{tag}=g(S_{id},T_{timestamp})],式中,M_{tag}为分配给视频数据的唯一标记标识,g为标记函数,其结合玩家的虚拟身份标识信号S_{id}和时间戳T_{timestamp}所生成;The tagging formula in the process of constructing the index of video data is: [M_{tag}=g(S_{id},T_{timestamp})], where M_{tag} is the unique tag assigned to the video data, and g is the tagging function, which is generated by combining the player's virtual identity signal S_{id} and the timestamp T_{timestamp}; 在完成标记的所述视频数据的基础上,构建一多维的索引结构,完成索引路径搭建,构建的索引公式如下:[I_{index}=h(C_{class},M_{tag},P_{location})],式中,I_{index}为建立的索引,h为索引函数,其为由视频数据的分类C_{class}、标记M_{tag}以及视频数据在服务器共享目录中的存储位置P_{location}所共同创建检索路径;On the basis of the marked video data, a multi-dimensional index structure is constructed to complete the index path construction. The constructed index formula is as follows: [I_{index}=h(C_{class},M_{tag},P_{location})], where I_{index} is the established index and h is the index function, which is a search path jointly created by the classification C_{class} of the video data, the tag M_{tag} and the storage location P_{location} of the video data in the server shared directory; S1-8、通过检查玩家在游戏中的虚拟角色的三维空间中的位置坐标Ast是否超出预定义的游戏区域边界,构建玩家离开游戏区域的判定条件Llea,以使虚拟仿真系统重置与该玩家相关的所有视频数据状态,并为捕捉新游戏动作做好准备,其逻辑公式如下:S1-8, by checking whether the position coordinate Ast of the player's virtual character in the game exceeds the predefined game area boundary, a determination condition Llea is constructed for the player to leave the game area, so that the virtual simulation system resets all video data states related to the player and prepares for capturing new game actions. The logic formula is as follows: 1,if(xpla<xmin or xpla>xmax)or1. 
if(x pla <x min or x pla >x max )or 0,oth0, oth 式中,若玩家位置坐标xpla、ypla或zpla出了游戏区域的最小值xmin、ymin、zmin或最大值xmax、ymax、xmax,则判定条件Llea=1,表示玩家离开了游戏区域或完成游戏动作,此时,待玩家完成游戏动作或离开游戏区域,服务器自动重置与该玩家相关的所有视频数据状态,从而清除临时缓存的数据,释放存储空间,并为捕捉新游戏动作做好准备;反之,判定条件Llea=0,表示玩家仍在游戏区域内。In the formula, if the player position coordinates x pla , y pla or z pla exceed the minimum value x min , y min , z min or the maximum value x max , y max , x max of the game area, the judgment condition L lea =1, indicating that the player has left the game area or completed the game action. At this time, when the player completes the game action or leaves the game area, the server automatically resets all video data states related to the player, thereby clearing the temporarily cached data, releasing storage space, and preparing for capturing new game actions; otherwise, the judgment condition L lea =0, indicating that the player is still in the game area. 4.根据权利要求1或3所述的虚拟仿真互动式图像识别游戏监测方法,其特征在于:4. The virtual simulation interactive image recognition game monitoring method according to claim 1 or 3, characterized in that: S2步骤中,对玩家的自然行为进行实时分析和处理时,帧处理的具体过程为:In step S2, when the player's natural behavior is analyzed and processed in real time, the specific process of frame processing is as follows: S2-11、使用视频处理库OpenCV的VideoCapture函数打开所述高分辨率视频数据,并将返回的高分辨率视频数据捕获对象存储在变量cap中,定义变量N=5,N表示每隔5帧提取1帧,以设置祯提取间隔,创建变量frame_count并初始化为0,以跟踪已读取的帧数;S2-11, using the VideoCapture function of the video processing library OpenCV to open the high-resolution video data, and storing the returned high-resolution video data capture object in the variable cap, defining a variable N=5, where N represents extracting 1 frame every 5 frames to set the frame extraction interval, creating a variable frame_count and initializing it to 0 to track the number of frames that have been read; S2-12、进入循环,设定每次迭代尝试从变量cap中读取一帧,并在此循环中调用cap.read()函数返回的两个值:布尔值ret和帧frame,其中,若ret为True,则显示该帧,表示帧frame变量中包含了有效表征高分辨率视频数据的图像数据,可以用于后续的处理或显示,frame_count变量增加1;若ret为False,表示所述高分辨率视频数据的所有帧均已被读取完毕,循环将终止;S2-12, enter the loop, set each iteration to try to read a frame from the variable cap, and call the cap.read() function in this loop to return two values: Boolean value ret and frame frame, where if ret is True, the frame is displayed, indicating that the frame variable contains image data that effectively represents the high-resolution video data and can be used for subsequent processing or display, and the frame_count variable increases by 1; if ret is False, it means that all frames of the high-resolution video data have been read, and the loop will terminate; S2-13、若frame_count是N的倍数,则认为已经到达提取下一帧的间隔点,此时,将使用cv2.imshow函数当前帧显示出来,并等待用户输入,若用户按下了q键,退出循环;当循环结束时,使用cap.release()函数释放所述高分辨率视频数据捕获对象,并使用cv2.destroyAllWindows()函数关闭所有打开窗口,从而减少处理视频所需的计算量。S2-13. If frame_count is a multiple of N, it is considered that the interval point for extracting the next frame has been reached. At this time, the current frame will be displayed using the cv2.imshow function, and wait for user input. If the user presses the q key, the loop will exit. When the loop ends, the cap.release() function will be used to release the high-resolution video data capture object, and the cv2.destroyAllWindows() function will be used to close all open windows, thereby reducing the amount of computation required to process the video. 5.根据权利要求4所述的虚拟仿真互动式图像识别游戏监测方法,其特征在于:5. 
The virtual simulation interactive image recognition game monitoring method according to claim 4, characterized in that: 在使用视频处理库OpenCV对所述高分辨率视频数据进行帧处理后,还需要使用高斯滤波或中值滤波去除高分辨率视频数据图像噪声,并使用Otsu方法确定阈值,将高分辨率视频数据图像转换为二值图像。After using the video processing library OpenCV to perform frame processing on the high-resolution video data, it is also necessary to use Gaussian filtering or median filtering to remove image noise from the high-resolution video data, and use the Otsu method to determine the threshold to convert the high-resolution video data image into a binary image. 6.根据权利要求1所述的虚拟仿真互动式图像识别游戏监测方法,其特征在于:6. The virtual simulation interactive image recognition game monitoring method according to claim 1, characterized in that: S2步骤中,对玩家的自然行为进行实时分析和处理时,采用的深度学习目标检测具体过程为:In step S2, when analyzing and processing the player's natural behavior in real time, the specific process of deep learning target detection is as follows: S2-21、构建包含特征向量F的训练数据集,以描述玩家在游戏中的虚拟角色游戏动作:S2-21. Construct a training data set containing feature vector F to describe the game actions of the player's virtual character in the game: F=[θtor,θnec,θkne,Ast(x,y,z),αyaw,αpit,αroo,βben,βrot,γhiprot,γkneben,γankrot];F=[θ tor , θ nec , θ kne , A st (x, y, z), α yaw , α pit , α roo , β ben , β rot , γ hiprot , γ kneben , γ ankrot ]; S2-22、对于训练数据集中的每个特征f,分别通过计算其平均值μf和标准差σf,完成特征向量F的标准化,其中,平均值μf计算公式为:式中,N为训练数据集中样本数量,fi为第i个样本中特征f的值;标准差σf计算公式为:S2-22. For each feature f in the training data set, the feature vector F is standardized by calculating its mean value μ f and standard deviation σ f , where the mean value μ f is calculated as follows: Where N is the number of samples in the training data set, fi is the value of feature f in the i-th sample; the standard deviation σf is calculated as: S2-23、构建一包括多个卷积层、池化层、全连接层和分类层的卷积神经网络架构CNN,以从输入的高分辨率视频数据中提取特征并进行分类,同时,将标准化后的特征向量F输入所述卷积神经网络架构CNN,进行前向传播,计算输出,其中,S2-23. Construct a convolutional neural network architecture CNN including multiple convolutional layers, pooling layers, fully connected layers and classification layers to extract features from the input high-resolution video data and perform classification. 
6. The virtual simulation interactive image recognition game monitoring method according to claim 1, characterized in that:

in step S2, when the player's natural behavior is analyzed and processed in real time, the deep learning target detection proceeds as follows:

S2-21. Construct a training data set containing the feature vector F that describes the game actions of the player's virtual character in the game:

$$F=[\theta_{tor},\ \theta_{nec},\ \theta_{kne},\ A_{st}(x,y,z),\ \alpha_{yaw},\ \alpha_{pit},\ \alpha_{roo},\ \beta_{ben},\ \beta_{rot},\ \gamma_{hiprot},\ \gamma_{kneben},\ \gamma_{ankrot}];$$

S2-22. For each feature f in the training data set, standardize the feature vector F by computing its mean $\mu_f=\frac{1}{N}\sum_{i=1}^{N}f_i$ and standard deviation $\sigma_f=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(f_i-\mu_f)^2}$, where N is the number of samples in the training data set and f_i is the value of feature f in the i-th sample; each feature value is then rescaled to $(f_i-\mu_f)/\sigma_f$;

S2-23. Construct a convolutional neural network architecture CNN comprising multiple convolutional layers, pooling layers, fully connected layers and a classification layer, to extract features from the input high-resolution video data and classify them; at the same time, feed the standardized feature vector F into the CNN, perform forward propagation and compute the output, wherein:

in the convolutional layer, a convolution kernel is applied to the input image data I: $(I*K)(x,y)=\sum_{m}\sum_{n}I(m,n)\,K(x-m,\,y-n)$, where I is the input image data after frame processing and binarization, representing the player's game actions in the high-resolution video data; K is the convolution kernel; * is the convolution operation; m and n are the sliding positions of the kernel K over the input image data I; x and y are the coordinates on I of a position in the kernel K; I(m,n) is the pixel value of I at position (m,n); and K(x-m, y-n) is the weight of K at the offset (x-m, y-n) relative to its centre position (x,y);

in the pooling layer, the pooling operation $P(i,j)=\max_{m,n}I(i\cdot s-m,\ j\cdot s-n)$ is used to reduce computation and prevent overfitting, where P is the pooled feature map, I is the input image data feature map, and s is the stride;

in the fully connected layer, the ReLU activation function maps the features to the final classification result via Z = W*A + b, where Z is the output, W is the weight matrix, A is the input, and b is the bias;

in the classification layer, the Softmax function $\mathrm{Softmax}(Z)_i=\frac{e^{Z_i}}{\sum_{j}e^{Z_j}}$ converts the CNN output into a probability distribution, where Z is the output of the fully connected layer, Softmax(Z)_i is the probability of the i-th class, and $\sum_{j}e^{Z_j}$, the sum of the exponentiated scores of all classes, normalizes the result so that the class probabilities sum to 1;

S2-24. Annotate the head rotation, waist bending and leg movement actions in the captured high-resolution video data; train the behavior recognition model on the annotated data set, adjusting its parameters with the back-propagation algorithm $\theta\leftarrow\theta-\eta\,\nabla_{\theta}J(\theta)$, where θ is the recognition model parameter, η is the learning rate and J(θ) is the loss function; and apply the trained model to the captured high-resolution video data to recognize the player's behavior.
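A NumPy sketch of the forward computations named in S2-23 (convolution, max pooling, ReLU, fully connected layer and Softmax); the input size, the 3x3 kernel and the four action classes are illustrative assumptions, not values fixed by the claim:

import numpy as np

def conv2d(I, K):
    # (I*K)(x,y) = sum_m sum_n I(m,n) * K(x-m, y-n); computed where the
    # flipped kernel fits fully inside the image ("valid" convolution)
    kh, kw = K.shape
    Kf = K[::-1, ::-1]                      # flip the kernel for true convolution
    out_h = I.shape[0] - kh + 1
    out_w = I.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(I[i:i + kh, j:j + kw] * Kf)
    return out

def max_pool(I, s=2):
    # P(i,j) = maximum of each s-by-s window, taken with stride s
    h, w = I.shape[0] // s, I.shape[1] // s
    return I[:h * s, :w * s].reshape(h, s, w, s).max(axis=(1, 3))

def softmax(Z):
    e = np.exp(Z - Z.max())                 # shift by max for numerical stability
    return e / e.sum()

I = np.random.rand(32, 32)                  # binarized input patch (assumed size)
K = np.random.randn(3, 3)                   # one learned 3x3 kernel
A = np.maximum(max_pool(conv2d(I, K)), 0).ravel()  # conv -> pool -> ReLU, flattened
W = np.random.randn(4, A.size)              # weights for 4 action classes (assumed)
b = np.random.randn(4)
probs = softmax(W @ A + b)                  # Z = W*A + b, then class probabilities

In practice these layers would come from a framework such as PyTorch or TensorFlow; the explicit loops above only mirror the formulas stated in the claim.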
7. The virtual simulation interactive image recognition game monitoring method according to claim 6, characterized in that:

the trained behavior recognition model identifies the player's behavior as follows:

first, the pre-processed high-resolution video data is fed into the trained behavior recognition model, the input comprising at least frames or clips of the player's in-game hand and finger movements, head rotation α_hea, waist bending β_wai and leg movement γ_hip;

second, the behavior recognition model analyses each input frame or clip and outputs behavior class probabilities;

third, a time-series analysis of the model's predictions of the player's in-game actions is performed by averaging the behavior class probabilities, $\bar{P}=\frac{1}{T}\sum_{t=1}^{T}P_t$, to determine the final behavior label, where P_t is the probability of the behavior class occurring at time t in the game and T is the size of the time window considered;

finally, the recognition results, including behavior label, timestamp and the related video frames, are stored in association with the high-resolution video data, achieving precise recognition of the player's game actions.
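A small sketch of the sliding-window averaging that settles the final behavior label; the window size T = 10 and the class names are illustrative assumptions:

import numpy as np

LABELS = ["head_turn", "waist_bend", "leg_move"]   # hypothetical class names
T = 10                                             # averaging window size (assumed)

def final_label(frame_probs):
    # frame_probs: array of shape (num_frames, num_classes), one row per frame
    window = frame_probs[-T:]                      # last T per-frame distributions
    mean_probs = window.mean(axis=0)               # (1/T) * sum_t P_t
    return LABELS[int(mean_probs.argmax())], mean_probs

# example: 25 frames of per-class probabilities from the recognition model
probs = np.random.dirichlet(np.ones(len(LABELS)), size=25)
label, avg = final_label(probs)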
8. The virtual simulation interactive image recognition game monitoring method according to claim 6, characterized in that:

when the player's natural behavior is analyzed and processed in real time, before the deep learning target detection algorithm is applied to achieve precise recognition of the player's game actions, the high-resolution video data image that has already been converted into a binary image must also be partitioned and segmented, so as to eliminate and extract the shadow parts of the image and provide accurate visual information for the subsequent virtual character animation. The process is as follows:

first, the high-resolution video data image is converted from the RGB color space to the YCrCb color space with the conversion formulas Y = 0.299R + 0.587G + 0.114B, Cr = 0.713(R - Y), Cb = 0.564(B - Y), so that the shadow part can be extracted, where Y is the luminance of the high-resolution video data image and Cr and Cb are its chrominance components, describing the image's color information; every pixel of the image is traversed, the formulas above are applied to each pixel's RGB values to compute the corresponding YCrCb values, and the converted YCrCb image is compared with the original RGB image to verify the conversion;

second, the dark regions of the image, including the shadow part, are identified by reflection component analysis, as follows: a multispectral reflection model R(λ, φ) = R_d(λ, φ) + R_s(λ, φ) is constructed that decomposes the reflected light in the image into a direct reflection component R_d(λ, φ) and a scattered reflection component R_s(λ, φ), where R_d(λ, φ) is the reflected light directly related to the light source, reflected from the player's body straight into the Kinect motion-sensing camera, and R_s(λ, φ) is the reflected light related to ambient or scattered light, comprising at least the light reflected from the surroundings onto the player's body;

a shadow separation matrix S_dow is then constructed to separate and identify the shadow regions of the image, $S_{dow}=(A^{T}WA)^{-1}A^{T}W(L-E)$, where A is the design matrix containing the feature vectors related to R_d(λ, φ) and R_s(λ, φ), including the player's body pose and surface characteristics; W is the weight matrix adjusting the influence of the various factors on the shadow part, including light-source intensity, the distance between the player's body and the Kinect motion-sensing camera, and the ambient light intensity; L is the luminance matrix giving, for each pixel, the brightness value obtained from the multispectral reflection model, including the spectral response of the Kinect motion-sensing camera; and E is the ambient-light matrix describing the effect of ambient light on each pixel's brightness, including the lighting conditions of the player's surroundings;

the shadow separation matrix S_dow is solved by weighted linear least squares, and the shadow regions of the high-resolution video data image are extracted;

thirdly, the change of the direct reflection component R_d(λ, φ) is analysed as a time series to identify trends in the player's posture changes, ΔR_d(t) = R_d(t) - R_d(t - 1), where ΔR_d(t) is the change at time t of the direct reflection component caused by the player's posture change and R_d(t) is the direct reflection component of the player's posture at time t; at the same time, changes of the scattered reflection component R_s(λ, φ) are analysed to detect the degree of interaction between the player and the environment: a significant increase of R_s(λ, φ) in some region indicates that the player has entered a new lighting environment or interacted with an object in the environment, and image processing techniques are used to measure the light intensity of the player's region where R_s(λ, φ) has increased, determining whether the player has moved into a brighter or darker environment;

finally, the player movements captured by the Kinect motion-sensing camera are combined with the results of the high-resolution video data image analysis; image processing techniques align the two spatially so that corresponding pixels represent the same physical location, and the data are integrated into a unified data structure to complete the data fusion.
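A NumPy sketch of the weighted linear least-squares solve S_dow = (AᵀWA)⁻¹AᵀW(L - E) from claim 8; the matrix sizes are illustrative assumptions, and np.linalg.solve is used instead of an explicit inverse for numerical stability:

import numpy as np

n_pix, n_feat = 1000, 6              # pixel count and design features (assumed)
A = np.random.rand(n_pix, n_feat)    # design matrix: pose / surface features
w = np.random.rand(n_pix)            # per-pixel weights: light, distance, ambient
L = np.random.rand(n_pix)            # luminance from the reflection model
E = np.random.rand(n_pix)            # ambient-light contribution per pixel

AtW = A.T * w                        # A^T W with W = diag(w), without forming W
S_dow = np.linalg.solve(AtW @ A, AtW @ (L - E))   # (A^T W A)^{-1} A^T W (L - E)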
9. The virtual simulation interactive image recognition game monitoring method according to claim 1, characterized in that:

once the player's game actions have been recognized, an inverse kinematics algorithm is further used to map these actions onto the virtual character in the game: the target pose corresponding to the player's game action is computed with the inverse kinematics algorithm, and the virtual character's animation is then updated according to the computed target pose, so that the character reflects the player's natural behavior in the game environment.
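Claim 9 does not fix a particular inverse kinematics method; as one concrete possibility, here is a minimal sketch of analytic two-link planar IK, assuming a virtual arm with segment lengths l1 and l2 driven towards a tracked hand position (lengths and target are hypothetical values, not the patented method):

import math

def two_link_ik(x, y, l1, l2):
    # returns shoulder and elbow angles placing the end effector at (x, y)
    d2 = x * x + y * y
    # elbow angle from the law of cosines; clamp for numerical safety
    c2 = max(-1.0, min(1.0, (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)))
    theta2 = math.acos(c2)
    # shoulder angle = angle to target minus the offset induced by the elbow
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2

# drive the virtual character's arm towards a tracked hand position
shoulder, elbow = two_link_ik(0.5, 0.3, l1=0.4, l2=0.35)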
CN202411649023.5A 2024-11-19 2024-11-19 A virtual simulation interactive image recognition game monitoring method Pending CN119524382A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411649023.5A CN119524382A (en) 2024-11-19 2024-11-19 A virtual simulation interactive image recognition game monitoring method

Publications (1)

Publication Number Publication Date
CN119524382A true CN119524382A (en) 2025-02-28

Family

ID=94692935

Country Status (1)

Country Link
CN (1) CN119524382A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination