CN119524382A - A virtual simulation interactive image recognition game monitoring method - Google Patents
A virtual simulation interactive image recognition game monitoring method
- Publication number
- CN119524382A (application number CN202411649023.5A)
- Authority
- CN
- China
- Prior art keywords
- player
- game
- video data
- image
- resolution video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/20—Input arrangements for video game devices
- A63F13/21—Input arrangements for video game devices characterised by their sensors, purposes or types
- A63F13/213—Input arrangements for video game devices characterised by their sensors, purposes or types comprising photodetecting means, e.g. cameras, photodiodes or infrared cells
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/40—Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment
- A63F13/42—Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/55—Controlling game characters or game objects based on the game progress
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/10—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
- A63F2300/1087—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/30—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by output arrangements for receiving control signals generated by the game device
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/60—Methods for processing data by generating or executing the game program
- A63F2300/6045—Methods for processing data by generating or executing the game program for mapping control signals received from the input arrangement into game commands
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Human Computer Interaction (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a virtual simulation interactive image recognition game monitoring method comprising the following steps: S1, capturing the natural behaviors of a player in an interactive game through high-resolution camera equipment and generating video data representing the player's in-game action moments, so as to provide higher-quality input data; S2, analyzing and processing the player's natural behaviors in real time, using the high-resolution video data collected in step S1, through frame processing, binarization and a deep-learning target detection algorithm; and S3, recognizing and responding to the player's natural behaviors and mapping them onto the virtual characters in the game. By combining a Kinect somatosensory camera, the invention accurately captures and analyzes the player's natural behaviors in real time and identifies the shadow regions in the images, so that the shadow regions in the high-resolution video data images can be accurately analyzed and recognized, providing richer information for subsequent image processing and analysis.
Description
Technical Field
The invention relates to the technical field of virtual simulation, and in particular to a virtual simulation interactive image recognition game monitoring method.
Background
Most existing motion capture and analysis systems use ordinary cameras to capture human motion; the acquired images are blurry, and phenomena such as ghosting, shadows and missing foot detail cannot be handled, which easily affects subsequent analysis and judgment. In motion capture technology, the "shadow" phenomenon is a key technical challenge: it refers to data regions that cannot be captured because of sensor field-of-view limits or other technical constraints. This phenomenon is particularly prominent in game training, because it directly affects the completeness of motion data and the accuracy of motion recognition.
For example, in a virtual reality football game, if the motion capture system cannot accurately capture every fine movement of the player, such as a quick turn or a subtle foot adjustment, the virtual character in the game may fail to reproduce the motion, significantly degrading the player's in-game experience.
Such data incompleteness not only reduces the accuracy of motion recognition, which impairs the effectiveness of game training, but can also increase the cost of the training system, because additional sensors or more advanced capture techniques are needed to compensate for the missing data. Training effectiveness is further reduced because the player cannot learn all necessary motion patterns from complete motion data.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a virtual simulation interactive image recognition game monitoring method which solves the problems described in the background art.
To achieve the above purpose, the invention is realized by the following technical scheme. The virtual simulation interactive image recognition game monitoring method comprises the following steps:
S1, constructing a player action capturing strategy
Capturing the natural behaviors of a player in an interactive game through a high-resolution camera device, and generating video data representing the player's in-game action moments, so as to provide higher-quality input data;
S2, data optimization
Utilizing the high-resolution video data collected in step S1, and analyzing and processing the player's natural behaviors in real time through frame processing, binarization and a deep-learning target detection algorithm, so as to extract effective video data segments and prepare for the next step of behavior recognition analysis;
S3, recognizing and responding to the natural behaviors of the player, and mapping the behaviors to the virtual characters in the game.
Compared with the prior art, the invention has the beneficial effects that:
To solve the problems that existing motion capture and analysis systems mostly use ordinary cameras to capture human motion, that the acquired images are blurry, that phenomena such as ghosting, shadows and missing foot detail cannot be handled, and that subsequent analysis and judgment are easily affected, the invention combines a Kinect somatosensory camera. First, it accurately captures and analyzes the player's natural behaviors in real time; second, by identifying shadow regions in the images and extracting and analyzing the player's actions with deep-learning target detection, it can more accurately analyze and recognize the shadowed parts of the high-resolution video data images. This provides richer information for subsequent image processing and analysis, supplies rich and correct data support for game design, and enhances the interactivity between the player and the virtual environment.
Drawings
The disclosure of the present invention is described with reference to the accompanying drawings. It should be understood that the drawings are for purposes of illustration only and are not intended to limit the scope of the present invention; like reference numerals designate like parts throughout. Wherein:
FIG. 1 is a schematic diagram of a virtual simulation interactive image recognition game monitoring method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a player motion capture process according to an embodiment of the invention;
FIG. 3 is a schematic flow chart of optimizing and processing high-resolution video data acquired by a high-resolution camera device according to an embodiment of the present invention;
FIG. 4 is a flow chart of a virtual simulation system for performing behavior recognition and response to actions of a player according to an embodiment of the present invention;
FIG. 5 is a flow chart illustrating the extraction and analysis of shadow portions in an acquired high resolution video data image according to an embodiment of the present invention;
FIG. 6 is a flowchart of combining the player actions captured by the Kinect somatosensory camera with the high resolution video data image analysis results according to an embodiment of the present invention.
Detailed Description
It is to be understood that, according to the technical solution of the present invention, those skilled in the art may propose various alternative structural modes and implementation modes without changing the true spirit of the present invention. Accordingly, the following detailed description and drawings are merely illustrative of the invention and are not intended to be exhaustive or to limit the invention to the precise form disclosed.
The present invention will be described in further detail below with reference to the drawings, but is not limited thereto.
To aid understanding of the technical concept and implementation principle of the invention, the proposed virtual simulation interactive image recognition game monitoring method mainly aims to solve the problems that existing motion capture and analysis systems mostly use ordinary cameras to capture human motion, that the acquired images are blurry, that phenomena such as ghosting, shadows and missing foot detail cannot be handled, and that subsequent analysis and judgment are easily affected. Therefore, the invention combines a Kinect somatosensory camera: first, it accurately captures and analyzes the player's natural behaviors in real time; second, by identifying shadow regions in the images and extracting and analyzing the player's actions with deep-learning target detection, it can more accurately analyze and recognize the shadow regions of the high-resolution video data images, provide richer information for subsequent image processing and analysis, supply rich and correct data support for game design, and enhance the interactivity between the player and the virtual environment.
In specific implementation, as shown in fig. 1, the proposed virtual simulation interactive image recognition game monitoring method includes the following steps:
S1, constructing a player action capturing strategy
The natural behavior of the player in the interactive game is captured by the high-resolution camera device, and video data representing the player's in-game action moments is generated so as to provide higher-quality input data. It will be appreciated that, in order to capture and simulate the player's movements more accurately, it is important to pay attention to several key body parts, such as the shoulders, hands, feet and knees, which are particularly important in a game. These parts are therefore identified with special markers, for example when a standard jumping motion is performed, to ensure that their movements are captured accurately and that the character in the game mimics the player's actual movements. By capturing the player's natural movements in an interactive game, high-quality video data is generated that serves as the basis for subsequent analysis and processing, and the high-resolution imaging device ensures that the captured movements are rich and accurate, providing the game with real player action input.
As shown in fig. 2-3, in one embodiment of the present invention, capturing a player's natural behavior in an interactive game is accomplished by marking key action parts in the course of the player's game action, which is specifically:
S1-1, taking the player's upright standing posture as a reference, the body angle parameters and position parameters of the upright-standing player are defined. The body angle parameters comprise θ_tor, the inclination angle of the trunk when the player stands upright, used to indicate whether the player's body is standing straight; θ_nec, the inclination angle between the neck and the trunk when the player stands upright, used to indicate whether the player's head is aligned with the trunk; and θ_kne, the bending angle of the knees when the player stands upright, used to indicate whether the player's knees are straight while standing. The position parameter is the position coordinate A_st of the player's virtual character in the three-dimensional space of the game, used to describe whether the player is standing within the standing area captured by the preset high-resolution imaging device.
S1-2, hand and finger movements of the player are captured using a glove with built-in sensors, including at least the bending and rotation angles of the wrist, the overall posture angle of the hand and the bending angle of each finger. Simultaneously, inertial sensors (accelerometer, gyroscope or magnetometer) attached to the player's body capture the player's head rotation α_hea, waist bending β_wai and leg movement γ_hip, realizing overall capture of the player's whole-body movements. The head rotation at least comprises yaw α_yaw, pitch α_pit and roll α_roo; the waist movement at least comprises the bending angle β_ben and rotation angle β_rot of the waist; and the leg movement at least comprises the rotation angle γ_hiprot of the hip joint, the bending angle γ_kneben of the knee and the rotation angle γ_ankrot of the ankle.
Based on the above technical concept, it should be noted that in the motion recognition process, the player's body angle parameters, including the inclination or bending angles of the trunk, neck and knees, are first captured by sensors to determine the player's basic standing posture. The player's specific position in three-dimensional space is then determined from the position parameter to ensure that the player is within the effective capture range of the image capture device. Furthermore, the player's limb motions, including up-and-down movement of the shoulders, opening and closing of the hands, lifting of the feet and bending of the knees, can be captured in detail from the action coordinate points of the shoulders, hands and feet, and the changes of these coordinate points reflect the dynamic process of the player executing a specific motion. By comprehensively analyzing these parameters, the system can accurately identify and respond to the player's natural behaviors, realizing real-time mapping and simulation of actions in the game, better understanding and simulating the player's actions, and providing a more realistic and interactive game experience.
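As an illustration of how these parameters can be grouped for downstream processing, the following sketch defines a simple container for the angle and position parameters described above; the field names and the Python dataclass representation are assumptions for illustration and are not prescribed by the method.

```python
from dataclasses import dataclass

@dataclass
class PlayerPose:
    """Per-frame pose parameters captured for one player (illustrative layout)."""
    theta_tor: float      # trunk inclination angle when standing upright
    theta_nec: float      # neck-trunk inclination angle
    theta_kne: float      # knee bending angle
    a_st: tuple           # (x, y, z) position of the player's virtual character
    alpha_yaw: float      # head yaw
    alpha_pit: float      # head pitch
    alpha_roo: float      # head roll
    beta_ben: float       # waist bending angle
    beta_rot: float       # waist rotation angle
    gamma_hiprot: float   # hip joint rotation angle
    gamma_kneben: float   # knee bending angle during movement
    gamma_ankrot: float   # ankle rotation angle

# Example: a pose sampled while the player stands roughly upright.
pose = PlayerPose(2.0, 3.5, 1.0, (0.1, 0.0, 1.7),
                  0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0)
```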
In one embodiment of the invention, by capturing the player's physical actions, gestures and facial expressions during a game, rich data can be collected that fully records the player's interaction and experience. After the player's action information is marked, the acquired action information can be edited and enhanced by a video editing server; the edited and enhanced video provides higher-quality input data, which helps to improve the accuracy of image recognition and analysis. In other words, high-quality video input data improves the performance of image processing algorithms, because it provides more context for image processing tasks such as target tracking or behavior analysis. It will be appreciated that complete motion information in the video data captured at the player's in-game moments is the basis for advanced image processing and analysis; without it, the subsequent analysis would be incomplete because of shadow edges or shadow areas in the images.
Therefore, the invention further proposes that the specific process of generating the instant video data during the game action of the player is as follows:
S1-3, adopting a Kinect somatosensory camera as the high-resolution camera equipment, a plurality of Kinect somatosensory camera nodes are set up and installed based on the standing area captured by the preset high-resolution camera equipment; it is confirmed that each node position coordinate P_i is accurately set and connected with the game server, and at the same time a relevance parameter L_i and a connection state parameter C_i of each Kinect somatosensory camera node with respect to the standing area are established, so as to ensure that the player's action data can be accurately captured and transmitted, wherein:

P_i = (x_i, y_i, z_i) represents the position coordinate of the i-th Kinect somatosensory camera node in the game virtual scene;

L_i = {P_i ∈ A_st | i = 1, 2, ..., N} indicates whether the i-th Kinect somatosensory camera node P_i is located within the region of the position coordinate A_st of the player in the three-dimensional space of the virtual character in the game; the connection state parameter C_i is the connection state of the i-th Kinect somatosensory camera node and the game server, described in binary form, where 1 indicates that the connection is successful and 0 indicates that the connection has failed;
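A minimal sketch of how these per-node parameters might be evaluated is shown below; the standing-area bounds, node list and function names are illustrative assumptions, not part of the claimed method.

```python
# Illustrative check of Kinect node placement (L_i) and connection state (C_i).
nodes = [
    {"id": 1, "pos": (0.5, 0.2, 2.0), "connected": True},
    {"id": 2, "pos": (3.0, 0.2, 2.0), "connected": False},
]
# Assumed axis-aligned bounds of the standing area A_st.
a_st_min, a_st_max = (0.0, 0.0, 0.0), (2.0, 2.0, 3.0)

def inside_standing_area(pos, lo, hi):
    """L_i: 1 if the node coordinate lies inside the standing area, else 0."""
    return int(all(l <= p <= h for p, l, h in zip(pos, lo, hi)))

for node in nodes:
    L_i = inside_standing_area(node["pos"], a_st_min, a_st_max)
    C_i = 1 if node["connected"] else 0  # binary connection state to the game server
    print(f"node {node['id']}: L_i={L_i}, C_i={C_i}")
```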
S1-4, when a player enters the game virtual scene, the virtual simulation system recognizes the player through a worn virtual identity identifier; the signal sent by the identifier is transmitted to the game server through the in-game communication protocol, and acquisition of action video data during the player's game actions begins. After the game server receives the player's virtual identity signal, the Kinect somatosensory camera corresponding to the player's position is activated; it starts to capture the actions of the player's virtual character and transmits the action data to the game server as a video stream. The game server performs preliminary processing on the received video stream, including format conversion and coding optimization, to match the in-game video playback standard, and the processed video data is stored in the shared directory of the game server in MP4 format, ensuring the compatibility and accessibility of the video data;
S1-5, the game server calls a built-in video editing module to perform clipping, special-effect addition or scene-transition editing on the collected video stream, adding decorative captions, background pictures and the game logo of the virtual scene in the process, so as to enhance the watchability of the video and the immersion of the game;
S1-6, the edited video data is stored in a dedicated storage area of the game server using distributed storage technology, ensuring that each video file is associated with the player's virtual identity and a game timestamp;
S1-7, the game server assigns a unique identifier to each piece of video data based on the combination of the game timestamp and the player's virtual identity, to provide an index path for subsequent real-time analysis of the player's natural behavior, which operates as follows:
The marking formula in the indexing process of the video data is M_tag = g(S_id, T_timestamp) (1), where M_tag is the unique marking identifier allocated to the video data and g is the marking function, which generates the identifier by combining the player's virtual identity signal S_id and the timestamp T_timestamp;
On the basis of the marked video data, a multidimensional index structure is built to complete the construction of the index path. The index formula is I_index = h(C_class, M_tag, P_location) (2), where I_index is the built index and h is the index function, which creates a retrieval path jointly from the classification C_class of the video data, the mark M_tag and the storage position P_location of the video data in the server shared directory;
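A minimal sketch of how the marking function g and index function h from formulas (1) and (2) could be realized is given below; the hashing scheme and dictionary-based index are assumptions for illustration only.

```python
import hashlib

def g(s_id: str, t_timestamp: str) -> str:
    """Marking function (1): derive a unique tag M_tag from identity and timestamp."""
    return hashlib.sha1(f"{s_id}|{t_timestamp}".encode()).hexdigest()[:16]

def h(c_class: str, m_tag: str, p_location: str) -> dict:
    """Index function (2): build a retrieval entry I_index from class, tag and storage path."""
    return {"class": c_class, "tag": m_tag, "path": p_location}

# Example usage with hypothetical values.
m_tag = g("player_042", "2024-11-19T10:32:05Z")
i_index = h("jump_action", m_tag, "/shared/videos/player_042/clip_0001.mp4")
print(i_index)
```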
S1-8, by checking whether the position coordinate A_st of the player in the three-dimensional space of the virtual character in the game exceeds the predefined game area boundary, a judgment condition L_lea for the player leaving the game area is constructed, so that the virtual simulation system resets all video data states related to the player and prepares to capture new game actions. The logic formula is as follows:
L_lea = 1, if x_pla < x_min or x_pla > x_max, or y_pla < y_min or y_pla > y_max, or z_pla < z_min or z_pla > z_max; L_lea = 0 otherwise (3).
In the formula, if the player position coordinate x_pla, y_pla or z_pla exceeds the minimum value x_min, y_min, z_min or the maximum value x_max, y_max, z_max of the game area, the judgment condition L_lea = 1, indicating that the player has left the game area or completed the game action. At this time, the server automatically resets all video data states related to the player, clearing temporarily cached data, freeing storage space and preparing to capture new game actions; otherwise the judgment condition L_lea = 0, indicating that the player is still in the game area.
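The following sketch illustrates one way the leave-area condition L_lea of formula (3) could be evaluated; the bounds and reset behaviour are placeholder assumptions.

```python
def leave_condition(pos, bounds):
    """Return L_lea = 1 if the player position is outside the game area, else 0."""
    (x, y, z) = pos
    (x_min, y_min, z_min), (x_max, y_max, z_max) = bounds
    outside = (x < x_min or x > x_max or
               y < y_min or y > y_max or
               z < z_min or z > z_max)
    return int(outside)

bounds = ((0.0, 0.0, 0.0), (5.0, 5.0, 3.0))   # hypothetical game-area limits
if leave_condition((6.2, 1.0, 1.7), bounds):
    # Placeholder for clearing cached video data and resetting capture state.
    print("player left the area: reset video data states")
```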
S2, data optimization
Utilizing the high-resolution video data collected in step S1, the player's natural behaviors are analyzed and processed in real time through frame processing, binarization and a deep-learning target detection algorithm, so as to extract effective video data segments and prepare for the next step of behavior recognition analysis.
As shown in fig. 4, in an embodiment of the present invention, when analyzing and processing natural behaviors of a player in real time, a specific process of frame processing is:
S2-11, the high-resolution video data is opened using the VideoCapture function of the video processing library OpenCV, the returned high-resolution video capture object is stored in a variable cap, a variable N = 5 is defined (N indicates that 1 frame is extracted every 5 frames, setting the frame extraction interval), and a variable frame_count is created and initialized to 0 to track the number of frames read;
S2-12, a loop is entered; in each iteration, the cap.read() function is called to attempt to read a frame from the variable cap, returning two values, a Boolean value ret and a frame. If ret is True, the frame variable contains valid image data that can be used for subsequent processing or display, and the frame_count variable is incremented by 1;
S2-13, if frame_count is a multiple of N, the interval point for extracting the next frame is considered to be reached; the current frame is displayed using the cv2.imshow function and the loop waits for user input. If the user presses the q key, the loop is exited. When the loop ends, the cap.release() function releases the high-resolution video capture object and the cv2.destroyAllWindows() function closes all open windows, reducing the amount of computation required to process the video.
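A runnable sketch of the frame-extraction loop described in S2-11 to S2-13 is shown below; the input file name is a placeholder.

```python
import cv2

cap = cv2.VideoCapture("player_action.mp4")  # placeholder path to the high-resolution video
N = 5                                        # extract 1 frame every 5 frames
frame_count = 0

while True:
    ret, frame = cap.read()
    if not ret:                 # end of stream or read failure
        break
    frame_count += 1
    if frame_count % N == 0:    # reached the extraction interval point
        cv2.imshow("sampled frame", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```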
After frame processing of the high-resolution video data with the video processing library OpenCV, it is also necessary to remove image noise from the high-resolution video data using Gaussian filtering or median filtering, determine a threshold value using the Otsu method, and convert each image of the high-resolution video data into a binary image.
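The denoising and binarization step can be sketched with OpenCV as follows; the kernel size and file paths are assumed values.

```python
import cv2

frame = cv2.imread("sampled_frame.png")              # placeholder frame from the previous step
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

denoised = cv2.GaussianBlur(gray, (5, 5), 0)         # Gaussian filtering (median alternative: cv2.medianBlur)
# Otsu's method chooses the threshold automatically; the binary image separates foreground and background.
_, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite("binary_frame.png", binary)
```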
In an embodiment of the present invention, when analyzing and processing natural behaviors of a player in real time, a specific process of deep learning target detection is adopted:
S2-21, a training data set containing the feature vector F is constructed to describe the game actions of the player's virtual character in the game:
F = [θ_tor, θ_nec, θ_kne, A_st(x, y, z), α_yaw, α_pit, α_roo, β_ben, β_rot, γ_hiprot, γ_kneben, γ_ankrot] (4);
S2-22, the mean μ_f and standard deviation σ_f of each feature f in the training data set are calculated to complete the standardization of the feature vector F. The mean μ_f is calculated as μ_f = (1/N) Σ_{i=1}^{N} f_i (5), where N is the number of samples in the training data set and f_i is the value of feature f in the i-th sample; the standard deviation σ_f is calculated as σ_f = sqrt((1/N) Σ_{i=1}^{N} (f_i − μ_f)²) (6);
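A short sketch of this z-score standardization, assuming the feature vectors are stacked into a NumPy array, is given below.

```python
import numpy as np

# Rows are samples, columns are the 12 features of the feature vector F (formula 4).
F_train = np.random.rand(200, 12)                 # placeholder training data

mu = F_train.mean(axis=0)                         # per-feature mean, formula (5)
sigma = F_train.std(axis=0)                       # per-feature standard deviation, formula (6)
F_standardized = (F_train - mu) / (sigma + 1e-8)  # small constant avoids division by zero
```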
S2-23, a convolutional neural network (CNN) architecture comprising several convolutional layers, pooling layers, fully connected layers and a classification layer is constructed to extract features from the input high-resolution video data and classify them; at the same time, the normalized feature vector F is input into the CNN for forward propagation and output computation, wherein,
In the convolution layer, the convolution operation on the input image data I is (I * K)(x, y) = Σ_m Σ_n I(m, n) K(x − m, y − n) (7), where I is the input image data, representing the high-resolution video of the player's actions after frame processing and binarization, * denotes the convolution operation, K is the convolution kernel, m and n are the sliding positions of the convolution kernel K on the input image data I, x and y are the coordinates of the position of the convolution kernel K on the input image data I, I(m, n) is the pixel value of the input image data I at position (m, n), and K(x − m, y − n) is the weight of the convolution kernel K at the offset position (x − m, y − n) relative to its center position (x, y);
In the pooling layer, the pooling operation is performed using the formula P(i, j) = max_{m,n} I(i·S − m, j·S − n) (8) to reduce the amount of computation and prevent overfitting, where P is the pooled feature map, I is the input feature map and S is the stride;
in the fully connected layer, the features are mapped toward the final classification result using the ReLU activation function, with Z = W × A + b (9), where Z is the output, W is the weight matrix, A is the input and b is the bias;
In the classification layer, a Softmax function is used, and the output of the convolutional neural network architecture CNN is converted into a probability distribution through the formula Softmax(Z)_i = e^{Z_i} / Σ_j e^{Z_j} (10), where Z is the output of the fully connected layer, Softmax(Z)_i is the probability of the i-th class, and Σ_j e^{Z_j} is the sum of all class scores after exponential conversion, used for normalization to ensure that the probabilities of all classes sum to 1;
S2-24, head rotation, waist bending and leg movement actions in the captured high-resolution video data are labeled, the behavior recognition model is trained with the labeled data set, and the parameters of the behavior recognition model are adjusted with the back-propagation algorithm θ ← θ − η·∇_θ J(θ), where θ are the recognition model parameters, η is the learning rate and J(θ) is the loss function; the trained model is then applied to the captured high-resolution video data to recognize the player's behavior.
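The patent does not name a deep-learning framework; as an assumption for illustration, the following sketch shows a small CNN of the kind described in S2-23 and one gradient-descent training step of the kind described in S2-24, written with PyTorch. Layer sizes and the number of behavior classes are placeholder values.

```python
import torch
import torch.nn as nn

class BehaviorCNN(nn.Module):
    """Convolution, pooling, fully connected and classification layers (S2-23)."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),   # convolution, formula (7)
            nn.MaxPool2d(2),                                         # max pooling, formula (8)
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128), nn.ReLU(),                 # Z = W × A + b with ReLU, formula (9)
            nn.Linear(128, num_classes),                             # softmax applied inside the loss, formula (10)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = BehaviorCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # update rule θ ← θ − η·∇J(θ)
criterion = nn.CrossEntropyLoss()

# One training step on a placeholder batch of 8 binarized 64×64 frames.
frames = torch.rand(8, 1, 64, 64)
labels = torch.randint(0, 5, (8,))
loss = criterion(model(frames), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```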
Based on the above technical concept, the specific process by which the trained behavior recognition model recognizes the player's behavior is as follows: the preprocessed high-resolution video data is input into the trained behavior recognition model, where the input data at least comprises frames or segments of the player's hand and finger motion, head rotation α_hea, waist bending β_wai and leg movement γ_hip in the game; the behavior recognition model then analyzes each frame or segment and outputs behavior class probabilities; the averaged class probability P̄ = (1/T) Σ_{t=1}^{T} P_t (11) is then used for time-series analysis of the prediction results output by the behavior recognition model that characterize the player's in-game actions, to determine the final behavior label, where P_t is the behavior class probability of the player in the game at time t and T is the considered time window size; finally, the recognition results, comprising the behavior label, the timestamp and the related video frames, are stored in association with the high-resolution video data, realizing accurate recognition of the player's action behavior.
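A minimal sketch of the temporal averaging of formula (11) over a sliding window is shown below, assuming per-frame class probabilities are already available; the probability values are placeholders.

```python
import numpy as np

# Per-frame class probabilities P_t over a window of T frames (placeholder values).
T = 6
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
    [0.5, 0.4, 0.1],
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
])

p_bar = probs[:T].mean(axis=0)        # formula (11): average probability over the window
behavior_label = int(p_bar.argmax())  # final behavior label for the window
print(p_bar, behavior_label)
```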
As shown in figs. 5 to 6, in an embodiment of the present invention, because a Kinect somatosensory camera is used as the high-resolution imaging device to capture human motion and obtain high-resolution video data, a specific light pattern (usually an infrared pattern) is essentially projected into the game scene where the player is located; the Kinect somatosensory camera captures the reflections of this pattern on the player's surface, but because different positions on an object's surface lie at different depths, the distortions of the light pattern also differ. That is, different surfaces may have different light reflection characteristics, which affects the sharpness of shadow edges and the internal brightness of the high-resolution video data image; for example, shadow edges produced by smooth surfaces may be clearer, while rough surfaces blur shadow edges because of diffuse reflection. To identify the shadow parts in the image (the shadow phenomenon is particularly prominent in game training because it directly affects the completeness of motion data and the accuracy of motion recognition), it is first necessary to construct a multispectral reflection model that decomposes the reflected light in the image into a direct reflection component and a scattered reflection component, and then identify the shadow regions by computing a separation matrix of the shadow components. The concrete implementation is as follows:
When analyzing and processing the player's natural behavior in real time, before the deep-learning target detection algorithm is used to accurately recognize the player's game action behavior, the high-resolution video data images that have been converted into binary images are divided and segmented into regions, so as to eliminate and extract the shadow parts in the high-resolution video data images and provide accurate visual information for the subsequent virtual character animation. The process is as follows:
First, the high-resolution video data image is converted from the RGB color space to the YCrCb color space to extract the shadow part, using the conversion formulas Y = 0.299R + 0.587G + 0.114B, Cr = 0.713(R − Y), Cb = 0.564(B − Y), where Y is the luminance of the high-resolution video data image and Cr and Cb are its chrominance components, used to describe the color information of the image. Each pixel of the high-resolution video data image is traversed, the formulas are applied to the RGB value of each pixel to calculate the corresponding YCrCb value, and the converted YCrCb image is compared with the original RGB image to ensure that the conversion is correct;
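This color-space conversion can be done directly with OpenCV, as in the sketch below; the file path is a placeholder, and note that OpenCV's 8-bit conversion additionally offsets Cr and Cb by 128, whereas the per-pixel formulas in the text produce zero-centered chrominance values.

```python
import cv2
import numpy as np

bgr = cv2.imread("frame.png")                       # OpenCV loads images in BGR order
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)      # built-in conversion (Cr/Cb offset by 128 for 8-bit)

# Manual per-pixel version of the formulas in the text (floating point, illustrative).
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).astype(np.float32)
R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
Y = 0.299 * R + 0.587 * G + 0.114 * B
Cr = 0.713 * (R - Y)
Cb = 0.564 * (B - Y)
```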
Second, the dark shadow region, including the shadow part in the high-resolution video data image, is identified by reflection component analysis. A multispectral reflection model R(λ, φ) = R_d(λ, φ) + R_S(λ, φ) (12) is constructed, decomposing the reflected light in the high-resolution video data image into a direct reflection component R_d(λ, φ) and a scattered reflection component R_S(λ, φ), where the direct reflection component R_d(λ, φ) is the reflected light directly related to the light source, reflected from the player's body straight to the Kinect somatosensory camera, and the scattered reflection component R_S(λ, φ) is the reflected light related to ambient or scattered light, comprising at least the light reflected onto the player's body by the surrounding environment;
A shadow separation matrix S_dow is constructed for separating and identifying the shadow region in the high-resolution video data image, where A is a design matrix containing feature vectors related to the direct reflection component R_d(λ, φ) and the scattered reflection component R_S(λ, φ), including the posture and surface characteristics of the player's body; W is a weight matrix adjusting the influence of different factors on the shadow part, including the light source intensity, the distance between the player's body and the Kinect somatosensory camera, and the intensity of the ambient light; L is a luminance matrix representing the brightness value of each pixel obtained from the multispectral reflection model, including the spectral response of the Kinect somatosensory camera; and E is an ambient light matrix representing the influence of ambient light on the brightness of each pixel, including the lighting conditions of the player's surroundings;
The shadow separation matrix S_dow is solved using a weighted linear least-squares method, and the shadow region in the high-resolution video data image is extracted;
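The patent states only that S_dow is obtained from A, W, L and E by weighted linear least squares and does not give the exact model; the sketch below assumes, purely for illustration, a linear model in which the ambient-corrected luminance L − E is explained by the design matrix A, so that S_dow is the weighted least-squares solution.

```python
import numpy as np

# Assumed shapes: one row per pixel, one column per feature (posture/surface descriptors).
num_pixels, num_features = 1000, 4
A = np.random.rand(num_pixels, num_features)    # design matrix (assumed content)
L = np.random.rand(num_pixels)                  # per-pixel luminance from the reflection model
E = 0.1 * np.random.rand(num_pixels)            # ambient-light contribution per pixel
w = np.random.rand(num_pixels)                  # per-pixel weights (diagonal of W)

# Weighted linear least squares: S_dow = (A^T W A)^(-1) A^T W (L - E)
AtW = A.T * w
S_dow = np.linalg.solve(AtW @ A, AtW @ (L - E))

# Pixels whose residual luminance is poorly explained could be flagged as shadow (illustrative rule).
residual = (L - E) - A @ S_dow
shadow_mask = residual < -0.2
```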
Again, the change in the direct reflection component R_d(λ, φ) is analyzed as time-series data to identify the trend of change in the player's body posture: ΔR_d(t) = R_d(t) − R_d(t − 1) (14), where ΔR_d(t) is the amount of change in the direct reflection component R_d(λ, φ) caused by the change of the player's body posture at time t and R_d(t) is the direct reflection component at time t. At the same time, the change in the scattered reflection component R_S(λ, φ) is analyzed to detect the degree of the player's interaction with the environment: if the scattered reflection component R_S(λ, φ) increases significantly in a certain area, it indicates that the player has entered a new lighting environment or is interacting with objects in the environment, and image processing techniques are used to measure the increase in illumination intensity of the area where the player is located, so as to determine whether the player has entered a brighter or darker environment;
Finally, the player actions captured by the Kinect somatosensory camera are combined with the analysis results of the high-resolution video data images: image processing techniques align the two spatially, so that corresponding pixels represent the same physical position, and the player actions and the image analysis results are integrated into a unified data structure for complete data fusion.
S3, the player's natural behaviors are recognized and responded to, and the behaviors are mapped onto the virtual characters in the game. When a player behavior is recognized, it is mapped onto the virtual character in the game using an inverse kinematics algorithm: the target pose of the player's game action, comprising the character's joint angles and body-part positions, is calculated by the inverse kinematics algorithm, and finally the animation of the virtual character is updated according to the calculated target pose, so that the character reflects the player's natural behaviors in the game environment.
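The patent does not specify which inverse kinematics formulation is used; as an assumption for illustration, the sketch below solves the classic two-link analytic IK problem for a single limb, producing the joint angles that place the end effector (e.g., a hand or foot) at a target position.

```python
import math

def two_link_ik(target_x, target_y, l1=0.35, l2=0.35):
    """Analytic 2-link inverse kinematics: return (hip/shoulder angle, knee/elbow angle)."""
    d2 = target_x ** 2 + target_y ** 2
    # Clamp to handle targets slightly outside the reachable range.
    cos_knee = max(-1.0, min(1.0, (d2 - l1 ** 2 - l2 ** 2) / (2 * l1 * l2)))
    knee = math.acos(cos_knee)
    hip = math.atan2(target_y, target_x) - math.atan2(l2 * math.sin(knee),
                                                      l1 + l2 * math.cos(knee))
    return hip, knee

# Example: place the virtual character's foot at a captured target position.
hip_angle, knee_angle = two_link_ik(0.4, -0.5)
print(math.degrees(hip_angle), math.degrees(knee_angle))
```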
The technical scope of the present invention is not limited to the above description, and those skilled in the art may make various changes and modifications to the above-described embodiments without departing from the technical spirit of the present invention, and these changes and modifications should be included in the scope of the present invention.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411649023.5A CN119524382A (en) | 2024-11-19 | 2024-11-19 | A virtual simulation interactive image recognition game monitoring method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411649023.5A CN119524382A (en) | 2024-11-19 | 2024-11-19 | A virtual simulation interactive image recognition game monitoring method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN119524382A true CN119524382A (en) | 2025-02-28 |
Family
ID=94692935
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411649023.5A Pending CN119524382A (en) | 2024-11-19 | 2024-11-19 | A virtual simulation interactive image recognition game monitoring method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119524382A (en) |
- 2024-11-19: CN application CN202411649023.5A filed; publication CN119524382A; status active (Pending)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP4002198A1 (en) | Posture acquisition method and device, and key point coordinate positioning model training method and device | |
Liu et al. | Learning deep models for face anti-spoofing: Binary or auxiliary supervision | |
US7899206B2 (en) | Device, system and method for determining compliance with a positioning instruction by a figure in an image | |
US9002054B2 (en) | Device, system and method for determining compliance with an instruction by a figure in an image | |
WO2021042547A1 (en) | Behavior identification method, device and computer-readable storage medium | |
CN108197589B (en) | Semantic understanding method, apparatus, equipment and the storage medium of dynamic human body posture | |
CN110472554A (en) | Table tennis action identification method and system based on posture segmentation and crucial point feature | |
US9183431B2 (en) | Apparatus and method for providing activity recognition based application service | |
CN104615234B (en) | Message processing device and information processing method | |
CN109886153B (en) | A real-time face detection method based on deep convolutional neural network | |
CN114399838B (en) | Multi-person behavior recognition method and system based on posture estimation and binary classification | |
CN107767335A (en) | A kind of image interfusion method and system based on face recognition features' point location | |
CN105825168B (en) | A face detection and tracking method of golden snub-nosed monkey based on S-TLD | |
CN109670517A (en) | Object detection method, device, electronic equipment and target detection model | |
CN110046574A (en) | Safety cap based on deep learning wears recognition methods and equipment | |
CN109325408A (en) | A gesture judgment method and storage medium | |
CN111860451A (en) | A game interaction method based on facial expression recognition | |
CN111291612A (en) | Pedestrian re-identification method and device based on multi-person multi-camera tracking | |
CN119524382A (en) | A virtual simulation interactive image recognition game monitoring method | |
CN115830517A (en) | Examination room abnormal frame extraction method and system based on video | |
Zhou | Computational Analysis of Table Tennis Games from Real-Time Videos Using Deep Learning | |
CN108108010A (en) | A kind of brand-new static gesture detection and identifying system | |
CN108052913A (en) | A kind of art work image identification and comparison method | |
Tavari et al. | A review of literature on hand gesture recognition for Indian Sign Language | |
Shen et al. | A method of billiard objects detection based on Snooker game video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||