
CN118540515A - Video frame stream processing method and system - Google Patents

Video frame stream processing method and system

Info

Publication number
CN118540515A
Authority
CN
China
Prior art keywords
model
video frame
lstm
cnn
frame stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410760158.2A
Other languages
Chinese (zh)
Inventor
黄仕同
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Fupeng Cultural Development Co ltd
Original Assignee
Guangxi Fupeng Cultural Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Fupeng Cultural Development Co ltd filed Critical Guangxi Fupeng Cultural Development Co ltd
Priority to CN202410760158.2A
Publication of CN118540515A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/04Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0435Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply symmetric encryption, i.e. same key used for encryption and decryption
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40Network security protocols
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2347Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving video stream encryption
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4408Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving video stream encryption, e.g. re-encrypting a decrypted video stream for redistribution in a home network

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video frame stream processing method and a system thereof, belonging to the field of multimedia security. The video frame stream processing method includes: dividing a video frame stream into key frames and non-key frames; detecting and analyzing the video frame stream in real time by using a CNN-LSTM combination model, identifying possible security threats and detecting abnormal changes in the image; establishing a feedback mechanism during transmission, with corresponding feedback and processing carried out according to the result output by the CNN-LSTM model; performing end-to-end encrypted transmission of the key frames and the non-key frames by adopting an AES symmetric encryption algorithm and a partial encryption mode respectively; and performing identity authentication before transmission begins. Image features are extracted by the CNN, time-sequence information is modeled by the LSTM, and training combines a linear function and a loss function, so that security threats and abnormal changes in video frame images are effectively identified. The feedback mechanism provides corresponding feedback and processing according to the result output by the CNN-LSTM model, improving the capability of coping with security threats.

Description

Video frame stream processing method and system
Technical Field
The invention belongs to the field of multimedia security, and particularly relates to a video frame stream processing method and a system thereof.
Background
With the continuous development of computer and network technology, a user can watch not only video files stored on the terminal device they use, but also video files on the network side. Video files that can be transmitted to users are stored in a video server on the network side. When a user wants to watch a video file stored in the network-side video server, a video transmission request can be sent to the video server through the terminal device; after the video server receives the request, it transmits the corresponding video file to the terminal device, and the terminal device plays the received video file. A video file consists of a sequence of ordered video frames (i.e., single still pictures), so a video file may also be referred to as a video frame stream.
However, video transmission may face security threats such as data leakage, tampering, or interception. Appropriate security measures therefore need to be taken during video transmission, and a video frame stream processing method is proposed for this purpose.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a video frame stream processing method and a system thereof, which solve the problems in the prior art.
The aim of the invention can be achieved by the following technical scheme:
A video frame stream processing method, comprising the steps of:
dividing a video frame stream into key frames and non-key frames;
Detecting and analyzing the video frame stream in real time by using a CNN-LSTM combined model, identifying possible security threats and detecting abnormal changes in the image;
in the transmission process, a feedback mechanism is established, and corresponding feedback and processing are carried out according to the result output by the CNN-LSTM model;
Performing end-to-end encryption transmission on the key frames and the non-key frames by adopting an AES symmetric encryption algorithm and a partial encryption mode respectively;
identity authentication is performed before transmission begins.
Further, the CNN-LSTM combination model includes:
convolution layer: for extracting features in the image;
pooling layer: the method is used for reducing the size of the feature map, reducing the number of parameters and simultaneously keeping key information;
stacking of convolution-pooling layers: for progressively extracting and abstracting image features;
Flatten layer: flattens the feature map output by the convolution layer into a one-dimensional vector that serves as the input of the LSTM layer;
LSTM layer: for modeling timing information in the image sequence, capturing a temporal correlation between image frames;
hidden layer: for increasing the depth and complexity of the network;
output layer: and connecting the output of the LSTM layer to the full connection layer, and finally outputting the prediction result of the model.
Further, the linear function of the CNN-LSTM combination model is as follows:
z_j = Σ_{i=1}^{N} w_{ij} · x_i + b_j
wherein z_j is the weighted sum of inputs to the j-th neuron of the full connection layer, w_{ij} is the weight connecting the i-th input feature and the j-th neuron, x_i is the i-th element of the input feature vector, b_j is the bias term of the j-th neuron, and N is the dimension of the input feature vector;
The loss function of the CNN-LSTM combination model is the cross-entropy loss:
L = −(1/M) · Σ_{i=1}^{M} Σ_{j=1}^{K} y_{ij} · log(ŷ_{ij})
wherein L represents the difference between the predicted result output by the model and the real label; y_{ij} represents the label value of the j-th class in the real label of sample i; ŷ_{ij} represents the predicted probability of the j-th class in the model output for sample i; M is the number of samples and K is the number of classes.
Further, the specific steps of detecting and analyzing the video frame stream in real time by using the CNN-LSTM combination model include:
S21, preprocessing the acquired video frames, including image size adjustment and normalization operation;
S22, utilizing the preprocessed video frame, adjusting the structure and super parameters of the CNN-LSTM combined model through repeated iterative training and verification, and optimizing the CNN-LSTM combined model;
S23, inputting the video frames into the trained and optimized CNN-LSTM model, analyzing and monitoring each frame in real time through the model, detecting abnormal changes, object occlusion and scene changes in the image, and identifying security threats.
Further, the step of optimizing the CNN-LSTM combination model includes:
s221, dividing the data set into a training set, a verification set and a test set; the training set is used for parameter training of the model, the verification set is used for super-parameter adjustment and performance evaluation of the model, and the test set is used for performance evaluation of the final model;
s222, training the deep learning model by using a training set, and updating parameters of the model through a back propagation algorithm and an optimizer in the training process so that a loss function of the model is gradually reduced;
s223, evaluating the model obtained by training by using the verification set, and adjusting the super parameters of the model according to the performance of the verification set, wherein the super parameters comprise: learning rate, batch size, network structure, regularization parameters;
And S224, monitoring the performance of the model on the verification set, and stopping training when the performance is not improved any more.
Further, only the sensitive area is encrypted when encrypting the non-key frames.
A video frame stream processing system, comprising:
And a classification module: dividing a video frame stream into key frames and non-key frames;
And a detection and analysis module: detecting and analyzing the video frame stream in real time by using a CNN-LSTM combined model, identifying possible security threats and detecting abnormal changes in the image;
and a transmission feedback module: in the transmission process, a feedback mechanism is established, and corresponding feedback and processing are carried out according to the result output by the CNN-LSTM model;
and a transmission encryption module: performing end-to-end encryption transmission on the key frames and the non-key frames by adopting an AES symmetric encryption algorithm and a partial encryption mode respectively;
Identity authentication module: identity authentication is performed before transmission begins.
A computer storage medium storing a readable program which, when run, executes a video frame stream processing method as described above.
An electronic device, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform an operation corresponding to the video frame stream processing method.
A computer program product comprising computer instructions for instructing a computing device to perform operations corresponding to one of the video frame stream processing methods described above.
The invention has the beneficial effects that:
1. the video frame stream is divided into the key frames and the non-key frames, and different encryption modes are adopted for the key frames and the non-key frames respectively for encryption, wherein the non-key frames are encrypted in a partial encryption mode (only the sensitive area is encrypted), so that the influence of the encryption process on bandwidth and performance is reduced, and meanwhile, the cost is reduced.
2. The CNN-LSTM combined model is used for detecting and analyzing the video frame stream, the CNN in the model is used for extracting image characteristics, LSTM modeling time sequence information is used for training by combining a linear function and a loss function, and security threat and abnormal change in the video frame image can be effectively identified.
3. By establishing a feedback mechanism, corresponding feedback and processing are carried out according to the result output by the CNN-LSTM model, so that the coping capability of the security threat is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.
Fig. 1 is a flow chart of a video frame stream processing method of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, a video frame stream processing method includes the following steps:
s1, before transmission, dividing a video frame stream into key frames (I frames) and non-key frames (P frames and B frames);
Key frames contain complete image information, while non-key frames usually contain only partial image information. By classifying the frames, different security treatments can be applied to key frames and non-key frames respectively, protecting video content more effectively.
Meanwhile, since key frames have higher image quality and importance, security processing of key frames, such as encryption, transmission and storage, may require more resources and computation, whereas security processing of non-key frames can be lighter, saving resources and bandwidth. By allocating resources in this way, the overall performance and efficiency of the system can be improved.
The step of classifying the video frame stream specifically comprises the following steps:
S11, extracting frames: first, image data of each frame is extracted from a video frame stream.
S12, frame type identification: and carrying out type identification on each extracted frame, and determining which type of frame belongs to the extracted frame. In a common video coding standard, frames are generally classified into the following types:
Key frame (I frame): Key frames are important frames in a video sequence; they contain complete image information and are independent of the data of other frames. Typically, a key frame appears once every few frames and serves as a reference image for the video, with subsequent frames encoded as changes relative to it. Whether a frame is a key frame can be determined by examining the frame header information or a specific flag.
Non-key frames (P-frames, B-frames): non-key frames are encoded in dependence on the data of other frames, typically only storing differences from a previous frame or frames before and after. In video coding standards such as h.264, non-key frames are generally classified into two types, i.e., predicted frames (P frames) and bi-predicted frames (B frames). P frames rely on the data of the previous frame for encoding, while B frames rely on the data of both the previous and subsequent frames, so B frames typically have higher compression rates, but are also more complex.
S13, frame classification: each frame is classified as either a key frame or a non-key frame according to the type of frame.
S14, marking a frame: each frame is marked indicating the frame type to which it belongs.
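The classification steps S11–S14 can be sketched as follows. The frame-type codes ('I', 'P', 'B') are assumed to have already been read from each frame's header by a codec parser, which is not shown here:

```python
# Sketch of S11-S14: classify and mark frames as key / non-key.
# The frame-type codes are assumed to come from a codec's
# frame-header parse (the parser itself is hypothetical).

def classify_frames(frame_types):
    """Map a sequence of frame-type codes to marked key/non-key records."""
    marked = []
    for idx, ftype in enumerate(frame_types):
        if ftype == "I":            # I frame: complete image, independent
            category = "key"
        elif ftype in ("P", "B"):   # P/B frame: stores differences only
            category = "non-key"
        else:
            raise ValueError(f"unknown frame type: {ftype}")
        marked.append({"index": idx, "type": ftype, "category": category})
    return marked

frames = classify_frames(["I", "P", "B", "P", "I", "B"])
print([f["category"] for f in frames])
# ['key', 'non-key', 'non-key', 'non-key', 'key', 'non-key']
```

The markers produced in S14 are what the later encryption step consults to decide between full AES encryption and partial encryption.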
S2, detecting and analyzing the video frame stream in real time using the CNN-LSTM combination model, identifying possible security threats such as data tampering and malicious attacks, and detecting abnormal changes in the image, such as object occlusion and scene changes, thereby improving the security monitoring capability for video content;
the method comprises the following specific steps:
S21, preprocessing the acquired video frames, including image size adjustment and normalization operation, to ensure that the input data is matched with the requirements of a training model;
s22, utilizing the preprocessed video frame, adjusting the structure and super parameters of the CNN-LSTM combined model through repeated iterative training and verification, and optimizing the performance of the CNN-LSTM combined model so as to improve the accuracy and generalization capability of the model;
The CNN-LSTM combination model comprises:
Convolution layer: the method is used for extracting features in the image, including low-level features such as edges and textures and high-level features such as objects and scenes;
pooling layer: the method is used for reducing the size of the feature map, reducing the number of parameters and simultaneously keeping key information;
stacking of convolution-pooling layers: a stack of multiple convolution-pooling layers for progressively extracting and abstracting image features;
Flatten layer: flattens the feature map output by the convolution layer into a one-dimensional vector that serves as the input of the LSTM layer;
LSTM layer: for modeling timing information in the image sequence, capturing a temporal correlation between image frames;
Hidden layer: a stack of LSTM layers for increasing the depth and complexity of the network;
output layer: and connecting the output of the LSTM layer to the full connection layer, and finally outputting the prediction result of the model.
In the linear function of the CNN-LSTM combination model, the output layer of the CNN-LSTM combination model is connected to a full connection layer, a nonlinear transformation is applied using an activation function (ReLU), and the output of the full connection layer is classified by a softmax function, which outputs the probability of each class; the linear function of the CNN-LSTM combination model is:
z_j = Σ_{i=1}^{N} w_{ij} · x_i + b_j
wherein z_j is the weighted sum of inputs to the j-th neuron of the full connection layer, w_{ij} is the weight connecting the i-th input feature and the j-th neuron, x_i is the i-th element of the input feature vector, b_j is the bias term of the j-th neuron, and N is the dimension of the input feature vector;
On the basis of the linear function, the output of the neuron is obtained by applying the nonlinear activation function ReLU:
a_j = f(z_j) = max(0, z_j)
wherein f(z_j) is the activation function;
The probability p_j of category j is calculated by the softmax function:
p_j = exp(z_j) / Σ_{k=1}^{K} exp(z_k)
where K is the number of output categories.
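The weighted sum, ReLU activation, and softmax described above compose into a single forward pass; the weights, bias, and input values below are arbitrary illustrative numbers:

```python
import math

def dense(x, W, b):
    """z_j = sum_i w_ij * x_i + b_j, one row of W per output neuron."""
    return [sum(w_ij * x_i for w_ij, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

def relu(z):
    """Element-wise f(z_j) = max(0, z_j)."""
    return [max(0.0, z_j) for z_j in z]

def softmax(z):
    """p_j = exp(z_j) / sum_k exp(z_k), shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(z_j - m) for z_j in z]
    s = sum(exps)
    return [e / s for e in exps]

x = [0.5, -1.0, 2.0]                        # input feature vector (N = 3)
W = [[0.2, 0.1, -0.3], [0.4, -0.2, 0.1]]    # weights for K = 2 neurons
b = [0.05, -0.1]
p = softmax(relu(dense(x, W, b)))           # class probabilities
print(p)
```

The resulting probabilities are non-negative and sum to 1, which is what the cross-entropy loss below assumes.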
The CNN-LSTM combination model uses a cross-entropy loss function (Cross-Entropy Loss) to measure the difference between the predicted result output by the model and the real label; the expression is:
L = −(1/M) · Σ_{i=1}^{M} Σ_{j=1}^{K} y_{ij} · log(ŷ_{ij})
wherein L represents the difference between the predicted result output by the model and the real label; y_{ij} represents the label value of the j-th class in the real label of sample i (1 means the sample belongs to the class, 0 means it does not); ŷ_{ij} represents the predicted probability of the j-th class in the model output for sample i; M is the number of samples and K is the number of classes.
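The cross-entropy loss can be computed directly from one-hot labels and predicted probabilities; the two-sample, two-class values below are toy numbers:

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy over M samples and K classes.
    y_true: one-hot label rows; y_pred: predicted probability rows.
    eps guards against log(0)."""
    total = 0.0
    for yi, pi in zip(y_true, y_pred):
        total -= sum(y_ij * math.log(p_ij + eps)
                     for y_ij, p_ij in zip(yi, pi))
    return total / len(y_true)

y_true = [[1, 0], [0, 1]]             # sample 1 -> class 0, sample 2 -> class 1
y_pred = [[0.9, 0.1], [0.2, 0.8]]     # model output probabilities
loss = cross_entropy(y_true, y_pred)
print(round(loss, 4))                 # -(ln 0.9 + ln 0.8)/2 ≈ 0.1643
```

Only the probability assigned to the true class contributes, so confident correct predictions drive the loss toward zero.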
The image features are extracted through CNN, LSTM modeling time sequence information is combined with a linear function and a loss function to train, and security threats and abnormal changes in video frame images can be effectively identified.
The step of optimizing the CNN-LSTM combination model comprises the following steps:
s221, dividing the data set into a training set, a verification set and a test set; the training set is used for parameter training of the model, the verification set is used for super-parameter adjustment and performance evaluation of the model, and the test set is used for performance evaluation of the final model.
S222, training the deep learning model by using a training set, wherein in the training process, parameters of the model are updated through a back propagation algorithm and an optimizer (Adam) so that a loss function of the model is gradually reduced;
S223, evaluating the trained model using the verification set, and adjusting the super parameters of the model according to its performance on the verification set, wherein the super parameters comprise:
Learning rate: controls the step size of parameter updates; too large a step may cause the model to oscillate, while too small a step may slow convergence.
Batch size: the number of samples used for each iterative training affects the convergence rate and generalization ability of the model.
Network structure: including the number of network layers, the number of neurons per layer, the size of the convolution kernel, etc.
Regularization parameters: such as the weights of L1 and L2 regularization.
S224, monitoring the performance of the model on the verification set, and stopping training when the performance is not improved any more so as to prevent over fitting; if the data set is smaller in size, the generalization capability of the model can be evaluated in a cross-validation mode, so that the robustness of the model is further improved.
Through repeated iterative training and verification, the super parameters and the structure of the model are adjusted, so that the performance of the deep learning model can be gradually optimized, and the accuracy and generalization capability of the deep learning model are improved.
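The early-stopping rule of S224 (stop when performance on the verification set no longer improves) can be sketched as a patience loop; the patience value and the validation-loss sequence below are illustrative:

```python
def early_stop_epoch(val_losses, patience=2):
    """Return the epoch at which training would stop: the first epoch
    after which the validation loss has failed to improve for
    `patience` consecutive epochs (or the last epoch if it never stops)."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss             # new best: reset the patience counter
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch        # no improvement for `patience` epochs
    return len(val_losses) - 1      # training ran to the end

losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.60]
print(early_stop_epoch(losses))     # stops at epoch 4, before 0.60 is seen
```

In practice the model weights from the best epoch (here epoch 2) would be restored, which is the usual way early stopping prevents overfitting.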
S23, inputting video frames into a CNN-LSTM model after training and optimizing, and analyzing and monitoring each frame in real time through the model; the model may detect abnormal changes in the image, object occlusions, scene changes, etc., and identify potential security threats.
And S3, in the transmission process, a feedback mechanism is established, and corresponding feedback and processing are carried out according to the result output by the CNN-LSTM model.
If an abnormal condition is detected, measures are taken promptly, such as triggering an alarm, taking preventive measures, or automatically adjusting the security policy, to deal with potential security threats; the feedback mechanism is refined through continuous iteration, improving both the capability of coping with security threats and transmission efficiency.
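The feedback step can be sketched as a simple dispatch on the model's output; the threat labels, confidence threshold, and action names are assumptions made for illustration, not part of the patent:

```python
# Hypothetical feedback dispatch on the CNN-LSTM output.
# Labels, threshold, and action names are illustrative assumptions.

def handle_detection(label, confidence, threshold=0.8):
    """Map one model prediction to a feedback action."""
    if label == "normal":
        return "continue"            # nothing to do, keep transmitting
    if confidence >= threshold:
        return "trigger_alarm"       # high-confidence threat: alert now
    return "tighten_policy"          # low-confidence anomaly: precaution

print(handle_detection("tampering", 0.95))   # trigger_alarm
print(handle_detection("occlusion", 0.55))   # tighten_policy
print(handle_detection("normal", 0.99))      # continue
```

Logging each decision alongside the later-confirmed ground truth would supply the data needed to iteratively refine the threshold, as the paragraph above describes.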
S4, performing end-to-end encryption transmission on the video key frames and the non-key frames by adopting an AES symmetric encryption algorithm and a partial encryption mode respectively, so as to ensure the data security in the transmission process; and meanwhile, secure transmission and management of the secret key are ensured so as to prevent data leakage and tampering.
The step of using an AES symmetric encryption algorithm to perform end-to-end encrypted transmission of video key frames includes:
1) And (3) key generation: the sender and receiver negotiate a key, and a secure key agreement protocol is used to generate a shared key.
2) Encrypting key frames: the sender encrypts the key frame by using the shared key to protect the content of the key frame; an AES (advanced encryption standard) symmetric encryption algorithm is used to encrypt the key frames.
3) Transmitting encrypted data: the sender transmits the encrypted key frame to the receiver, so that the data security in the transmission process is ensured.
4) Decrypting the received data: the receiving party uses the same shared secret key to decrypt the received encrypted data, and the original key frame content is restored.
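The encrypt/transmit/decrypt round trip of steps 1)–4) can be sketched as below. To keep the example self-contained, a SHA-256 counter-mode keystream stands in for AES; a real implementation would use an actual AES cipher (for example AES-GCM from a cryptography library) and a proper key-agreement protocol:

```python
import hashlib

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR `data` with a SHA-256 counter-mode keystream.
    This is a stand-in for AES-CTR, NOT real AES; like a symmetric
    stream cipher, the same call both encrypts and decrypts."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hashlib.sha256(
            key + nonce + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

shared_key = b"negotiated-shared-key"   # from key agreement (assumed)
nonce = b"frame-0001"                   # unique per frame
key_frame = b"\x00\x01\x02 complete image data of an I frame"

ciphertext = keystream_xor(shared_key, nonce, key_frame)   # sender encrypts
restored = keystream_xor(shared_key, nonce, ciphertext)    # receiver decrypts
assert restored == key_frame and ciphertext != key_frame
```

The nonce must never repeat under the same key, which mirrors the IV requirement of real AES counter and GCM modes.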
The step of using a partial encryption method to perform end-to-end encrypted transmission on the video non-key frames comprises the following steps:
1) And (3) key generation: the sender and receiver still negotiate the key, generating a shared key.
2) Encrypting the non-key frames: unlike key frames, non-key frames can generally be encrypted in a more lightweight manner, because their content is less important than that of key frames; encryption may be performed partially (encrypting only the sensitive areas) to reduce the impact of the encryption process on bandwidth and performance.
3) Transmitting encrypted data: the sender transmits the encrypted non-key frame to the receiver, so that the data security in the transmission process is ensured.
4) Decrypting the received data: the receiving party uses the same shared secret key to decrypt the received encrypted data, and the original non-key frame content is restored.
It is worth mentioning that encryption of key frames typically introduces a greater performance overhead, as their content is more important and requires stricter protection, while encryption of non-key frames may employ a lighter-weight encryption algorithm to reduce the impact of the encryption process on transmission performance.
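Partial encryption of a non-key frame can be sketched as below; the frame is modeled as a flat byte string, the sensitive region as a byte range, and a toy repeating-key XOR stands in for the real cipher (all three are illustrative assumptions):

```python
def encrypt_region(frame: bytes, start: int, end: int, key: bytes) -> bytes:
    """Encrypt only frame[start:end] (the sensitive region); the rest of
    the frame is left in the clear. A toy repeating-key XOR stands in for
    the real cipher; XOR is its own inverse, so the same call decrypts."""
    region = bytes(b ^ key[i % len(key)]
                   for i, b in enumerate(frame[start:end]))
    return frame[:start] + region + frame[end:]

frame = b"header|SENSITIVE-FACE-REGION|trailer"
start, end = 7, 28                        # byte range of the sensitive area
enc = encrypt_region(frame, start, end, b"k3y")
assert enc[:start] == frame[:start] and enc[end:] == frame[end:]  # clear parts intact
assert enc[start:end] != frame[start:end]                         # region scrambled
assert encrypt_region(enc, start, end, b"k3y") == frame           # round trip
```

Because only the sensitive byte range is transformed, the per-frame cost scales with the region size rather than the full frame, which is the bandwidth and performance saving the text describes.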
S5, identity authentication is carried out before transmission starts, so that both sides of the transmission are ensured to be legal and trusted; the identity of the transmitting party can be verified by adopting digital certificates, token authentication and other modes so as to prevent unauthorized access and malicious attack.
The method of the present invention may be implemented in hardware, in firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium and downloaded over a network for storage in a local recording medium, so that the method described herein can be executed from such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware such as an ASIC or FPGA. It is understood that a computer, processor, microprocessor controller, or programmable hardware includes a storage component (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, performs the methods described herein. Furthermore, when a general-purpose computer accesses code for implementing the methods shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for performing those methods.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims (10)

1. A method for processing a video frame stream, comprising the steps of:
dividing a video frame stream into key frames and non-key frames;
Detecting and analyzing the video frame stream in real time by using a CNN-LSTM combined model, identifying possible security threats and detecting abnormal changes in the image;
in the transmission process, a feedback mechanism is established, and corresponding feedback and processing are carried out according to the result output by the CNN-LSTM model;
Performing end-to-end encryption transmission on the key frames and the non-key frames by adopting an AES symmetric encryption algorithm and a partial encryption mode respectively;
identity authentication is performed before transmission begins.
2. The video frame stream processing method according to claim 1, wherein the CNN-LSTM combination model includes:
convolution layer: for extracting features in the image;
pooling layer: the method is used for reducing the size of the feature map, reducing the number of parameters and simultaneously keeping key information;
stacking of convolution-pooling layers: for progressively extracting and abstracting image features;
Flatten layer: flattening the feature map output by the convolution layers into a one-dimensional vector serving as the input of the LSTM layer;
LSTM layer: for modeling timing information in the image sequence, capturing a temporal correlation between image frames;
hidden layer: for increasing the depth and complexity of the network;
output layer: and connecting the output of the LSTM layer to the full connection layer, and finally outputting the prediction result of the model.
3. The video frame stream processing method according to claim 2, wherein the linear function of the CNN-LSTM combination model is:
z_j = Σ_{i=1}^{N} w_{ij} · x_i + b_j
wherein z_j is the weighted input sum of the j-th neuron of the fully connected layer, w_{ij} is the weight connecting the i-th input feature and the j-th neuron, x_i is the i-th element of the input feature vector, b_j is the bias term of the j-th neuron, and N is the dimension of the input feature vector;
the loss function of the CNN-LSTM combination model is:
L = − Σ_i Σ_j y_{ij} · log(ŷ_{ij})
wherein L represents the difference between the predicted result output by the model and the real label; y_{ij} represents the value of the j-th class in the real label of sample i; and ŷ_{ij} represents the predicted probability of the j-th class in the model output for sample i.
4. The method for processing a video frame stream according to claim 2, wherein the specific steps of detecting and analyzing the video frame stream in real time by using the CNN-LSTM combination model include:
S21, preprocessing the acquired video frames, including image size adjustment and normalization operation;
S22, utilizing the preprocessed video frame, adjusting the structure and super parameters of the CNN-LSTM combined model through repeated iterative training and verification, and optimizing the CNN-LSTM combined model;
S23, inputting the video frames into the trained and optimized CNN-LSTM model, analyzing and monitoring each frame in real time through the model, detecting abnormal changes, object occlusion, and scene changes in the images, and identifying security threats.
5. The video frame stream processing method according to claim 4, wherein the step of optimizing the CNN-LSTM combining model comprises:
s221, dividing the data set into a training set, a verification set and a test set; the training set is used for parameter training of the model, the verification set is used for super-parameter adjustment and performance evaluation of the model, and the test set is used for performance evaluation of the final model;
s222, training the deep learning model by using a training set, and updating parameters of the model through a back propagation algorithm and an optimizer in the training process so that a loss function of the model is gradually reduced;
s223, evaluating the model obtained by training by using the verification set, and adjusting the super parameters of the model according to the performance of the verification set, wherein the super parameters comprise: learning rate, batch size, network structure, regularization parameters;
And S224, monitoring the performance of the model on the verification set, and stopping training when the performance is not improved any more.
6. A video frame stream processing method according to claim 1, characterized in that only sensitive areas are encrypted when encrypting non-key frames.
7. A video frame stream processing system, comprising:
And a classification module: dividing a video frame stream into key frames and non-key frames;
And a detection and analysis module: detecting and analyzing the video frame stream in real time by using a CNN-LSTM combined model, identifying possible security threats and detecting abnormal changes in the image;
and a transmission feedback module: in the transmission process, a feedback mechanism is established, and corresponding feedback and processing are carried out according to the result output by the CNN-LSTM model;
and a transmission encryption module: performing end-to-end encryption transmission on the key frames and the non-key frames by adopting an AES symmetric encryption algorithm and a partial encryption mode respectively;
Identity authentication module: identity authentication is performed before transmission begins.
8. A computer storage medium storing a readable program, characterized in that, when the program runs, it performs the video frame stream processing method according to any one of claims 1 to 6.
9. An electronic device, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform operations corresponding to a video frame stream processing method according to any one of claims 1 to 6.
10. A computer program product comprising computer instructions that instruct a computing device to perform operations corresponding to a video frame stream processing method as claimed in any one of claims 1 to 6.
CN202410760158.2A 2024-06-13 2024-06-13 Video frame stream processing method and system Pending CN118540515A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410760158.2A CN118540515A (en) 2024-06-13 2024-06-13 Video frame stream processing method and system


Publications (1)

Publication Number Publication Date
CN118540515A true CN118540515A (en) 2024-08-23

Family

ID=92380986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410760158.2A Pending CN118540515A (en) 2024-06-13 2024-06-13 Video frame stream processing method and system

Country Status (1)

Country Link
CN (1) CN118540515A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119723426A (en) * 2025-02-25 2025-03-28 深圳百通玄武技术有限公司 Intelligent analysis method and system for video stream based on neural network
CN119723426B (en) * 2025-02-25 2025-06-20 深圳市极客智能科技有限公司 Intelligent analysis method and system for video stream based on neural network

Legal Events

Date Code Title Description
PB01 Publication