CN114155464B

CN114155464B - Video data storage method and device, storage medium and terminal

Info

Publication number: CN114155464B
Application number: CN202111438203.5A
Authority: CN
Inventors: 靳凤伟; 夏曙东; 孙智彬; 张志平
Original assignee: Beijing Transwiseway Information Technology Co Ltd
Current assignee: Beijing Xinglu Chelian Technology Co ltd
Priority date: 2021-11-29
Filing date: 2021-11-29
Publication date: 2022-11-25
Anticipated expiration: 2041-11-29
Also published as: CN114155464A

Abstract

The invention discloses a video data evidence storing method which is applied to a first client and comprises the following steps: when video data to be transmitted are acquired, acquiring a pre-trained scene type recognition model set for the video data; extracting a plurality of key video frames in the video data, and determining the scene type of each key video frame based on the model; loading driving data of a current vehicle, and constructing a mask picture carrying parameters according to the driving data and the scene type of each key video frame; synthesizing the mask picture carrying the parameters and each key video frame to generate synthesized target video data; and sending the video data and the driving data to be transmitted to a second client, and sending the processed target video data and the driving data to a cloud server. Because this application passes through the scene type of model identification video data to combine driving data to carry out the secondary synthesis to video data, make the video of driver report difficult to falsify, promoted the authenticity of video.

Description

A video data storage method, device, storage medium and terminal

技术领域technical field

本发明涉及数据安全技术领域，特别涉及一种视频数据存证方法、装置、存储介质及终端。The invention relates to the technical field of data security, in particular to a video data storage method, device, storage medium and terminal.

背景技术Background technique

货车司机当处于装货卸货、堵车、事故、加油、考勤打卡等场景时，货车司机需要通过水印视频进行拍摄当前所处的作业场景，并对对当前作业场景进行上报货主或车主，以进行车辆的报备或管车。When the truck driver is in the scene of loading and unloading, traffic jam, accident, refueling, attendance check-in, etc., the truck driver needs to shoot the current operation scene through the watermark video, and report the current operation scene to the owner or the owner of the vehicle for vehicle report or manage the vehicle.

现有技术中货车司机利用手机端进行取证时，实现方式是：货车司机利用手机传感器进行拍照、录像等取证，将相应取证文件存储于手机内部，将取证文件从手机内部上报货主或车主。由于证据文件会存储在手机内部，再从手机中进行传输时，使得对证据文件进行替换或篡改，导致传输至后端服务器的证据文件并非真正的证据文件，即证据文件的真实性得不到保证，从而降低了视频的真实性。In the prior art, when the truck driver uses the mobile phone to obtain evidence, the realization method is: the truck driver uses the mobile phone sensor to take pictures, video and other evidence collection, stores the corresponding evidence collection documents in the mobile phone, and reports the evidence collection documents to the cargo owner or vehicle owner from the mobile phone. Since the evidence files will be stored in the mobile phone, and then transferred from the mobile phone, the evidence files will be replaced or tampered with, resulting in the evidence files transmitted to the back-end server not being real evidence files, that is, the authenticity of the evidence files cannot be obtained. Warranty, thereby reducing the authenticity of the video.

发明内容Contents of the invention

本申请实施例提供了一种视频数据存证方法、装置、存储介质及终端。为了对披露的实施例的一些方面有一个基本的理解，下面给出了简单的概括。该概括部分不是泛泛评述，也不是要确定关键/重要组成元素或描绘这些实施例的保护范围。其唯一目的是用简单的形式呈现一些概念，以此作为后面的详细说明的序言。Embodiments of the present application provide a video data certificate storage method, device, storage medium, and terminal. In order to provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is presented below. This summary is not an overview, nor is it intended to identify key/critical elements or delineate the scope of these embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.

第一方面，本申请实施例提供了一种视频数据存证方法，应用于第一客户端，方法包括：In the first aspect, the embodiment of the present application provides a video data storage method, which is applied to the first client, and the method includes:

当采集到待传输的视频数据时，获取针对视频数据设置的预先训练的场景类型识别模型；When the video data to be transmitted is collected, a pre-trained scene type recognition model set for the video data is obtained;

提取视频数据中多个关键视频帧，并基于预先训练的场景类型识别模型确定每个关键视频帧的场景类型；Extract multiple key video frames in the video data, and determine the scene type of each key video frame based on the pre-trained scene type recognition model;

加载当前车辆的行车数据，并根据行车数据与每个关键视频帧的场景类型构建携带参数的蒙板图片；Load the driving data of the current vehicle, and construct a mask image with parameters according to the driving data and the scene type of each key video frame;

将携带参数的蒙板图片和每个关键视频帧进行合成，生成合成的目标视频数据；Synthesize the mask image carrying parameters with each key video frame to generate synthesized target video data;

将待传输的视频数据和行车数据发送至第二客户端，并将目标视频数据处理后和行车数据一起发送至云端服务器。Send the video data and driving data to be transmitted to the second client, and send the target video data to the cloud server together with the driving data after processing.

可选的，按照以下步骤生成预先训练的场景类型识别模型，包括：Optionally, follow the steps below to generate a pre-trained scene type recognition model, including:

采集车辆所处的场景图像，得到模型训练样本；其中，场景图像至少包括车辆卸货场景、车辆加油场景、车辆行车场景以及车辆事故场景；Collecting scene images where the vehicle is located to obtain model training samples; where the scene images at least include vehicle unloading scenes, vehicle refueling scenes, vehicle driving scenes, and vehicle accident scenes;

采用YOLOv5算法创建场景类型识别模型；Use the YOLOv5 algorithm to create a scene type recognition model;

将模型训练样本输入场景类型识别模型中进行模型训练，输出损失值；Input the model training samples into the scene type recognition model for model training, and output the loss value;

当损失值到达最小时，生成预先训练的场景类型识别模型；When the loss value reaches the minimum, generate a pre-trained scene type recognition model;

或者，or,

当损失值未到达最小时，将损失值进行反向传播以调整场景类型识别模型的模型参数，并继续将模型训练样本输入场景类型识别模型中进行模型训练。When the loss value does not reach the minimum, the loss value is backpropagated to adjust the model parameters of the scene type recognition model, and continue to input the model training samples into the scene type recognition model for model training.

可选的，场景类型识别模型包括输入端、基准网络、Neck网络以及Head输出端；Optionally, the scene type recognition model includes an input terminal, a reference network, a Neck network and a Head output terminal;

基于预先训练的场景类型识别模型确定每个关键视频帧的场景类型，包括：Determine the scene type of each key video frame based on a pre-trained scene type recognition model, including:

输入端接收每个关键视频帧，并将每个关键视频帧缩放到预设大小后进行归一化，得到归一化后的视频帧；The input terminal receives each key video frame, scales each key video frame to a preset size, and performs normalization to obtain a normalized video frame;

基准网络将归一化后的视频帧进行特征提取，得到特征图集合；The benchmark network performs feature extraction on the normalized video frames to obtain a set of feature maps;

Neck网络将特征图集合中各特征图与预设基础特征进行特征融合，得到融合后的特征图；The Neck network fuses each feature map in the feature map set with the preset basic features to obtain the fused feature map;

Head输出端采用分类分支对融合后的特征图进行分类，并采用回归分支对分类后的类型进行线性回归，得到每个关键视频帧的场景类型。The head output uses the classification branch to classify the fused feature maps, and uses the regression branch to perform linear regression on the classified types to obtain the scene type of each key video frame.

可选的，根据行车数据与每个关键视频帧的场景类型构建携带参数的蒙板图片，包括：Optionally, construct a mask image with parameters according to the driving data and the scene type of each key video frame, including:

获取蒙板图片；Get the mask image;

识别蒙板图片上第一参数标识集合；Identify the first parameter identification set on the mask image;

识别行车数据与每个关键视频帧的场景类型对应的第二参数标识集合；Identify the second parameter identification set corresponding to the driving data and the scene type of each key video frame;

从第一参数标识集合中识别与第二参数标识集合中各参数标识相同的参数标识进行数据映射，生成携带参数的蒙板图片。Identify the same parameter identifiers from the first parameter identifier set as the parameter identifiers in the second parameter identifier set to perform data mapping, and generate a mask picture carrying parameters.

可选的，将目标视频数据处理后和行车数据一起发送至云端服务器，包括：Optionally, the target video data is processed and sent to the cloud server together with the driving data, including:

获取数字水印图像；Obtain digital watermark image;

分别从目标视频数据的图像与数字水印图像中截取正方形的RGB图像，得到第一图像和第二图像；Intercept square RGB images from the image of the target video data and the digital watermark image respectively to obtain the first image and the second image;

将第一图像进行颜色通道分离后得到第一颜色分量集合，并将第二图像进行颜色通道分离后得到第二颜色分量集合；performing color channel separation on the first image to obtain a first color component set, and performing color channel separation on the second image to obtain a second color component set;

对第一颜色分量集合进行Arnold变换后得到变换矩阵；A transformation matrix is obtained after performing Arnold transformation on the first color component set;

根据变换矩阵对第二颜色分量集合进行DCT变换后得到直流分量；performing DCT transformation on the second color component set according to the transformation matrix to obtain a DC component;

根据变换矩阵与直流分量对目标视频数据嵌入数字水印，生成处理后的视频数据；According to the transformation matrix and the DC component, the digital watermark is embedded into the target video data, and the processed video data is generated;

将处理后的视频数据与行车数据发送至云端服务器。Send the processed video data and driving data to the cloud server.

第二方面，本申请实施例提供了一种视频数据存证方法，应用于云端服务器，方法包括：In the second aspect, the embodiment of the present application provides a video data storage method, which is applied to a cloud server, and the method includes:

接收第一客户端针对云端服务器发送的处理后的视频数据与行车数据；receiving the processed video data and driving data sent by the first client to the cloud server;

将处理后的视频数据转换为二进制数据；Convert the processed video data to binary data;

将二进制数据与行车数据进行SHA256哈希运算，得到第一哈希字符串；Perform SHA256 hash operation on the binary data and the driving data to obtain the first hash string;

将第一哈希字符串保存至区块链。Save the first hash string to the blockchain.

第三方面，本申请实施例提供了一种视频数据存证方法，应用于第二客户端，方法包括：In the third aspect, the embodiment of the present application provides a video data storage method, which is applied to the second client, and the method includes:

当接收到第一客户端针对第二客户端发送的待传输的视频数据与行车数据时，与云端服务器建立通信并获取区块链中保存的第一哈希字符串；When receiving the video data and driving data to be transmitted sent by the first client for the second client, establish communication with the cloud server and obtain the first hash string stored in the block chain;

将待传输的视频数据与行车数据进行SHA256哈希运算得到第二哈希字符串；Performing SHA256 hash operation on the video data to be transmitted and the driving data to obtain a second hash string;

当第一哈希字符串与第二哈希字符串相同且待传输的视频数据中数字水印正确时，播放待传输的视频数据；When the first hash string is identical to the second hash string and the digital watermark in the video data to be transmitted is correct, play the video data to be transmitted;

或者，or,

当第一哈希字符串与第二哈希字符串相同且待传输的视频数据中数字水印正确时，确定待传输的视频数据鉴权失败或被篡改，禁止播放待传输的视频数据。When the first hash string is the same as the second hash string and the digital watermark in the video data to be transmitted is correct, it is determined that the video data to be transmitted fails to be authenticated or has been tampered with, and the video data to be transmitted is prohibited from being played.

第四方面，本申请实施例提供了一种视频数据存证装置，应用于第一客户端，装置包括：In the fourth aspect, the embodiment of the present application provides a video data certificate storage device, which is applied to the first client, and the device includes:

模型获取模块，用于当采集到待传输的视频数据时，获取针对视频数据设置的预先训练的场景类型识别模型；A model acquisition module, used to obtain a pre-trained scene type recognition model set for video data when the video data to be transmitted is collected;

场景类型识别模块，用于提取视频数据中多个关键视频帧，并基于预先训练的场景类型识别模型确定每个关键视频帧的场景类型；The scene type identification module is used to extract a plurality of key video frames in the video data, and determine the scene type of each key video frame based on a pre-trained scene type identification model;

蒙板图片构建模块，用于加载当前车辆的行车数据，并根据行车数据与每个关键视频帧的场景类型构建携带参数的蒙板图片；Mask image construction module, used to load the driving data of the current vehicle, and construct a mask image with parameters according to the driving data and the scene type of each key video frame;

视频合成模块，用于将携带参数的蒙板图片和每个关键视频帧进行合成，生成合成的目标视频数据；The video synthesis module is used to synthesize the mask image carrying parameters and each key video frame to generate synthesized target video data;

视频发送模块，用于将待传输的视频数据和行车数据发送至第二客户端，并将目标视频数据处理后和行车数据一起发送至云端服务器。The video sending module is used to send the video data and driving data to be transmitted to the second client, and send the target video data to the cloud server together with the driving data after processing.

第五方面，本申请实施例提供一种计算机存储介质，计算机存储介质存储有多条指令，指令适于由处理器加载并执行上述的方法步骤。In a fifth aspect, embodiments of the present application provide a computer storage medium, where a plurality of instructions are stored in the computer storage medium, and the instructions are adapted to be loaded by a processor and execute the above method steps.

第六方面，本申请实施例提供一种终端，可包括：处理器和存储器；其中，存储器存储有计算机程序，计算机程序适于由处理器加载并执行上述的方法步骤。In a sixth aspect, an embodiment of the present application provides a terminal, which may include: a processor and a memory; wherein, the memory stores a computer program, and the computer program is adapted to be loaded by the processor and execute the above method steps.

本申请实施例提供的技术方案可以包括以下有益效果：The technical solutions provided by the embodiments of the present application may include the following beneficial effects:

在本申请实施例中，视频数据存证装置首先当采集到待传输的视频数据时，获取针对视频数据设置的预先训练的场景类型识别模型，再提取视频数据中多个关键视频帧，并基于该模型确定每个关键视频帧的场景类型，然后加载当前车辆的行车数据，并根据行车数据与每个关键视频帧的场景类型构建携带参数的蒙板图片，其次将携带参数的蒙板图片和每个关键视频帧进行合成，生成合成的目标视频数据，最后将待传输的视频数据和行车数据发送至第二客户端，并将目标视频数据处理后和行车数据发送至云端服务器。由于本申请通过模型识别视频数据的场景类型，并结合行车数据对视频数据进行二次合成，使得司机上报的视频不易篡改，提升了视频的真实性。In the embodiment of the present application, the video data storage device first obtains the pre-trained scene type recognition model set for the video data when the video data to be transmitted is collected, and then extracts multiple key video frames in the video data, and based on The model determines the scene type of each key video frame, then loads the driving data of the current vehicle, and constructs a mask image with parameters according to the driving data and the scene type of each key video frame, and then combines the mask image with parameters and Each key video frame is synthesized to generate synthesized target video data, and finally the video data and driving data to be transmitted are sent to the second client, and the processed target video data and driving data are sent to the cloud server. Since the application uses the model to identify the scene type of the video data, and combines the driving data to perform secondary synthesis on the video data, the video reported by the driver is not easy to be tampered with, and the authenticity of the video is improved.

应当理解的是，以上的一般描述和后文的细节描述仅是示例性和解释性的，并不能限制本发明。It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention.

附图说明Description of drawings

此处的附图被并入说明书中并构成本说明书的一部分，示出了符合本发明的实施例，并与说明书一起用于解释本发明的原理。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description serve to explain the principles of the invention.

图1是本申请实施例提供的应用在第一客户端的一种视频数据存证方法的流程示意图；Fig. 1 is a schematic flow diagram of a video data storage method applied to a first client provided by an embodiment of the present application;

图2是本申请实施例提供的应用在云端服务器的一种视频数据存证方法的流程示意图；Fig. 2 is a schematic flow diagram of a video data storage method applied to a cloud server provided by an embodiment of the present application;

图3是本申请实施例提供的应用在第二客户端的一种视频数据存证方法的流程示意图；FIG. 3 is a schematic flow diagram of a video data storage method applied to a second client provided by an embodiment of the present application;

图4是本申请实施例提供的一种视频数据存证方法的流程示意图；Fig. 4 is a schematic flow chart of a video data storage method provided by an embodiment of the present application;

图5是本申请实施例提供的一种视频数据存证装置的结构示意图；Fig. 5 is a schematic structural diagram of a video data storage device provided by an embodiment of the present application;

图6是本申请实施例提供的一种终端的结构示意图。FIG. 6 is a schematic structural diagram of a terminal provided by an embodiment of the present application.

具体实施方式Detailed ways

以下描述和附图充分地示出本发明的具体实施方案，以使本领域的技术人员能够实践它们。The following description and drawings illustrate specific embodiments of the invention sufficiently to enable those skilled in the art to practice them.

应当明确，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其它实施例，都属于本发明保护的范围。It should be clear that the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

下面的描述涉及附图时，除非另有表示，不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本发明相一致的所有实施方式。相反，它们仅是如所附权利要求书中所详述的、本发明的一些方面相一致的装置和方法的例子。When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with aspects of the invention as recited in the appended claims.

在本发明的描述中，需要理解的是，术语“第一”、“第二”等仅用于描述目的，而不能理解为指示或暗示相对重要性。对于本领域的普通技术人员而言，可以具体情况理解上述术语在本发明中的具体含义。此外，在本发明的描述中，除非另有说明，“多个”是指两个或两个以上。“和/或”，描述关联对象的关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。In the description of the present invention, it should be understood that the terms "first", "second" and so on are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance. Those of ordinary skill in the art can understand the specific meanings of the above terms in the present invention in specific situations. In addition, in the description of the present invention, unless otherwise specified, "plurality" means two or more. "And/or" describes the association relationship of associated objects, indicating that there may be three types of relationships, for example, A and/or B may indicate: A exists alone, A and B exist simultaneously, and B exists independently. The character "/" generally indicates that the contextual objects are an "or" relationship.

本申请提供了一种视频数据存证方法、装置、存储介质及终端，以解决上述相关技术问题中存在的问题。本申请提供的技术方案中，由于本申请通过模型识别视频数据的场景类型，并结合行车数据对视频数据进行二次合成，使得司机上报的视频不易篡改，提升了视频的真实性，下面采用示例性的实施例进行详细说明。The present application provides a video data certificate storage method, device, storage medium and terminal to solve the problems existing in the above related technical problems. In the technical solution provided by this application, since this application uses the model to identify the scene type of the video data, and combines the driving data to perform secondary synthesis on the video data, the video reported by the driver is not easy to be tampered with, and the authenticity of the video is improved. The following uses an example A specific example will be described in detail.

下面将结合附图1-附图4，对本申请实施例提供的视频数据存证方法进行详细介绍。该方法可依赖于计算机程序实现，可运行于基于冯诺依曼体系的视频数据存证装置上。该计算机程序可集成在应用中，也可作为独立的工具类应用运行。The video data storage method provided by the embodiment of the present application will be described in detail below in conjunction with accompanying drawings 1 to 4 . The method can be implemented relying on a computer program, and can run on a video data storage device based on the von Neumann system. The computer program can be integrated in the application, or run as an independent utility application.

请参见图1，为本申请实施例提供了一种视频数据存证方法的流程示意图，应用于第一客户端。如图1所示，本申请实施例的方法可以包括以下步骤：Please refer to FIG. 1 , which provides a schematic flowchart of a method for video data storage in an embodiment of the present application, which is applied to a first client. As shown in Figure 1, the method of the embodiment of the present application may include the following steps:

S101，当采集到待传输的视频数据时，获取针对视频数据设置的预先训练的场景类型识别模型；S101, when the video data to be transmitted is collected, obtain a pre-trained scene type recognition model set for the video data;

通常，场景类型识别模型是识别场景类型的数学模型，该模型可输出视频图像的场景类型，该模型采用YOLOv5算法创建。Generally, the scene type identification model is a mathematical model for identifying the scene type, which can output the scene type of the video image, and the model is created using the YOLOv5 algorithm.

在本申请实施例中，在生成预先训练的场景类型识别模型时，首先采集车辆所处的场景图像，得到模型训练样本；其中，场景图像至少包括车辆卸货场景、车辆加油场景、车辆行车场景以及车辆事故场景，然后采用YOLOv5算法创建场景类型识别模型，再将模型训练样本输入场景类型识别模型中进行模型训练，输出损失值，最后当损失值到达最小时，生成预先训练的场景类型识别模型。In the embodiment of the present application, when generating the pre-trained scene type recognition model, the scene image where the vehicle is located is first collected to obtain a model training sample; wherein, the scene image at least includes a vehicle unloading scene, a vehicle refueling scene, a vehicle driving scene and Vehicle accident scene, then use the YOLOv5 algorithm to create a scene type recognition model, then input the model training samples into the scene type recognition model for model training, output the loss value, and finally generate a pre-trained scene type recognition model when the loss value reaches the minimum.

进一步地，当损失值未到达最小时，将损失值进行反向传播以调整场景类型识别模型的模型参数，并继续将模型训练样本输入场景类型识别模型中进行模型训练。Further, when the loss value does not reach the minimum, the loss value is backpropagated to adjust the model parameters of the scene type recognition model, and the model training samples are continuously input into the scene type recognition model for model training.

在一种可能的实现方式中，当货车司机处于装货卸货、堵车、事故、加油、考勤打卡等场景时，首先通过手机终端上的水印相机采集当前场景的视频，得到待传输的视频数据，此时获取针对视频数据设置的预先训练的场景类型识别模型，该模型可以视频场景的场景类型。In a possible implementation, when the truck driver is in a scene such as loading and unloading, traffic jam, accident, refueling, attendance check-in, etc., first collect the video of the current scene through the watermark camera on the mobile terminal to obtain the video data to be transmitted, At this point, a pre-trained scene type recognition model set for the video data is acquired, and the model can identify the scene type of the video scene.

S102，提取视频数据中多个关键视频帧，并基于预先训练的场景类型识别模型确定每个关键视频帧的场景类型；S102, extracting a plurality of key video frames in the video data, and determining the scene type of each key video frame based on a pre-trained scene type recognition model;

其中，关键视频帧是从数据数据中挑选出的多个高清晰度的视频图像。Wherein, the key video frame is a plurality of high-definition video images selected from the data.

在本申请实施例中，在提取视频数据中多个关键视频帧时，首先计算视频数据中每个视频图像帧的图像参数，例如清晰度、亮度、曝光度等，然后根据每个视频图像帧的图像参数计算每个视频图像帧的权重值，并将每个视频图像帧的权重值与预设权重值进行对比，将大于预设权重值的视频图像帧确定为关键视频帧。In the embodiment of the present application, when extracting multiple key video frames in the video data, the image parameters of each video image frame in the video data, such as sharpness, brightness, exposure, etc., are first calculated, and then according to each video image frame The image parameter calculates the weight value of each video image frame, and compares the weight value of each video image frame with the preset weight value, and determines the video image frame greater than the preset weight value as the key video frame.

进一步地，场景类型识别模型包括输入端、基准网络、Neck网络以及Head输出端。Further, the scene type recognition model includes an input terminal, a reference network, a Neck network and a Head output terminal.

具体的，在基于预先训练的场景类型识别模型确定每个关键视频帧的场景类型时，输入端接收每个关键视频帧，并将每个关键视频帧缩放到预设大小后进行归一化，得到归一化后的视频帧；基准网络将归一化后的视频帧进行特征提取，得到特征图集合；Neck网络将特征图集合中各特征图与预设基础特征进行特征融合，得到融合后的特征图；Head输出端采用分类分支对融合后的特征图进行分类，并采用回归分支对分类后的类型进行线性回归，得到每个关键视频帧的场景类型。Specifically, when determining the scene type of each key video frame based on the pre-trained scene type recognition model, the input terminal receives each key video frame, and scales each key video frame to a preset size for normalization, The normalized video frame is obtained; the benchmark network performs feature extraction on the normalized video frame to obtain a feature map set; the Neck network fuses each feature map in the feature map set with the preset basic features to obtain the fused The feature map of the head output uses the classification branch to classify the fused feature map, and uses the regression branch to perform linear regression on the classified type to obtain the scene type of each key video frame.

S103，加载当前车辆的行车数据，并根据行车数据与每个关键视频帧的场景类型构建携带参数的蒙板图片；S103, load the driving data of the current vehicle, and construct a mask image carrying parameters according to the driving data and the scene type of each key video frame;

在一种可能的实现方式中，在构建携带参数的蒙板图片时，首先获取蒙板图片，然后识别蒙板图片上第一参数标识集合，再识别行车数据与每个关键视频帧的场景类型对应的第二参数标识集合，最后从第一参数标识集合中识别与第二参数标识集合中各参数标识相同的参数标识进行数据映射，生成携带参数的蒙板图片。In a possible implementation, when constructing a mask image with parameters, first obtain the mask image, then identify the first parameter identification set on the mask image, and then identify the driving data and the scene type of each key video frame For the corresponding second parameter identification set, finally identify the parameter identifications from the first parameter identification set that are the same as the parameter identifications in the second parameter identification set for data mapping, and generate a mask image carrying parameters.

具体的，行车数据包括货车北斗终端的实时位置经纬度以及逆地理编码、车牌号、手机GPS经纬度信息，以及用户基本信息、用户手动录入的信息。Specifically, the driving data includes the real-time latitude and longitude of the Beidou terminal of the truck, reverse geocoding, license plate number, GPS latitude and longitude information of the mobile phone, basic user information, and information manually entered by the user.

S104，将携带参数的蒙板图片和每个关键视频帧进行合成，生成合成的目标视频数据；S104, synthesizing the mask image carrying the parameters and each key video frame to generate synthesized target video data;

在一种可能的实现方式中，在生成携带参数的蒙板图片后，可将该携带参数的蒙板图片与每个关键视频帧进行合成，得到合成后的每个关键视频帧，最后将合成后的每个关键视频帧组成的视频确定为合成的目标视频数据。In a possible implementation, after the mask image carrying parameters is generated, the mask image carrying parameters can be synthesized with each key video frame to obtain each key video frame after synthesis, and finally the synthesized The video composed of each key video frame is determined as the target video data for synthesis.

S105，将待传输的视频数据和行车数据发送至第二客户端，并将目标视频数据处理后和行车数据一起发送至云端服务器。S105. Send the video data to be transmitted and the driving data to the second client, and send the target video data to the cloud server together with the driving data after processing.

在一种可能的实现方式中，在得到合成的目标视频数据后，可将待传输的视频数据和行车数据发送至第二客户端，即货主或车主端。In a possible implementation manner, after obtaining the synthesized target video data, the video data and driving data to be transmitted can be sent to the second client, that is, the cargo owner or the vehicle owner.

在一种可能的实现方式中，将目标视频数据处理后和行车数据一起发送至云端服务器时，首先获取数字水印图像，再分别从目标视频数据的图像与数字水印图像中截取正方形的RGB图像，得到第一图像和第二图像，然后将第一图像进行颜色通道分离后得到第一颜色分量集合，并将第二图像进行颜色通道分离后得到第二颜色分量集合，再对第一颜色分量集合进行Arnold变换后得到变换矩阵，其次根据变换矩阵对第二颜色分量集合进行DCT变换后得到直流分量，再根据变换矩阵与直流分量对目标视频数据嵌入数字水印，生成处理后的视频数据，最后将处理后的视频数据与行车数据发送至云端服务器。In a possible implementation, when the target video data is processed and sent to the cloud server together with the driving data, the digital watermark image is first obtained, and then a square RGB image is intercepted from the image of the target video data and the digital watermark image respectively. The first image and the second image are obtained, and then the first image is subjected to color channel separation to obtain the first color component set, and the second image is subjected to color channel separation to obtain the second color component set, and then the first color component set is obtained Arnold transformation is performed to obtain the transformation matrix, followed by DCT transformation of the second color component set according to the transformation matrix to obtain the DC component, and then the digital watermark is embedded in the target video data according to the transformation matrix and the DC component to generate the processed video data, and finally the The processed video data and driving data are sent to the cloud server.

具体的，从目标视频数据的图像与数字水印图像中选取正方形的RGB图像，并满足载体图像的边长为8的倍数且载体图像边长为水印图像边长的8倍。将载体图像的3个颜色通道分离，得到IR、IG、IB 3个颜色分量；将数字水印图像的3个颜色通道分离，得到WR、WG、WB 3个颜色分量。Specifically, a square RGB image is selected from the image of the target video data and the digital watermark image, and the side length of the carrier image is a multiple of 8 and the side length of the carrier image is 8 times the side length of the watermark image. The three color channels of the carrier image are separated to obtain the three color components of IR, IG and IB; the three color channels of the digital watermark image are separated to obtain the three color components of WR, WG and WB.

对WR、WG、WB三个颜色分量进行Arnold变换，(变换的过程可以看作是拉伸、压缩、折叠及拼接的过程)经过变换得到WRA、WGA、WBA。Perform Arnold transformation on the three color components of WR, WG, and WB (the transformation process can be regarded as the process of stretching, compression, folding, and splicing) to obtain WRA, WGA, and WBA after transformation.

目标视频数据的图像的各分量以8×8的大小为一个单位划分为若干子块。将子块视为一个整体，最左上角的子块位置坐标为(1,1)，其相邻的右边的子块位置坐标为(1，2)，依此类推。对每一个子块分别应用DCT变换，然后取出变换后的每一个子块左上角的直流分量组成一个新矩阵，位置坐标为(1,1)的子块的直流分量作为新矩阵(1,1)位置的元素，位置坐标为(1,2)的子块的直流分量作为新矩阵(1,2)位置的元素，依此类推。最终所得的矩阵称为直流分量矩阵IRD、IBD、IGD。Each component of the image of the target video data is divided into several sub-blocks with a size of 8×8 as a unit. Considering the sub-blocks as a whole, the position coordinates of the upper left sub-block are (1,1), the position coordinates of the adjacent right sub-blocks are (1,2), and so on. Apply DCT transformation to each sub-block separately, and then take out the DC component in the upper left corner of each sub-block after transformation to form a new matrix, and the DC component of the sub-block whose position coordinates are (1,1) is used as the new matrix (1,1 ), the DC component of the sub-block whose position coordinates are (1,2) is used as the element of the new matrix (1,2), and so on. The resulting matrices are called DC component matrices IRD, IBD, and IGD.

在直流分量矩阵上嵌入水印，嵌入方法是增加/减去置乱的WRA、WGA、WBA分量k倍的亮度。分量提取公式如下：Embed the watermark on the DC component matrix, the embedding method is to add/subtract the brightness of k times of the scrambled WRA, WGA, WBA components. The component extraction formula is as follows:

IRDE＝IRD+k×WRA；IGDE＝IGD×WGA；IBDE＝IBD+k×WBA；IRDE=IRD+k×WRA; IGDE=IGD×WGA; IBDE=IBD+k×WBA;

最后嵌入水印后的直流分量矩阵IRDE、IBDE、IGDE按照对应位置替换各个子块的直流分量，再对各个子块分别应用反DCT变换后，完成各颜色分量的水印嵌入，得到处理后的视频数据。Finally, the DC component matrices IRDE, IBDE, and IGDE after embedding the watermark replace the DC components of each sub-block according to the corresponding positions, and then apply the inverse DCT transformation to each sub-block to complete the watermark embedding of each color component, and obtain the processed video data .

具体的，本申请基于货车车机北斗终端的实时位置、用户手机GPS位置、车辆基本信息、用户信息得到行车数据，自动识别当前用户的作业场景，最后运用区块链技术、视频数字水印技术可专门针对货车司机报备提供安全便捷的防篡改的视频报备方法。Specifically, this application obtains driving data based on the real-time location of the Beidou terminal of the truck, the GPS location of the user's mobile phone, the basic information of the vehicle, and the user information, and automatically identifies the current user's operation scene. Finally, the blockchain technology and video digital watermarking technology can be used. It provides a safe and convenient tamper-proof video reporting method specifically for truck driver reporting.

请参见图2，为本申请实施例提供了一种视频数据存证方法的流程示意图，应用于云端服务器。如图2所示，本申请实施例的方法可以包括以下步骤：Please refer to FIG. 2 , which provides a schematic flowchart of a video data storage method for an embodiment of the present application, which is applied to a cloud server. As shown in Figure 2, the method of the embodiment of the present application may include the following steps:

S201，接收第一客户端针对云端服务器发送的处理后的视频数据与行车数据；S201, receiving the processed video data and driving data sent by the first client to the cloud server;

S202，将处理后的视频数据转换为二进制数据；S202, converting the processed video data into binary data;

S203，将二进制数据与行车数据进行SHA256哈希运算，得到第一哈希字符串；S203, performing SHA256 hash operation on the binary data and the driving data to obtain a first hash string;

S204，将第一哈希字符串保存至区块链。S204. Save the first hash string to the blockchain.

请参见图3，为本申请实施例提供了一种视频数据存证方法的流程示意图，应用于云端服务器。如图3所示，本申请实施例的方法可以包括以下步骤：Please refer to FIG. 3 , which provides a schematic flowchart of a video data storage method for an embodiment of the present application, which is applied to a cloud server. As shown in Figure 3, the method of the embodiment of the present application may include the following steps:

S301，当接收到第一客户端针对第二客户端发送的待传输的视频数据与行车数据时，与云端服务器建立通信并获取区块链中保存的第一哈希字符串；S301, when receiving the video data and driving data to be transmitted sent by the first client to the second client, establish communication with the cloud server and obtain the first hash string stored in the block chain;

S302，将待传输的视频数据与行车数据进行SHA256哈希运算得到第二哈希字符串；S302, performing SHA256 hash operation on the video data to be transmitted and the driving data to obtain a second hash string;

S303，当第一哈希字符串与第二哈希字符串相同且待传输的视频数据中数字水印正确时，播放待传输的视频数据；或者，当第一哈希字符串与第二哈希字符串相同且待传输的视频数据中数字水印正确时，确定待传输的视频数据鉴权失败或被篡改，禁止播放待传输的视频数据。S303, when the first hash character string is the same as the second hash character string and the digital watermark in the video data to be transmitted is correct, play the video data to be transmitted; or, when the first hash character string is the same as the second hash character string When the character strings are the same and the digital watermark in the video data to be transmitted is correct, it is determined that the video data to be transmitted fails to be authenticated or has been tampered with, and the video data to be transmitted is prohibited from being played.

例如图4所示，图4是本申请提供的一种视频数据存证过程的过程示意框图，在司机的第一客户端，第一客户端首先安装在其上的应该程序启动后自动下载识别模型，再自动通过货运大数据获取手机位置的经纬度、货车北斗终端位置经纬度得到行车数据，然后第一客户端启动水印相机录制视频后，提取视频中的关键视频帧输入下载的识别模型中识别出每个关键视频帧的场景类型，根据场景类型和行车数据得到携带参数的蒙板图片，其次将该蒙板图片与每个关键视频帧合成后得到合成视频，再将合成视频处理后与行车数据一起发送到云端服务器，并将刚开始录制的视频和行车数据发送到第二客户端。For example, as shown in Figure 4, Figure 4 is a schematic block diagram of a video data storage process provided by the present application. On the driver's first client, the first client installed on it should be automatically downloaded and identified after the program is started. model, and then automatically obtain the longitude and latitude of the mobile phone location and the longitude and latitude of the Beidou terminal location of the truck through the freight big data to obtain the driving data, and then the first client starts the watermark camera to record the video, extracts the key video frames in the video and inputs them into the downloaded recognition model to identify The scene type of each key video frame, according to the scene type and driving data to obtain a mask image with parameters, and then combine the mask image with each key video frame to obtain a composite video, and then process the composite video with the driving data Send it to the cloud server together, and send the video and driving data that have just been recorded to the second client.

在云端服务器收到处理后的合成视频和行车数据后，将处理后的合成视频和行车数据进行SHA256哈希运算得到哈希字符串，并将该哈希字符串放入到区块链中防篡改。After the cloud server receives the processed synthetic video and driving data, it performs SHA256 hash operation on the processed synthetic video and driving data to obtain a hash string, and puts the hash string into the blockchain to prevent tamper.

第二客户端在收到刚开始录制的视频和行车数据时，将刚开始录制的视频和行车数据进行SHA256哈希运算得到目标哈希字符串，并获取区块链上存储的哈希字符串，最后进行比对目标哈希字符串与区块链上存储的哈希字符串是否一致，如果一致且数字水印正确，则播放刚开始录制的视频，否则鉴权失败或者视频存在篡改，则结束。When the second client receives the newly recorded video and driving data, it performs SHA256 hash operation on the newly recorded video and driving data to obtain the target hash string, and obtains the hash string stored on the blockchain , and finally compare whether the target hash string is consistent with the hash string stored on the blockchain. If they are consistent and the digital watermark is correct, play the video that has just been recorded. Otherwise, the authentication fails or the video has been tampered with, then the end .

下述为本发明装置实施例，可以用于执行本发明方法实施例。对于本发明装置实施例中未披露的细节，请参照本发明方法实施例。The following are device embodiments of the present invention, which can be used to implement the method embodiments of the present invention. For the details not disclosed in the device embodiment of the present invention, please refer to the method embodiment of the present invention.

请参见图5，其示出了本发明一个示例性实施例提供的视频数据存证装置的结构示意图。该视频数据存证装置可以通过软件、硬件或者两者的结合实现成为终端的全部或一部分。该装置1包括模型获取模块10、场景类型识别模块20、蒙板图片构建模块30、视频合成模块40、视频发送模块50。Please refer to FIG. 5 , which shows a schematic structural diagram of a video data storage device provided by an exemplary embodiment of the present invention. The video data storage device can be implemented as all or a part of the terminal through software, hardware or a combination of the two. The device 1 includes a model acquisition module 10 , a scene type identification module 20 , a mask picture construction module 30 , a video synthesis module 40 , and a video sending module 50 .

模型获取模块10，用于当采集到待传输的视频数据时，获取针对视频数据设置的预先训练的场景类型识别模型；Model acquiring module 10, for when collecting the video data to be transmitted, obtain the pre-trained scene type recognition model set for video data;

场景类型识别模块20，用于提取视频数据中多个关键视频帧，并基于预先训练的场景类型识别模型确定每个关键视频帧的场景类型；The scene type identification module 20 is used to extract a plurality of key video frames in the video data, and determines the scene type of each key video frame based on the pre-trained scene type identification model;

蒙板图片构建模块30，用于加载当前车辆的行车数据，并根据行车数据与每个关键视频帧的场景类型构建携带参数的蒙板图片；Mask picture construction module 30, is used to load the driving data of current vehicle, and constructs the mask picture carrying parameter according to the scene type of driving data and each key video frame;

视频合成模块40，用于将携带参数的蒙板图片和每个关键视频帧进行合成，生成合成的目标视频数据；Video synthesis module 40 is used to synthesize the mask picture carrying parameters and each key video frame to generate synthesized target video data;

视频发送模块50，用于将待传输的视频数据和行车数据发送至第二客户端，并将目标视频数据处理后和行车数据一起发送至云端服务器。The video sending module 50 is configured to send the video data and driving data to be transmitted to the second client, and send the target video data to the cloud server together with the driving data after processing.

需要说明的是，上述实施例提供的视频数据存证装置在执行视频数据存证方法时，仅以上述各功能模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能模块完成，即将设备的内部结构划分成不同的功能模块，以完成以上描述的全部或者部分功能。另外，上述实施例提供的视频数据存证装置与视频数据存证方法实施例属于同一构思，其体现实现过程详见方法实施例，这里不再赘述。It should be noted that, when the video data storage device provided by the above-mentioned embodiments executes the video data storage method, it only uses the division of the above-mentioned functional modules for illustration. In practical applications, the above-mentioned functions can be assigned by different The functional modules are completed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the video data certificate storage device provided in the above embodiments and the video data certificate storage method embodiment belong to the same idea, and the implementation process thereof is detailed in the method embodiment, and will not be repeated here.

上述本申请实施例序号仅仅为了描述，不代表实施例的优劣。The serial numbers of the above embodiments of the present application are for description only, and do not represent the advantages and disadvantages of the embodiments.

本发明还提供一种计算机可读介质，其上存储有程序指令，该程序指令被处理器执行时实现上述各个方法实施例提供的视频数据存证方法。The present invention also provides a computer-readable medium on which program instructions are stored. When the program instructions are executed by a processor, the video data storage methods provided by the above method embodiments are implemented.

本发明还提供了一种包含指令的计算机程序产品，当其在计算机上运行时，使得计算机执行上述各个方法实施例的视频数据存证方法。The present invention also provides a computer program product containing instructions, which, when running on a computer, causes the computer to execute the video data storage method of each method embodiment above.

请参见图6，为本申请实施例提供了一种终端的结构示意图。如图6所示，终端1000可以包括：至少一个处理器1001，至少一个网络接口1004，用户接口1003，存储器1005，至少一个通信总线1002。Referring to FIG. 6 , it provides a schematic structural diagram of a terminal according to an embodiment of the present application. As shown in FIG. 6 , a terminal 1000 may include: at least one processor 1001 , at least one network interface 1004 , a user interface 1003 , a memory 1005 , and at least one communication bus 1002 .

其中，通信总线1002用于实现这些组件之间的连接通信。Wherein, the communication bus 1002 is used to realize connection and communication between these components.

其中，用户接口1003可以包括显示屏(Display)、摄像头(Camera)，可选用户接口1003还可以包括标准的有线接口、无线接口。Wherein, the user interface 1003 may include a display screen (Display) and a camera (Camera), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.

其中，网络接口1004可选的可以包括标准的有线接口、无线接口(如WI-FI接口)。Wherein, the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).

其中，处理器1001可以包括一个或者多个处理核心。处理器1001利用各种借口和线路连接整个电子设备1000内的各个部分，通过运行或执行存储在存储器1005内的指令、程序、代码集或指令集，以及调用存储在存储器1005内的数据，执行电子设备1000的各种功能和处理数据。可选的，处理器1001可以采用数字信号处理(Digital Signal Processing，DSP)、现场可编程门阵列(Field-Programmable Gate Array，FPGA)、可编程逻辑阵列(Programmable Logic Array，PLA)中的至少一种硬件形式来实现。处理器1001可集成中央处理器(Central Processing Unit，CPU)、图像处理器(Graphics Processing Unit，GPU)和调制解调器等中的一种或几种的组合。其中，CPU主要处理操作系统、用户界面和应用程序等；GPU用于负责显示屏所需要显示的内容的渲染和绘制；调制解调器用于处理无线通信。可以理解的是，上述调制解调器也可以不集成到处理器1001中，单独通过一块芯片进行实现。Wherein, the processor 1001 may include one or more processing cores. The processor 1001 uses various interfaces and lines to connect various parts of the entire electronic device 1000, and by running or executing instructions, programs, code sets or instruction sets stored in the memory 1005, and calling data stored in the memory 1005, execute Various functions of the electronic device 1000 and processing data. Optionally, the processor 1001 may use at least one of Digital Signal Processing (Digital Signal Processing, DSP), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA), and Programmable Logic Array (Programmable Logic Array, PLA). implemented in the form of hardware. The processor 1001 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), an image processor (Graphics Processing Unit, GPU), a modem, and the like. Among them, the CPU mainly handles the operating system, user interface and application programs, etc.; the GPU is used to render and draw the content that needs to be displayed on the display screen; the modem is used to handle wireless communication. It can be understood that the above modem may also not be integrated into the processor 1001, but implemented by a single chip.

其中，存储器1005可以包括随机存储器(Random Access Memory，RAM)，也可以包括只读存储器(Read-Only Memory)。可选的，该存储器1005包括非瞬时性计算机可读介质(non-transitory computer-readable storage medium)。存储器1005可用于存储指令、程序、代码、代码集或指令集。存储器1005可包括存储程序区和存储数据区，其中，存储程序区可存储用于实现操作系统的指令、用于至少一个功能的指令(比如触控功能、声音播放功能、图像播放功能等)、用于实现上述各个方法实施例的指令等；存储数据区可存储上面各个方法实施例中涉及到的数据等。存储器1005可选的还可以是至少一个位于远离前述处理器1001的存储装置。如图6所示，作为一种计算机存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及视频数据存证应用程序。Wherein, the memory 1005 may include a random access memory (Random Access Memory, RAM), or may include a read-only memory (Read-Only Memory). Optionally, the storage 1005 includes a non-transitory computer-readable storage medium (non-transitory computer-readable storage medium). The memory 1005 may be used to store instructions, programs, codes, sets of codes or sets of instructions. The memory 1005 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, an image playback function, etc.), Instructions and the like for implementing the above method embodiments; the storage data area can store the data and the like involved in the above method embodiments. Optionally, the memory 1005 may also be at least one storage device located away from the aforementioned processor 1001 . As shown in FIG. 6 , the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a video data storage application program.

在图6所示的终端1000中，用户接口1003主要用于为用户提供输入的接口，获取用户输入的数据；而处理器1001可以用于调用存储器1005中存储的视频数据存证应用程序，并具体执行以下操作：In the terminal 1000 shown in FIG. 6 , the user interface 1003 is mainly used to provide the user with an input interface to obtain the data input by the user; and the processor 1001 can be used to call the video data storage application program stored in the memory 1005, and Specifically perform the following operations:

在一个实施例中，处理器1001在生成预先训练的场景类型识别模型时，具体执行以下操作：In one embodiment, when the processor 1001 generates the pre-trained scene type recognition model, it specifically performs the following operations:

或者，or,

在一个实施例中，处理器1001在执行基于预先训练的场景类型识别模型确定每个关键视频帧的场景类型时，具体执行以下操作：In one embodiment, when the processor 1001 determines the scene type of each key video frame based on the pre-trained scene type identification model, it specifically performs the following operations:

在一个实施例中，处理器1001在执行根据行车数据与每个关键视频帧的场景类型构建携带参数的蒙板图片时，具体执行以下操作：In one embodiment, when the processor 1001 constructs a mask picture carrying parameters according to the driving data and the scene type of each key video frame, it specifically performs the following operations:

获取蒙板图片；Get the mask image;

在一个实施例中，处理器1001在执行将目标视频数据处理后和行车数据一起发送至云端服务器时，具体执行以下操作：In one embodiment, when the processor 1001 executes processing the target video data and sending it to the cloud server together with the driving data, it specifically performs the following operations:

获取数字水印图像；Obtain digital watermark image;

本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程，是可以通过计算机程序来指令相关的硬件来完成，视频数据存证的程序可存储于计算机可读取存储介质中，该程序在执行时，可包括如上述各方法的实施例的流程。其中，的存储介质可为磁碟、光盘、只读存储记忆体或随机存储记忆体等。Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented through computer programs to instruct related hardware. The program for video data storage can be stored in a computer-readable storage medium. When the program is executed, it may include the processes of the embodiments of the above-mentioned methods. Wherein, the storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory, and the like.

以上所揭露的仅为本申请较佳实施例而已，当然不能以此来限定本申请之权利范围，因此依本申请权利要求所作的等同变化，仍属本申请所涵盖的范围。The above disclosures are only preferred embodiments of the present application, which certainly cannot limit the scope of the present application. Therefore, equivalent changes made according to the claims of the present application still fall within the scope of the present application.

Claims

1. A method for depositing video data, characterized in that it is applied to the first client, and the method comprises:

When the video data to be transmitted is collected, a pre-trained scene type recognition model set for the video data is obtained;

Extracting a plurality of key video frames in the video data, and determining the scene type of each key video frame based on the pre-trained scene type recognition model;

Load the driving data of the current vehicle, and construct a mask picture carrying parameters according to the driving data and the scene type of each key video frame;

Synthesizing the mask image carrying parameters and each key video frame to generate synthesized target video data;

Send the video data to be transmitted and the driving data to the second client, and send the target video data to the cloud server together with the driving data after processing.

2. The method according to claim 1, characterized in that generating a pre-trained scene type recognition model according to the following steps, comprising:

Collecting scene images where the vehicle is located to obtain model training samples; wherein, the scene images at least include a vehicle unloading scene, a vehicle refueling scene, a vehicle driving scene, and a vehicle accident scene;

Use the YOLOv5 algorithm to create a scene type recognition model;

Inputting the model training sample into the scene type recognition model for model training, and outputting a loss value;

When the loss value reaches the minimum, generate a pre-trained scene type recognition model;

or,

When the loss value does not reach the minimum, the loss value is backpropagated to adjust the model parameters of the scene type recognition model, and continue to input the model training samples into the scene type recognition model for model training .

3. The method according to claim 1, wherein the scene type recognition model comprises an input terminal, a reference network, a Neck network and a Head output terminal;

The scene type identification model based on the pre-training described determines the scene type of each key video frame, including:

The input end receives each key video frame, and normalizes each key video frame after scaling to a preset size to obtain a normalized video frame;

The benchmark network performs feature extraction on the normalized video frames to obtain a set of feature maps;

The Neck network performs feature fusion on each feature map in the feature map set and preset basic features to obtain a fused feature map;

The head output uses the classification branch to classify the fused feature maps, and uses the regression branch to perform linear regression on the classified types to obtain the scene type of each key video frame.

4. The method according to claim 1, wherein said building a mask picture carrying parameters according to the driving data and the scene type of each key video frame includes:

Get the mask image;

Identifying the first parameter identification set on the mask picture;

Identifying the second parameter identification set corresponding to the driving data and the scene type of each key video frame;

Identifying parameter identifiers from the first parameter identifier set that are the same as the parameter identifiers in the second parameter identifier set to perform data mapping to generate a mask picture carrying parameters.

5. The method according to claim 1, wherein said processing said target video data and sending said driving data to a cloud server includes:

Obtain digital watermark image;

Respectively intercepting square RGB images from the image of the target video data and the digital watermark image to obtain a first image and a second image;

performing color channel separation on the first image to obtain a first color component set, and performing color channel separation on the second image to obtain a second color component set;

A transformation matrix is obtained after performing Arnold transformation on the first color component set;

performing DCT transformation on the second color component set according to the transformation matrix to obtain a DC component;

Embedding a digital watermark into the target video data according to the transformation matrix and the DC component to generate processed video data;

Sending the processed video data and the driving data to a cloud server.

6. The method according to claim 1, wherein it is applied to a cloud server, and the method comprises:

receiving the processed video data and driving data sent by the first client for the cloud server;

Convert the processed video data to binary data;

performing SHA256 hash operation on the binary data and the driving data to obtain a first hash string;

Save the first hash string to the block chain.

7. The method according to claim 6, wherein it is applied to the second client, and the method comprises:

When receiving the video data and driving data to be transmitted sent by the first client for the second client, establish communication with the cloud server and obtain the first hash string stored in the block chain;

Performing SHA256 hash operation on the video data to be transmitted and the driving data to obtain a second hash string;

When the first hash character string is identical to the second hash character string and the digital watermark in the video data to be transmitted is correct, play the video data to be transmitted;

Otherwise, it is determined that the video data to be transmitted fails to be authenticated or has been tampered with, and the video data to be transmitted is prohibited from being played.

8. A video data storage device, characterized in that it is applied to the first client, and the device includes:

A model acquisition module, used to obtain a pre-trained scene type recognition model set for the video data when the video data to be transmitted is collected;

A scene type identification module, configured to extract a plurality of key video frames in the video data, and determine the scene type of each key video frame based on the pre-trained scene type identification model;

A mask picture construction module, used to load the driving data of the current vehicle, and construct a mask picture carrying parameters according to the driving data and the scene type of each key video frame;

A video synthesis module, configured to synthesize the mask image carrying parameters and each key video frame to generate synthesized target video data;

The video sending module is used to send the video data to be transmitted and the driving data to the second client, and send the target video data to the cloud server together with the driving data after processing.

9. A computer storage medium, characterized in that the computer storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the method steps according to any one of claims 1-7.

10. A terminal, characterized in that it comprises: a processor and a memory; wherein, the memory stores a computer program, and the computer program is suitable for being loaded by the processor and executing any one of claims 1-7 method steps.