
CN117540047A - Method, system, equipment and storage medium for retrieving video based on picture - Google Patents

Method, system, equipment and storage medium for retrieving video based on picture

Info

Publication number
CN117540047A
CN117540047A (Application CN202311586249.0A)
Authority
CN
China
Prior art keywords
video
category
video data
data
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311586249.0A
Other languages
Chinese (zh)
Inventor
孔祥博
郭爱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Shitong Hengqi Beijing Technology Co ltd
Original Assignee
Zhongke Shitong Hengqi Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Shitong Hengqi Beijing Technology Co ltd filed Critical Zhongke Shitong Hengqi Beijing Technology Co ltd
Priority to CN202311586249.0A priority Critical patent/CN117540047A/en
Publication of CN117540047A publication Critical patent/CN117540047A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/732 Query formulation
    • G06F16/7328 Query by example, e.g. a complete video frame or video sequence
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/732 Query formulation
    • G06F16/7335 Graphical querying, e.g. query-by-region, query-by-sketch, query-by-trajectory, GUIs for designating a person/face/object as a query predicate
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/75 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a method, system, device and storage medium for retrieving videos based on a picture. The method includes: acquiring a plurality of video data items; processing the video data to generate, for each item, a video code together with a first category and a first feature vector for its key frames, and storing the generated first categories and first feature vectors in a video database; acquiring picture data; processing the picture data to generate a second category and a second feature vector of the picture data; searching the first categories in the video database for a first category identical to the second category and designating it as the query category; computing the similarity between the second feature vector and the first feature vectors corresponding to the query category; and sorting the similarities to generate a video list corresponding to the similarities. Videos similar to an input picture can thus be retrieved accurately and quickly.

Description

Method, system, device and storage medium for retrieving videos based on pictures

Technical Field

The present disclosure relates to the field of image technology, and in particular to a method, system, device and storage medium for retrieving videos based on pictures.

Background

In the digital age, pictures and videos are widely used across many fields, and large volumes of media data are continuously generated and accumulated. Retrieving and using this data conveniently is a challenging task, so techniques for retrieving videos from pictures have become increasingly important.

Summary

In view of this, the purpose of the present disclosure is to provide a method, system, device and storage medium for retrieving videos based on pictures, which can accurately and quickly retrieve videos similar to an input picture.

To achieve one of the above objectives, the present disclosure provides a method for retrieving videos based on pictures. The method includes: acquiring a plurality of video data items; processing the video data to generate, for each item, a video code together with a first category and a first feature vector for its key frames, and storing the generated first categories and first feature vectors in a video database; acquiring picture data; processing the picture data to generate a second category and a second feature vector of the picture data; searching the first categories in the video database for a first category identical to the second category and designating it as the query category; computing the similarity between the second feature vector and the first feature vectors corresponding to the query category; and sorting the similarities to generate a video list corresponding to the similarities.

In some embodiments of the present disclosure, after storing the generated first categories and first feature vectors in the video database and before acquiring the picture data, the method further includes: obtaining, in the video database, the individual access count of each video data item within a set period; and, in response to an individual access count being below a threshold, deleting the corresponding video data together with the first category and first feature vector of its key frames.

In some embodiments of the present disclosure, processing the video data to generate, for each item, a video code together with a first category and a first feature vector for its key frames, and storing the generated first categories and first feature vectors in the video database includes: storing each video data item in a distributed file database and assigning it a code; sequentially obtaining the key frames of each video data item; determining the category to which each group of key frames belongs and storing it in the video database as the first category; and computing the picture feature vector of each group of key frames based on a pre-trained VGG16 model and storing it in the video database as the first feature vector.

In some embodiments of the present disclosure, sequentially obtaining the key frames of each video data item further includes: obtaining the MPEG-4 standard table of the video data and determining whether it contains an stss box; in response to the standard table containing an stss box, treating the frames identified by the stss box as key frames; in response to the standard table not containing an stss box, computing the similarity between video frames using an edge detection algorithm to generate multiple similar frames; clustering the similar frames to generate similar-frame sets; and processing the similar-frame sets with a picture quality assessment algorithm, selecting the best frame in each similar-frame set as the key frame.

In some embodiments of the present disclosure, determining the category to which each group of key frames belongs and storing it in the video database as the first category includes: determining the category of each group of key frames based on a picture classification algorithm and storing it in the video database as the first category.

In some embodiments of the present disclosure, before obtaining the MPEG-4 standard table of the video data, the method further includes: splitting the video data into individual video frames; and preprocessing the video frames to obtain optimized video frames, the preprocessing including noise reduction and/or size normalization.

In some embodiments of the present disclosure, processing the picture data to generate a second category and a second feature vector of the picture data includes: determining the category of the picture data based on a picture classification algorithm and storing it as the second category; and computing the picture feature vector of the picture data based on the pre-trained VGG16 model and storing it as the second feature vector.

To achieve another of the above objectives, the present disclosure provides a system for retrieving videos based on pictures, including: a first acquisition module for acquiring a plurality of video data items; a first processing module for processing the video data to generate, for each item, a video code together with a first category and a first feature vector for its key frames, and storing the generated first categories and first feature vectors in a video database; a second acquisition module for acquiring picture data; a second processing module for processing the picture data to generate a second category and a second feature vector of the picture data; a search module for searching the video database for the first category identical to the second category and designating it as the query category; a calculation module for computing the similarity between the second feature vector and the first feature vectors corresponding to the query category; and a generation module for sorting the similarities and generating a video list corresponding to the similarities.

To achieve another of the above objectives, the present disclosure provides a computer device including a processor and a memory; the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of any of the methods described above.

To achieve another of the above objectives, the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform any of the above methods for retrieving videos based on pictures.

Compared with the prior art, the technical effect of the present invention is as follows: by processing the video data, the code of each video is retained, and the first category and first feature vector of its key frames are generated and stored with the video data; after the user inputs a picture for retrieval, the picture is processed to obtain its second category and second feature vector, and retrieval is performed only within the first category identical to the second category. This narrows the search scope, reduces the amount of computation during retrieval, and lowers the computational and time costs of applying the model, so that scenarios with high real-time retrieval requirements can be met.

Brief Description of the Drawings

To explain the technical solutions of the present disclosure or the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a flowchart of a method for retrieving videos based on pictures provided by an embodiment of the present disclosure;

Figure 2 is a flowchart of optimizing the video database provided by another embodiment of the present disclosure;

Figure 3 is a flowchart of processing video data provided by another embodiment of the present disclosure;

Figure 4 is a flowchart of obtaining key frames provided by another embodiment of the present disclosure;

Figure 5 is a flowchart of processing picture data provided by another embodiment of the present disclosure;

Figure 6 is a schematic diagram of a system for retrieving videos based on pictures provided by another embodiment of the present disclosure;

Figure 7 is a schematic diagram of the hardware structure of a computer device provided by another embodiment of the present disclosure.

Detailed Description

The present invention is described in detail below with reference to the specific embodiments shown in the accompanying drawings. These embodiments do not limit the present invention, and structural, methodological or functional changes made by those of ordinary skill in the art based on these embodiments all fall within the protection scope of the present invention.

It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of the present disclosure have the ordinary meanings understood by persons of ordinary skill in the field to which the present disclosure belongs. The terms "first", "second" and similar words used in the embodiments do not denote any order, quantity or importance, but are only used to distinguish different components. Words such as "include" or "comprise" mean that the elements or items preceding the word cover the elements or items listed after the word and their equivalents, without excluding other elements or items.

Retrieving videos from a picture is essentially still picture retrieval, and picture retrieval has long been an important topic in computer vision research. Picture retrieval refers to the process of searching for and retrieving relevant pictures from a large picture dataset. Over the years, various methods and techniques have been developed to achieve accurate and efficient picture retrieval.

Among existing picture retrieval methods, one approach is based on the VGG16 network model, which has proven highly effective in image recognition tasks.

VGG16 is a deep convolutional neural network. It consists of 16 layers and has more than 138 million parameters. The VGG16 network model can achieve state-of-the-art results on a variety of image recognition tasks, including object classification, object localization and object detection. The VGG16 model is trained on the ImageNet dataset, a large-scale dataset containing over a million images in 1000 different categories. This dataset is used to train the VGG16 model with supervised learning. During training, the VGG16 model learns to recognize various features in pictures, such as edges, corners, shapes, colors and textures.

Once the VGG16 model has been trained, it can be used for various image processing tasks, including picture retrieval. Picture retrieval with the VGG16 model involves extracting features from a picture and comparing them with the features of other pictures in a database. The feature similarity between two pictures can be measured with various metrics, including Euclidean distance, cosine similarity and correlation coefficients.

However, the VGG16 model also has some shortcomings. First, it depends heavily on the similarity between the picture to be retrieved and the training dataset, so in practical applications the model may be biased by over-fitting certain aspects of the input data; when it encounters scenes or environments at test time that differ from the training data, its performance is strongly affected. Second, because of its depth and large number of parameters, it is computationally expensive and time-consuming, and current picture search methods often require accurate keywords or tags, so users need a certain amount of expertise and flexible thinking. Finally, when the input data volume is very large, processing slows down further, so scenarios with high real-time requirements cannot be satisfied. In practical applications, the time and space complexity of the algorithm and the operating efficiency of applications built on it must still be considered.

In view of this, an embodiment of the present disclosure provides a method for retrieving videos based on pictures that can reduce the amount of computation consumed during retrieval. As shown in Figure 1, the method includes the following steps:

Step S100: acquire a plurality of video data items.

Specifically, the scope of the acquired video data is the retrieval scope. For example, on a video platform, all videos on the platform form the retrieval scope; if the user searches videos stored locally on a device, the local videos on that device form the retrieval scope.

Step S200: process the plurality of video data items to generate, for each item, a video code together with a first category and a first feature vector for its key frames, and store the generated first categories and first feature vectors in a video database.

Specifically, the video data items are processed one by one. Each video data item has its own code, which facilitates querying and extracting the video. Each video data item has its key frames; depending on the duration and content of the video, a video data item may correspond to one key frame or several. Each key frame corresponds to a first category and a first feature vector, so a video data item may correspond to several first categories and several first feature vectors. The first category is a classification, so one or more video data items, key frames and first feature vectors may be assigned to the same first category.

After the video data is processed, the features contained in the video data are extracted as picture-format features, which makes it convenient to compare them directly with the picture data input for retrieval.

In one implementation of the present disclosure, referring to Figure 3, step S200 further includes:

Step S210: store each video data item in a distributed file database and assign each video data item a code;

Step S220: sequentially obtain the key frames of each video data item;

Step S230: determine the category to which each group of key frames belongs, and store it in the video database as the first category;

Step S240: based on the pre-trained VGG16 model, compute the picture feature vector of each group of key frames and store it in the video database as the first feature vector.

Specifically, the code of each video data item is a unique, traceable ID, and video data stored in different locations can be quickly queried by its code. In addition, a video is a continuous image made up of many frames; for any video, the number of image frames that can be extracted is very large. Processing all frames together may cause performance problems and makes it hard to locate a specific frame or time period. Therefore, when processing a video, the number of selected frames should be appropriately limited to keep processing efficient.
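
By way of illustration only: the patent does not prescribe how the code is generated or which distributed file database is used. A minimal Python sketch, assuming a content hash as the traceable code and a simple JSON record layout (both assumptions), might look as follows:

```python
import hashlib
import json

def make_video_code(video_path: str) -> str:
    """Assumed scheme: a content hash serves as the unique, traceable video code."""
    h = hashlib.sha256()
    with open(video_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

def make_index_record(video_path: str, category: str, feature: list) -> str:
    """One video-database entry: code plus key-frame category and feature vector."""
    return json.dumps({
        "code": make_video_code(video_path),
        "first_category": category,
        "first_feature_vector": feature,
    })
```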

Determining in advance the category to which the selected key frames belong and limiting the retrieval scope to a specific category before computing can effectively reduce the amount of computation and improve response speed. The categories may include scene categories, categories based on feature rules such as color and shape, or categories based on statistical models.

Feature extraction is performed on the key frames using the pre-trained VGG16 model.

The specific process is as follows: the outputs of the convolutional layers and fully connected layers of the VGG16 model are used, giving 4096 feature values in total. Since key-frame pictures are mostly color pictures, three-channel 3×3 convolution kernels and 2×2 pooling kernels are used. Taking a 224×224 video image as an example, the calculation proceeds as follows:

1) The input image of size 224×224×3 is convolved twice with 64-channel 3×3 kernels with stride 1, producing a feature map of size 224×224×64. Pooling with a 2×2 kernel and stride 2 outputs a feature map of size 112×112×64;

2) After two convolutions with 128 3×3 kernels with stride 1, the size becomes 112×112×128; pooling with stride 2 outputs 56×56×128;

3) After three convolutions with 256 3×3 kernels with stride 1, the size becomes 56×56×256; pooling with stride 2 outputs 28×28×256;

4) After three convolutions with 512 3×3 kernels with stride 1, the size becomes 28×28×512; pooling with stride 2 outputs 14×14×512;

5) After three convolutions with 512 3×3 kernels with stride 1, the size becomes 14×14×512; pooling with stride 2 outputs 7×7×512;

6) The data is flattened into a one-dimensional array: 7×7×512 = 25088;

7) After two 1×1×4096 fully connected layers and one 1×1×1000 fully connected layer, a 1×1000 feature vector is finally output.
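
As an illustrative sketch only (the patent does not name a framework), the feature extraction described above can be reproduced with a pre-trained VGG16 from torchvision; the library choice, the ImageNet normalization constants, and the use of the final 1×1000 output as the stored feature are assumptions:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Preprocessing: size normalization to 224x224x3 plus ImageNet statistics (assumed).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Pre-trained VGG16 in inference mode.
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

def extract_feature(image_path: str) -> torch.Tensor:
    """Return a 1000-dimensional feature vector for one key-frame image."""
    img = Image.open(image_path).convert("RGB")
    x = preprocess(img).unsqueeze(0)   # shape (1, 3, 224, 224)
    with torch.no_grad():
        feat = vgg16(x)                # conv blocks + fully connected layers -> (1, 1000)
    return feat.squeeze(0)
```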

In one implementation of the present disclosure, referring to Figure 4, sequentially obtaining the key frames of each video data item further includes:

Step S221: obtain the MPEG-4 standard table of the video data;

Step S222: determine whether the MPEG-4 standard table contains an stss box;

Step S223: in response to the standard table containing an stss box, treat the frames identified by the stss box as key frames;

Step S224: in response to the standard table not containing an stss box, compute the similarity between video frames using an edge detection algorithm to generate multiple similar frames;

Step S225: cluster the similar frames to generate similar-frame sets;

Step S226: process the similar-frame sets with a picture quality assessment algorithm, and select the best frame in each similar-frame set as the key frame.

Specifically, in the MPEG-4 standard the stss box identifies the positions and number of key frames, helping the decoder locate and parse key frames quickly and thereby improving the efficiency and quality of video decoding. If the MPEG-4 standard table contains an stss box, the key frames are obtained from it directly.

In addition, if there are no explicit key frames, the similarity between frames is computed with an edge detection algorithm. Edge detection is an algorithm for detecting edges in an image. Edges are regions where the gray values of adjacent pixels differ greatly and usually correspond to the boundaries of objects in the image. The basic idea is to compute the grayscale difference between each pixel and its neighboring pixels and decide, from the magnitude and direction of the difference, whether the pixel is an edge pixel. By comparing the grayscale differences of image frames, the similarity between frames is computed; frames whose similarity exceeds a certain threshold are treated as similar frames, and a similar-frame set consisting of multiple similar frames can be represented by one key frame.
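
The specific edge-detection, clustering, and quality-assessment algorithms are not fixed by the patent. The following Python/OpenCV sketch is one possible reading: it assumes Canny edge maps, a pixel-agreement similarity between consecutive edge maps, grouping of consecutive similar frames in place of full clustering, and variance of the Laplacian as the picture quality score.

```python
import cv2
import numpy as np

def fallback_key_frames(video_path: str, sim_thresh: float = 0.9):
    """Pick one key frame per run of mutually similar frames (assumed scheme)."""
    cap = cv2.VideoCapture(video_path)
    groups, current = [], []
    prev_edges = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 100, 200)            # edge map of the current frame
        if prev_edges is not None:
            # crude similarity: fraction of pixels whose edge label agrees
            sim = np.mean(edges == prev_edges)
            if sim < sim_thresh:                     # dissimilar frame -> close the group
                groups.append(current)
                current = []
        current.append(frame)
        prev_edges = edges
    if current:
        groups.append(current)
    cap.release()

    # Quality assessment: variance of the Laplacian as a sharpness proxy.
    def sharpness(f):
        return cv2.Laplacian(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY), cv2.CV_64F).var()

    return [max(g, key=sharpness) for g in groups if g]
```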

It should be noted that the basic idea of key frame extraction is to select frames that represent the entire video so as to preserve its content as much as possible. These representative frames are called key frames, and they can be selected based on the following aspects:

1. Object movement: if the video contains only one moving object, a frame containing that object should be selected as the key frame.

2. Scene changes: if there are scene transitions in the video, such as buildings, roads or sky, frames containing such scenes should be selected as key frames. Therefore, when the scene changes, an appropriate frame is chosen as the key frame.

3. Semantic information: if the video contains a large amount of text or logos, frames containing such elements can be identified and selected as key frames.

In one implementation of the present disclosure, before step S221 of obtaining the MPEG-4 standard table of the video data, the method further includes: splitting the video data into individual video frames; and preprocessing the video frames to obtain optimized video frames, the preprocessing including noise reduction and/or size normalization.

The video data can also be optimized before the key frames are extracted; after noise reduction or size normalization, cleaner and more uniform data frames are obtained.
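
A minimal preprocessing sketch with OpenCV, assuming non-local-means denoising and a fixed 224×224 target size (both parameters are illustrative, not taken from the patent):

```python
import cv2

def preprocess_frame(frame, size=(224, 224)):
    """Denoise a frame and normalize its size (parameters are illustrative)."""
    denoised = cv2.fastNlMeansDenoisingColored(frame, None, 10, 10, 7, 21)
    return cv2.resize(denoised, size)
```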

In one implementation of the present disclosure, determining the category to which each group of key frames belongs and storing it in the video database as the first category includes: determining the category of each group of key frames based on a picture classification algorithm and storing it in the video database as the first category.

Specifically, the training set of an image classification algorithm contains a large amount of picture data and labels covering many categories of images, where the picture data is the content from which the algorithm learns and extracts information, and the labels indicate the category of each picture. The algorithm learns automatically from the picture set and continually optimizes its parameters so that it can classify input pictures. Common image classification algorithms include rule-based algorithms, statistics-based algorithms, deep-learning-based algorithms and clustering-based algorithms; one of these approaches is chosen to classify the pictures.
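
Since the patent leaves the choice of classification algorithm open, one simple illustrative option is to reuse the pre-trained VGG16 from the earlier sketch and take its top-1 class index as the category label; the function below assumes the `vgg16` model and `preprocess` transform defined there.

```python
import torch
from PIL import Image

def classify(image_path: str, model, preprocess) -> int:
    """Return the top-1 class index as the category label (assumed scheme)."""
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))   # (1, 1000) class scores
    return int(logits.argmax(dim=1))                   # stored as first/second category
```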

Step S300: acquire picture data.

Specifically, when searching, the user inputs a picture into the program, and the computer proceeds to the next step after obtaining the picture data.

Step S400: process the picture data to generate a second category and a second feature vector of the picture data.

Specifically, after the picture is processed, the picture data yields the same kinds of features as the key frames, which makes comparison with the key frames convenient. In the typical scenario the user inputs a single picture for retrieval, so after the picture data is processed, the generated second category and second feature vector are each a single item.

In one implementation of the present disclosure, referring to Figure 5, step S400 further includes:

Step S410: based on the picture classification algorithm, determine the category of the picture data and store it as the second category;

Step S420: based on the pre-trained VGG16 model, compute the picture feature vector of the picture data and store it as the second feature vector.

Specifically, the picture is processed in the same way as the key frames to obtain the same kinds of features. As in the earlier step, the key frames were processed with a picture classification algorithm; in this step the picture data is processed with the same picture classification algorithm, using the same rule within it. For example, if a statistics-based picture classification algorithm was used for the key frames, the same statistics-based algorithm is applied to the picture data. The VGG16 model is used in the same way as when processing the key frames, which is not repeated here.

Step S500: among the first categories in the video database, find the first category identical to the second category and designate it as the query category.

Specifically, the category to which the input picture data belongs is determined in advance, and the video data has already been classified by category, so videos are only queried within the category identical to that of the picture data, which reduces the amount of computation during retrieval and improves the response time of retrieval.

Step S600: compute the similarity between the second feature vector and the first feature vectors corresponding to the query category.

Specifically, the similarity between a first feature vector and the second feature vector is measured by computing the distance between them; essentially, the K vectors closest to the target vector are found in the high-dimensional space. The Euclidean distance is used, with the formula d(x, y) = sqrt((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²), where x and y are the two feature vectors and n is their dimension.

Based on the second category and second feature vector of the picture to be queried obtained in the previous step, the first feature vectors under the query category are traversed and the similarity of each to the second feature vector is computed.
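
A minimal NumPy sketch of this step, assuming the first feature vectors under the query category are stacked into an array aligned with their video codes (the array layout and the top-K cutoff are illustrative assumptions):

```python
import numpy as np

def rank_by_similarity(query_vec, first_vecs, video_codes, top_k=10):
    """Rank candidate videos in the query category by Euclidean distance.

    query_vec  : second feature vector of the input picture, shape (d,)
    first_vecs : stacked first feature vectors under the query category, shape (n, d)
    video_codes: video codes aligned with the rows of first_vecs
    """
    dists = np.linalg.norm(first_vecs - query_vec, axis=1)  # Euclidean distances
    order = np.argsort(dists)[:top_k]                       # smaller distance = more similar
    return [(video_codes[i], float(dists[i])) for i in order]
```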

Step S700: sort the similarities and generate a video list corresponding to the similarities.

Specifically, each similarity result corresponds to a particular first feature vector and therefore to a key frame and its video data. After the similarities are sorted in a certain order, the video data corresponding to the similarities is arranged in the same order, and the ordered video data is finally returned and presented in the form of a video list.

In one implementation of the present disclosure, after step S200 and before step S300, referring to Figure 2, the method further includes a video database optimization method, specifically:

Step S800: in the video database, obtain the individual access count of each video data item within a set period.

Step S900: in response to an individual access count being below the threshold, delete the corresponding video data together with the first category and first feature vector of its key frames.

It should be noted that the video database in the present disclosure is a database generated for the retrieval process; it stores the codes of the video data and the features of their key frames. The video data corresponding to the codes in the video database defines the scope of retrieval; the video database is not the repository where the video data is stored on the platform.

Specifically, the video database is inspected periodically, the set period being the interval between scheduled inspections, and the individual access count of each video data item is queried; the access count includes clicks, views, retrievals and the like. When the access count is below the threshold, the video is excluded from the retrieval scope and the code of the video data is deleted from the video database, so that subsequent retrievals no longer need to process this video. This reduces the amount of computation during retrieval, lowers the consumption of computing resources, and speeds up retrieval.

Taking an e-commerce platform as an example, product visits are recorded; for products that have had no traffic for a long time or whose traffic is below the threshold, the corresponding video data and key frames are deleted from the video database, along with the first feature vectors and first categories of those key frames.
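
As a simplified illustration of this pruning step (the patent does not specify a storage engine; a single SQLite table with assumed names stands in for the video database here):

```python
import sqlite3

# Hypothetical schema: video_index(video_code, first_category, first_feature, access_count).
def prune_low_traffic(db_path: str, threshold: int) -> None:
    conn = sqlite3.connect(db_path)
    with conn:
        # Removes the video code together with its first category and first feature vector rows.
        conn.execute("DELETE FROM video_index WHERE access_count < ?", (threshold,))
    conn.close()
```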

In summary, the present disclosure provides a method for retrieving videos based on pictures. By processing the video data, the code of each video is retained, and the first category and first feature vector of its key frames are generated and stored with the video data. After the user inputs a picture for retrieval, the picture is processed to obtain its second category and second feature vector, and retrieval is performed only within the first category identical to the second category. This narrows the search scope, reduces the amount of computation during retrieval, and lowers the computational and time costs of applying the model, so that scenarios with high real-time retrieval requirements can be met.

A specific embodiment of the present disclosure provides another method for retrieving videos based on pictures, with the following steps:

Step S101: acquire a plurality of video data items.

Step S102: store each video data item in a distributed file database and assign each video data item a code.

Step S103: sequentially obtain the key frames of each video data item.

Step S104: determine the category to which each group of key frames belongs, and store it in the video database as the first category.

Step S105: based on the pre-trained VGG16 model, compute the picture feature vector of each group of key frames and store it in the video database as the first feature vector.

Step S106: acquire picture data.

Step S107: based on the picture classification algorithm, determine the category of the picture data and store it as the second category.

Step S108: based on the pre-trained VGG16 model, compute the picture feature vector of the picture data and store it as the second feature vector.

Step S109: among the first categories in the video database, find the first category identical to the second category and designate it as the query category.

Step S1010: compute the similarity between the second feature vector and the first feature vectors corresponding to the query category.

Step S1011: sort the similarities and generate a video list corresponding to the similarities.

It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or server. The method of this embodiment may also be applied in a distributed scenario and completed by multiple devices cooperating with one another. In such a distributed scenario, one of the multiple devices may perform only one or more of the steps of the method of the embodiments of the present disclosure, and the multiple devices interact with one another to complete the described method.

It should be noted that some embodiments of the present disclosure have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the above embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In certain embodiments, multitasking and parallel processing are also possible or may be advantageous.

Based on the same inventive concept, and corresponding to the methods of any of the above embodiments, the present disclosure also provides a system for retrieving videos based on pictures.

Referring to Figure 6, a system for retrieving videos based on pictures includes:

a first acquisition module 100, configured to acquire a plurality of video data items;

a first processing module 200, configured to process the video data to generate, for each item, a video code together with a first category and a first feature vector for its key frames, and to store the generated first categories and first feature vectors in a video database;

a second acquisition module 300, configured to acquire picture data;

a second processing module 400, configured to process the picture data and generate a second category and a second feature vector of the picture data;

a search module 500, configured to search the video database for the first category identical to the second category and designate it as the query category;

a calculation module 600, configured to compute the similarity between the second feature vector and the first feature vectors corresponding to the query category;

a generation module 700, configured to sort the similarities and generate a video list corresponding to the similarities.

In a possible implementation, the system for retrieving videos based on pictures further includes:

a third acquisition module 800, configured to obtain, in the video database, the individual access count of each video data item within a set period;

a response module 900, configured to delete, in response to an individual access count being below the threshold, the corresponding video data together with the first category and first feature vector of its key frames.

For convenience of description, the above apparatus is described with its functions divided into various modules. Of course, when implementing the present disclosure, the functions of the modules may be implemented in the same piece, or in multiple pieces, of software and/or hardware.

The apparatus of the above embodiment is used to implement the corresponding method for retrieving videos based on pictures in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.

Based on the same inventive concept, and corresponding to the methods of any of the above embodiments, the present disclosure also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method for retrieving videos based on pictures described in any of the above embodiments when executing the program.

Figure 7 shows a more specific schematic diagram of the hardware structure of a computer device provided by this embodiment. The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040 and a bus 1050. The processor 1010, memory 1020, input/output interface 1030 and communication interface 1040 are communicatively connected to one another inside the device through the bus 1050.

The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to implement the technical solutions provided by the embodiments of this specification.

The memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.

The input/output interface 1030 is used to connect input/output modules to realize information input and output. The input/output modules may be configured as components inside the device (not shown in the figure) or externally connected to the device to provide the corresponding functions. Input devices may include keyboards, mice, touch screens, microphones and various sensors; output devices may include displays, speakers, vibrators and indicator lights.

The communication interface 1040 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices. The communication module may communicate in a wired manner (e.g., USB, network cable) or in a wireless manner (e.g., mobile network, WIFI, Bluetooth).

The bus 1050 includes a path that transmits information between the components of the device (e.g., the processor 1010, memory 1020, input/output interface 1030 and communication interface 1040).

It should be noted that although the above device only shows the processor 1010, memory 1020, input/output interface 1030, communication interface 1040 and bus 1050, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may include only the components necessary to implement the solutions of the embodiments of this specification, and need not include all the components shown in the figure.

The computer device of the above embodiment is used to implement the corresponding method for retrieving videos based on pictures in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.

Based on the same inventive concept, and corresponding to the methods of any of the above embodiments, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause the computer to perform the method for retrieving videos based on pictures described in any of the above embodiments.

The computer-readable medium of this embodiment includes permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

The computer instructions stored in the storage medium of the above embodiment are used to cause the computer to perform the method for retrieving videos based on pictures described in any of the above embodiments, and have the beneficial effects of the corresponding method embodiments, which are not repeated here.

Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of the present disclosure (including the claims) is limited to these examples; this manner of description is adopted only for clarity. Taking the specification as a whole and in keeping with the idea of the present disclosure, the technical features of the above embodiments or of different embodiments may also be combined as appropriate, the steps may be performed in any order, and many other variations of the different aspects of the embodiments of the present disclosure exist as described above, which are not provided in detail for the sake of brevity.

In addition, to simplify the description and discussion, and so as not to obscure the embodiments of the present disclosure, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided figures. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present disclosure, which also takes into account the fact that details regarding the implementation of these block diagram devices are highly dependent on the platform on which the embodiments are to be implemented (i.e., these details should be well within the understanding of those skilled in the art). Where specific details (e.g., circuits) are set forth to describe exemplary embodiments of the present disclosure, it will be apparent to those skilled in the art that the embodiments may be practiced without these specific details or with variations of these specific details. Accordingly, the description should be regarded as illustrative rather than restrictive.

Although the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art from the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.

The embodiments of the present disclosure are intended to cover all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, and the like made without departing from the spirit and principles of the embodiments of the present disclosure shall be included within the scope of protection of the present disclosure.

Claims (10)

1. A method of retrieving video based on pictures, the method comprising:
acquiring a plurality of video data;
processing the plurality of video data to encode each video data and to generate a first category and a first feature vector for the key frames of each video data, and storing the generated plurality of first categories and plurality of first feature vectors in a video database;
acquiring picture data;
processing the picture data to generate a second category and a second feature vector of the picture data;
searching the video database for the first categories that are the same as the second category, and confirming those first categories as query categories;
calculating the similarity between the second feature vector and a plurality of first feature vectors corresponding to the query category;
and sorting the similarities to generate a video list corresponding to the similarities.
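
As an illustration of the retrieval flow recited in claim 1, a minimal Python sketch follows. It assumes the stored entries expose `video_id`, `first_category` and `first_vector` fields and uses cosine similarity; the claim itself fixes neither the storage layout nor the similarity measure.

```python
import numpy as np

def retrieve_videos(second_category, second_vector, video_db, top_k=10):
    """Rank stored videos whose first category matches the query's second category.

    video_db is assumed to be a list of dicts with keys
    'video_id', 'first_category' and 'first_vector' (one entry per key frame).
    """
    # Step 1: restrict the search to entries whose first category equals the second category.
    candidates = [e for e in video_db if e["first_category"] == second_category]

    # Step 2: compute the similarity between the second feature vector and each first feature vector.
    q = np.asarray(second_vector, dtype=np.float32)
    scores = []
    for entry in candidates:
        v = np.asarray(entry["first_vector"], dtype=np.float32)
        sim = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-12))
        scores.append((entry["video_id"], sim))

    # Step 3: sort by similarity and return the video list.
    scores.sort(key=lambda x: x[1], reverse=True)
    return scores[:top_k]
```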
2. The method of claim 1, wherein, after storing the generated plurality of first categories and plurality of first feature vectors in the video database and before acquiring the picture data, the method further comprises:
acquiring single access amounts of a plurality of video data in a set time in the video database;
and, in response to a single access amount being smaller than a threshold value, deleting the video data corresponding to that single access amount together with the first category and the first feature vector of its key frames.
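
A minimal sketch of the cleanup step in claim 2, assuming the single access amounts within the set time are supplied as a `video_id -> count` mapping and that the database is the same list of entries used in the sketch above; the time window and the threshold value are left to the caller.

```python
def evict_cold_videos(video_db, access_counts, threshold):
    """Drop the entries (first category and first feature vectors of key frames)
    of videos whose single access amount falls below the threshold."""
    return [
        entry for entry in video_db
        if access_counts.get(entry["video_id"], 0) >= threshold
    ]
```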
3. The method of picture-based video retrieval according to claim 1, wherein said processing the plurality of video data to encode each video data and to generate a first category and a first feature vector for the key frames of each video data, and storing the generated plurality of first categories and plurality of first feature vectors in a video database comprises:
storing each video data in a distributed file database and encoding each video data;
sequentially acquiring key frames of each video data in a plurality of video data;
judging the category to which each group of key frames belongs, and storing the category as the first category in the video database;
and calculating, based on a pretrained VGG16 model, a picture feature vector of each group of key frames, and storing the picture feature vectors in the video database as the first feature vectors.
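
For the last step of claim 3, one possible way to obtain a fixed-length picture feature vector per key frame from a pretrained VGG16 is sketched below with Keras. Taking the global-average-pooled convolutional output (a 512-dimensional vector) is an assumption; the claim does not specify which layer of VGG16 is used.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# Pretrained VGG16 without the classification head; global average pooling
# collapses the last convolutional block into a single 512-dim vector.
_vgg = VGG16(weights="imagenet", include_top=False, pooling="avg")

def key_frame_vector(frame_path):
    """Compute the first feature vector for one key-frame image file."""
    img = image.load_img(frame_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = preprocess_input(np.expand_dims(x, axis=0))
    return _vgg.predict(x, verbose=0)[0]  # shape (512,)
```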
4. The method of retrieving video based on pictures as defined in claim 3, wherein said sequentially acquiring key frames of each video data in the plurality of video data comprises:
acquiring an MPEG-4 standard table of the video data, and judging whether the MPEG-4 standard table comprises an stss part;
in response to the standard table including the stss part, determining that the frames identified by the stss part are key frames;
in response to the standard table not including the stss part, calculating the similarity between the video frames by using an edge detection calculation method, and generating a plurality of similar frames;
clustering a plurality of similar frames to generate a similar frame set;
and processing the similar frame sets by using a picture quality evaluation algorithm, and selecting the optimal frame in each similar frame set as a key frame.
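
A sketch of the fallback branch of claim 4 (no stss part in the MPEG-4 standard table). The edge-signature comparison, the grouping tolerance, and the Laplacian-variance quality score are illustrative stand-ins for the "edge detection calculation method" and "picture quality evaluation algorithm" named in the claim, which are not further specified there; parsing the stss part itself is omitted.

```python
import cv2

def edge_signature(frame):
    """Edge-detection based signature used to compare adjacent frames."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    return (edges > 0).mean()  # fraction of edge pixels, in [0, 1]

def sharpness(frame):
    """Picture-quality proxy: variance of the Laplacian (higher = sharper)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def fallback_key_frames(video_path, sim_tol=0.02):
    """Group consecutive frames whose edge signatures are close (similar frames),
    then keep the best frame of each similar-frame set as the key frame.
    For brevity this sketch keeps all frames in memory."""
    cap = cv2.VideoCapture(video_path)
    groups, current, prev_sig = [], [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        sig = edge_signature(frame)
        if prev_sig is not None and abs(sig - prev_sig) > sim_tol and current:
            groups.append(current)   # signature jumped: close the similar-frame set
            current = []
        current.append(frame)
        prev_sig = sig
    if current:
        groups.append(current)
    cap.release()
    # Optimal frame per similar-frame set, judged by the quality score.
    return [max(g, key=sharpness) for g in groups]
```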
5. A method of retrieving video based on pictures according to claim 3, wherein said determining the category to which each of said key frames belongs and storing as said first category in said video database comprises:
and judging the category of each group of key frames based on a picture classification algorithm, and storing the category as the first category in the video database.
6. The method for retrieving video based on pictures as recited in claim 4, wherein, prior to said acquiring the MPEG-4 standard table of the video data, the method further comprises:
framing the video data to obtain an independent video frame;
preprocessing the video frame to obtain an optimized video frame; the preprocessing includes noise reduction and/or size unification.
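
The framing and preprocessing of claim 6 could look like the following OpenCV sketch; Gaussian denoising and a fixed 224×224 target size are assumptions, since the claim only requires noise reduction and/or size unification.

```python
import cv2

def frame_and_preprocess(video_path, size=(224, 224)):
    """Split the video data into independent frames, then denoise and
    unify the size of each frame."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        denoised = cv2.GaussianBlur(frame, (3, 3), 0)  # noise reduction
        frames.append(cv2.resize(denoised, size))      # size unification
    cap.release()
    return frames
```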
7. The method of retrieving video based on pictures of claim 1, wherein said processing the picture data to generate a second category and a second feature vector of the picture data comprises:
judging the category of the picture data based on a picture classification algorithm, and storing the category as the second category;
and calculating a picture feature vector of the picture data based on a pretrained VGG16 model, and storing the picture feature vector as the second feature vector.
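
On the query side (claim 7), the second category and the second feature vector can be produced by the same pretrained VGG16: the classification head yields a label and the pooled convolutional output yields the vector. Treating the ImageNet label as the second category is an assumption; the claim only requires "a picture classification algorithm", which could equally be a separately trained classifier over the first categories.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

_clf = VGG16(weights="imagenet")                                     # with classification head
_emb = VGG16(weights="imagenet", include_top=False, pooling="avg")   # feature extractor

def process_picture(picture_path):
    """Return (second_category, second_feature_vector) for the query picture."""
    img = image.load_img(picture_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    second_category = decode_predictions(_clf.predict(x, verbose=0), top=1)[0][0][1]
    second_vector = _emb.predict(x, verbose=0)[0]
    return second_category, second_vector
```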
8. A system for retrieving video based on pictures, characterized in that the system comprises:
the first acquisition module is used for acquiring a plurality of video data;
the first processing module is used for processing the plurality of video data, encoding each video data, generating a first category and a first feature vector for the key frames of each video data, and storing the generated plurality of first categories and plurality of first feature vectors in a video database;
the second acquisition module is used for acquiring the picture data;
the second processing module is used for processing the picture data and generating a second category and a second feature vector of the picture data;
the searching module is used for searching the first category which is the same as the second category in the video database and confirming the first category as a query category;
the computing module is used for computing the similarity between the second feature vector and a plurality of first feature vectors corresponding to the query category;
and the generation module is used for sorting the similarities and generating a video list corresponding to the similarities.
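
The module structure of claim 8 can be read as a thin orchestration layer over the functions sketched above. The class below is only an assumed wiring: `classify_frame` and `frame_vector` stand for hypothetical classification and VGG16 helpers applied to in-memory frames, and the in-memory list stands in for the video database.

```python
class PictureBasedVideoRetrievalSystem:
    """Thin wiring of the modules in claim 8 around a shared video database."""

    def __init__(self, video_db=None):
        self.video_db = video_db if video_db is not None else []

    def ingest(self, video_paths):
        """First acquisition module + first processing module."""
        for path in video_paths:
            for frame in fallback_key_frames(path):            # or stss-based extraction
                self.video_db.append({
                    "video_id": path,
                    "first_category": classify_frame(frame),   # assumed classifier helper
                    "first_vector": frame_vector(frame),        # assumed VGG16 helper
                })

    def query(self, picture_path, top_k=10):
        """Second acquisition, second processing, searching, computing and
        generation modules, reusing the query-side and ranking sketches above."""
        second_category, second_vector = process_picture(picture_path)
        return retrieve_videos(second_category, second_vector, self.video_db, top_k)
```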
9. A computer device, comprising: a processor and a memory;
the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the picture-based video retrieval method of any one of claims 1 to 7.
CN202311586249.0A 2023-11-24 2023-11-24 Method, system, equipment and storage medium for retrieving video based on picture Pending CN117540047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311586249.0A CN117540047A (en) 2023-11-24 2023-11-24 Method, system, equipment and storage medium for retrieving video based on picture

Publications (1)

Publication Number Publication Date
CN117540047A true CN117540047A (en) 2024-02-09

Family

ID=89785803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311586249.0A Pending CN117540047A (en) 2023-11-24 2023-11-24 Method, system, equipment and storage medium for retrieving video based on picture

Country Status (1)

Country Link
CN (1) CN117540047A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101673298A (en) * 2009-09-29 2010-03-17 深圳市融创天下科技发展有限公司 Video data distributed caching method in video on demand
CN106791863A (en) * 2015-11-19 2017-05-31 浙江大华技术股份有限公司 The storage method and device of a kind of SVC video datas
CN108876756A (en) * 2017-05-09 2018-11-23 普天信息技术有限公司 The measure and device of image similarity
CN107147921A (en) * 2017-05-23 2017-09-08 北京网梯科技发展有限公司 Based on section and the intelligence CDN video playback accelerated methods dispatched and equipment
CN107291825A (en) * 2017-05-26 2017-10-24 北京奇艺世纪科技有限公司 With the search method and system of money commodity in a kind of video
CN109241349A (en) * 2018-08-14 2019-01-18 中国电子科技集团公司第三十八研究所 A kind of monitor video multiple target classification retrieving method and system based on deep learning
CN110147465A (en) * 2019-05-23 2019-08-20 上海闻泰电子科技有限公司 Image processing method, device, equipment and medium
KR102043192B1 (en) * 2019-05-27 2019-11-11 주식회사 인콘 Cctv searching method and apparatus using deep learning
CN111339369A (en) * 2020-02-25 2020-06-26 佛山科学技术学院 Video retrieval method, system, computer equipment and storage medium based on depth features
US10949907B1 (en) * 2020-06-23 2021-03-16 Price Technologies Inc. Systems and methods for deep learning model based product matching using multi modal data
CN112163120A (en) * 2020-09-04 2021-01-01 Oppo(重庆)智能科技有限公司 Classification method, terminal and computer storage medium
CN112651953A (en) * 2020-12-31 2021-04-13 平安国际智慧城市科技股份有限公司 Image similarity calculation method and device, computer equipment and storage medium
CN113918753A (en) * 2021-07-23 2022-01-11 腾讯科技(深圳)有限公司 Image retrieval method based on artificial intelligence and related equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
曾凡智;程勇;周燕;: "一种视频时空特征提取算法及其应用研究", 佛山科学技术学院学报(自然科学版), no. 03, 15 May 2020 (2020-05-15) *

Similar Documents

Publication Publication Date Title
Humenberger et al. Investigating the role of image retrieval for visual localization: An exhaustive benchmark
CN112861575A (en) Pedestrian structuring method, device, equipment and storage medium
CN113987119B (en) Data retrieval method, and cross-modal data matching model processing method and device
CA3066029A1 (en) Image feature acquisition
JP2017062781A (en) Detection of important objects based on similarity using deep CNN pooling layer as a feature
CN114332680A (en) Image processing method, video searching method, image processing device, video searching device, computer equipment and storage medium
CN107545276B (en) Multi-view learning method combining low-rank representation and sparse regression
CN113657087B (en) Information matching method and device
CN114298122B (en) Data classification method, apparatus, device, storage medium and computer program product
CN112256899B (en) Image reordering method, related apparatus, and computer-readable storage medium
CN107590505B (en) Learning method combining low-rank representation and sparse regression
Wu et al. Image completion with multi-image based on entropy reduction
CN112364204A (en) Video searching method and device, computer equipment and storage medium
CN118467778B (en) Video information summary generation method, device, electronic device and storage medium
CN114743139A (en) Video scene retrieval method, device, electronic device and readable storage medium
CN112668608B (en) Image recognition method and device, electronic equipment and storage medium
CN120409657B (en) Method and system for constructing character knowledge graph driven by multimodal large model
CN115115825B (en) Method, device, computer equipment and storage medium for detecting object in image
CN114692750B (en) A fine-grained image classification method, device, electronic device and storage medium
CN116977265A (en) Training method and device for defect detection model, computer equipment and storage medium
CN119478773B (en) Dynamic scene graph generation method, device, electronic equipment, storage medium and program product
CN119760165B (en) Multi-mode image retrieval method, system, device, medium and program product
CN118172713B (en) Video tag identification method, device, computer equipment and storage medium
CN117009599B (en) Data retrieval method, device, processor and electronic device
CN118411536A (en) Video similarity judging method and device based on multi-mode feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20240209