CN118093938B - Video query retrieval method and system based on semantic depth model - Google Patents
- Publication number: CN118093938B (application CN202410198615.3A)
- Authority: CN (China)
- Prior art keywords: similarity, video, semantic, threshold, module
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/785 — Retrieval of video data characterised by metadata automatically derived from the content, using low-level visual features: colour or luminescence
- G06F16/7854 — Retrieval of video data characterised by metadata automatically derived from the content, using low-level visual features: shape
- G06F16/7857 — Retrieval of video data characterised by metadata automatically derived from the content, using low-level visual features: texture
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06V10/454 — Integrating biologically inspired filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/54 — Extraction of image or video features relating to texture
- G06V10/56 — Extraction of image or video features relating to colour
- G06V10/761 — Proximity, similarity or dissimilarity measures
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Description
Technical Field

The present invention relates to the field of information retrieval, and in particular to a video query retrieval method and system based on a semantic depth model.

Background Art

With the explosive growth of video data, video retrieval has become an important technical requirement. Traditional video retrieval methods usually perform similarity matching based on visual features such as color, texture, and shape. However, these methods struggle to understand video content at the semantic level, which leads to inaccurate and irrelevant retrieval results.

The prior art also often suffers from the following defects:

Traditional video retrieval methods usually exploit only some of the features in a video, such as color, texture, and shape, while ignoring other important feature information such as scenes, people, and actions; this incomplete feature extraction can make the retrieval results inaccurate and irrelevant. At the same time, traditional methods usually focus only on the visual features of a video and ignore its semantic information. Lacking a deep understanding of video content, they cannot perform accurate similarity matching at the semantic level.
Summary of the Invention

The purpose of the present invention is to provide a video query retrieval method and system based on a semantic depth model, so as to solve the problems raised in the background art above.

To achieve the above object, the present invention provides the following technical solution: a video query retrieval method based on a semantic depth model, comprising the following steps:

Step 1: Use deep learning to extract rich feature information from the video, the feature information including color, texture, shape, and motion;

Step 2: Build a deep neural network model that maps the extracted video features into a semantic space and generates a semantic vector;

Step 3: Measure how similar two videos are at the semantic level by computing the similarity between their two semantic vectors;

Step 4: According to the query condition input by the user, retrieve the most semantically similar video from the video library: first represent the user's query condition as a semantic vector, then search the video library for the video most similar to that semantic vector;

Step 5: Sort the retrieved videos according to the similarity results, placing the videos that best meet the user's needs first.
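The five steps above can be sketched in a few lines of Python with NumPy. This is an illustrative skeleton only: `extract_features`, `encode_semantics`, and the random projection matrix are hypothetical stand-ins for the deep models (the CNN feature extractor and the semantic mapping network) that the patent leaves unspecified.

```python
import numpy as np

def extract_features(frames: np.ndarray) -> np.ndarray:
    """Step 1 stand-in: collapse raw frames (T, H, W, C) to one feature vector."""
    return frames.reshape(frames.shape[0], -1).mean(axis=0)

def encode_semantics(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Step 2 stand-in: project features into a semantic space and L2-normalize."""
    vec = projection @ features
    return vec / (np.linalg.norm(vec) + 1e-12)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Step 3: similarity of two semantic vectors (already unit-length)."""
    return float(a @ b)

def retrieve(query_vec, library_vecs, top_k=3):
    """Steps 4-5: score every library video, sort best-first, return top_k."""
    scores = [(idx, cosine_similarity(query_vec, v))
              for idx, v in enumerate(library_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```

In this sketch a query video that is already in the library scores (numerically) 1.0 against itself and is ranked first, which is the ordering behaviour step 5 describes.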
Furthermore, in step 4, the similarity between the semantic vector of the user's query condition and the semantic vector of each video in the video library is computed, and each video's similarity is compared against a similarity threshold to obtain the video most similar to the semantic vector of the user's query condition.

Furthermore, the present invention provides a video query retrieval system based on a semantic depth model, applied in the above video query retrieval method, comprising:
a feature extraction module, used to:

analyze the input video in depth with deep learning, extracting color, texture, shape, and motion feature information from it;

a semantic modeling module, used to:

build a deep neural network model that maps video features into a semantic space, captures the temporal information and contextual relationships of the video content, and generates a semantic vector representation;

a similarity calculation module, used to:

measure how similar two videos are at the semantic level by computing the similarity between their two semantic vectors;

a video retrieval module, used to:

retrieve the most semantically similar videos from the video library according to the user's query condition: represent the query condition as a semantic vector, then search the video library for the video most similar to that semantic vector;

a result sorting module, used to:

sort the retrieved videos according to the similarity results, moving the video that best meets the user's needs to the front.
Furthermore, the feature extraction module includes:

a feature extraction unit, used to:

use specific layers of a CNN to detect and extract the color distribution and color histogram information of the video, analyze the texture patterns and structures in the video with the CNN, perform contour detection and shape recognition on objects in the video, and detect motion patterns and dynamic changes in the video in combination with optical flow;

a feature fusion unit, used to:

integrate the extracted color, texture, shape, and motion features, effectively combining them into a unified feature vector;

a feature normalization unit, used to:

normalize the extracted features.
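As an illustrative sketch of the extraction-fusion-normalization pipeline only, the unit's behaviour can be approximated in NumPy. Everything here is a stand-in for what the patent describes: a per-channel histogram in place of CNN colour layers, gradient energy in place of CNN texture analysis, and simple frame differencing in place of optical flow.

```python
import numpy as np

def color_histogram(frame: np.ndarray, bins: int = 8) -> np.ndarray:
    """Colour descriptor: per-channel intensity histogram, normalized to sum to 1."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
             for c in range(frame.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def texture_energy(frame: np.ndarray) -> np.ndarray:
    """Crude texture cue: mean gradient magnitude of the grayscale image."""
    gray = frame.mean(axis=-1)
    gy, gx = np.gradient(gray)
    return np.array([np.sqrt(gx ** 2 + gy ** 2).mean()])

def motion_energy(prev: np.ndarray, cur: np.ndarray) -> np.ndarray:
    """Motion cue (optical-flow stand-in): mean absolute frame difference."""
    return np.array([np.abs(cur.astype(float) - prev.astype(float)).mean()])

def fuse_and_normalize(parts) -> np.ndarray:
    """Feature fusion unit + feature normalization unit:
    concatenate the per-cue descriptors, then L2-normalize."""
    v = np.concatenate(parts)
    return v / (np.linalg.norm(v) + 1e-12)
```

The fusion step mirrors the unit description: each cue is computed independently and then combined into one unified, normalized feature vector.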
Furthermore, the semantic modeling module includes:

a temporal information capture unit, used to:

use an RNN to capture the temporal information in the video, analyzing the temporal and spatial relationships between consecutive frames and capturing dynamic changes and continuous actions;

a contextual relationship modeling unit, used to:

use the self-attention mechanism of the Transformer architecture to model the contextual relationships of the video content;

a feature fusion and semantic conversion unit, used to:

fuse the information obtained from the temporal information capture unit and the contextual relationship modeling unit, and convert the fused features into a high-level semantic vector representation;

a semantic vector normalization unit, used to:

normalize the generated semantic vectors.
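A minimal NumPy sketch of the self-attention step that the contextual relationship modeling unit relies on, followed by the pooling and normalization the later units perform. The randomly initialized projection matrices stand in for trained weights, and mean-pooling is one simple (assumed) way to collapse the per-frame context into a single semantic vector.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X: np.ndarray, Wq: np.ndarray, Wk: np.ndarray,
                   Wv: np.ndarray) -> np.ndarray:
    """Single-head self-attention over T frame features of shape (T, d):
    every output row is a context-weighted mixture of all frames."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)  # (T, T), rows sum to 1
    return weights @ V

def to_semantic_vector(context: np.ndarray) -> np.ndarray:
    """Fusion/conversion + normalization stand-in: mean-pool, then L2-normalize."""
    v = context.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-12)
```

Because each attention row sums to 1, every frame's output is a convex combination of all frames' value vectors, which is how the unit captures cross-frame context.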
Furthermore, the similarity calculation module includes:

a semantic vector input unit, used to:

receive semantic vectors from the semantic modeling module and format and preprocess them;

a similarity measurement unit, used to:

select a suitable similarity measure according to the specific need, such as cosine similarity or Euclidean distance, compute the similarity of the two input semantic vectors, and output a single value as the measure of similarity;

a threshold setting unit, used to:

set the similarity threshold and dynamically adjust it according to the actual retrieval results and user feedback.
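The two measures the unit names can be sketched as follows. Folding Euclidean distance into a (0, 1] score via `1 / (1 + dist)` is one common convention assumed here, not something the patent specifies.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two semantic vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def euclidean_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance folded into a (0, 1] similarity score."""
    return 1.0 / (1.0 + float(np.linalg.norm(a - b)))

def passes_threshold(similarity: float, threshold: float) -> bool:
    """Threshold setting unit: keep only matches at or above the threshold."""
    return similarity >= threshold
```

Cosine similarity ignores vector magnitude (useful when semantic vectors are normalized), while the Euclidean score is sensitive to it; which one fits depends on how the semantic space was trained.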
Furthermore, the threshold setting unit includes:

a threshold initial value extraction module, used to extract the initial value of the similarity threshold;

a cycle duration setting module, used to set the observation cycle duration of the similarity threshold, wherein the observation cycle comprises 30-90 unit durations and the unit duration is 24 h;

a comparison result extraction module, used to extract, for each observation cycle, the comparison results between the similarity values of all pairs of semantic vectors in that cycle and the initial similarity threshold;

a real similarity value extraction module, used to retrieve, when the comparison result indicates that the similarity value of a pair of semantic vectors is below the corresponding similarity threshold, the real similarity value of that pair as contained in the actual retrieval results and user feedback;

a threshold adjustment judgment module, used to judge, from the difference between the real similarity value and the computed similarity value of the pair of semantic vectors, whether the initial similarity threshold needs to be adjusted;

an adjustment execution module, used to adjust the initial similarity threshold with a threshold adjustment model when it is determined that the adjustment is needed, wherein the threshold adjustment model has the following structure:

wherein Syt denotes the similarity threshold obtained after adjusting the initial similarity threshold; Swy denotes the preset judgment threshold; Sq denotes the similarity judgment value; k denotes the total number of observation cycles; Swj denotes the similarity judgment stability parameter of the j-th observation cycle; S01j and S02j denote the first and second similarity stability parameters of the j-th observation cycle, respectively; Sy denotes the similarity threshold; and λ denotes a scaling parameter.
Furthermore, the threshold adjustment judgment module includes:

a group number extraction module, used to extract, when the comparison results indicate that the similarity values of pairs of semantic vectors are below the corresponding similarity threshold, the number of such pairs in each observation cycle;

a value retrieval module, used to retrieve, for each observation cycle, the similarity value of every pair of semantic vectors that is below the corresponding similarity threshold;

a stability parameter acquisition module, used to obtain the similarity judgment stability parameter of each observation cycle from the similarity values of the pairs below the corresponding similarity threshold in that cycle, wherein the similarity judgment stability parameter is obtained by the following formula:

wherein Sw denotes the similarity judgment stability parameter; n denotes the number of pairs of semantic vectors below the corresponding similarity threshold; S01 and S02 denote the first and second similarity stability parameters, respectively; Si denotes the similarity value of the i-th pair below the corresponding similarity threshold; Sy denotes the similarity threshold; m denotes the number of pairs not below the corresponding similarity threshold; Szi denotes the real similarity value of a pair below the corresponding similarity threshold; and Sj denotes the similarity value of the j-th pair not below the corresponding similarity threshold;

a comprehensive parameter acquisition module, used to integrate the similarity judgment stability parameters of all observation cycles into a comprehensive similarity judgment value, wherein the similarity judgment value is obtained by the following formula:

wherein Sq denotes the similarity judgment value; k denotes the total number of observation cycles; Swj denotes the similarity judgment stability parameter of the j-th observation cycle; and S01j and S02j denote the first and second similarity stability parameters of the j-th observation cycle, respectively;

a threshold adjustment execution module, used to determine that the initial similarity threshold needs to be adjusted when the similarity judgment value is below the preset judgment threshold.
Furthermore, the video retrieval module includes:

a query condition processing unit, used to:

receive the query condition input by the user, extract information such as keywords, scenes, and people, and convert it into a semantic vector representation;

a video library index unit, used to:

build the index structure of the video library, preprocess the videos in the library by interacting with the feature extraction module and the semantic modeling module, and generate a corresponding semantic vector representation for each video;

a retrieval matching unit, used to:

compute, by interacting with the similarity calculation module, the similarity between the semantic vector of the query condition and the semantic vector of each video in the library, and filter out the videos most relevant to the user's query condition according to the similarity threshold.
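A minimal in-memory sketch of the index unit and the retrieval matching unit together. The class name and API are illustrative assumptions, not from the patent; a production system would use an approximate nearest-neighbour index instead of a dense matrix product.

```python
import numpy as np

class SemanticVideoIndex:
    """Illustrative in-memory index: one L2-normalized semantic vector per video."""

    def __init__(self):
        self.ids = []
        self.vecs = []

    def add(self, video_id, vec) -> None:
        """Video library index unit: store the video's semantic vector."""
        v = np.asarray(vec, dtype=float)
        self.ids.append(video_id)
        self.vecs.append(v / (np.linalg.norm(v) + 1e-12))

    def search(self, query_vec, threshold: float = -1.0, top_k: int = 10):
        """Retrieval matching unit: cosine-score every video, filter by the
        similarity threshold, and return the best matches first."""
        q = np.asarray(query_vec, dtype=float)
        q = q / (np.linalg.norm(q) + 1e-12)
        sims = np.stack(self.vecs) @ q  # cosine, since rows and q are unit vectors
        order = np.argsort(-sims)
        return [(self.ids[i], float(sims[i]))
                for i in order if sims[i] >= threshold][:top_k]
```

Sorting best-first inside `search` also gives the result sorting module described below its input ordering for free.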
Furthermore, the result sorting module includes:

a result receiving unit, used to:

receive the similarity values output by the similarity calculation module;

a sorting application unit, used to:

sort the videos according to the similarity results;

an output display unit, used to:

return the sorted video retrieval results to the user.
Compared with the prior art, the present invention has the following beneficial effects:

1. Through deep learning, the present invention can extract rich and comprehensive feature information from a video, including color, texture, shape, and motion. These features describe the video content more precisely and provide a solid foundation for the subsequent semantic modeling. By building a deep neural network model that maps the extracted video features into a semantic space and generates a semantic vector representation, the invention effectively realizes the transformation from image features to semantic understanding and significantly improves the accuracy and efficiency of video retrieval.

2. By computing the similarity between two semantic vectors, the present invention can precisely measure how similar two videos are at the semantic level, providing an accurate matching basis for the subsequent retrieval. According to the user's query condition, the most semantically similar videos are quickly retrieved from the video library and sorted by similarity; this personalized sorting places the videos that best meet the user's needs first, offering a friendlier and more practical query experience. The method can be flexibly applied to various types of video libraries and different query conditions, and has good scalability and adaptability.

3. The feature extraction module and semantic modeling module of the present invention can comprehensively extract the color, texture, shape, and motion features of a video and thus describe its content more accurately. The similarity calculation module precisely measures the semantic similarity between videos by computing the similarity of their semantic vectors, improving the accuracy and efficiency of retrieval. The video retrieval module quickly retrieves the most semantically similar videos from the library according to the user's query condition and sorts them by similarity, and the personalized sorting of the result sorting module offers a friendlier and more practical query experience.
Brief Description of the Drawings

FIG. 1 is a schematic flow chart of the video query retrieval method based on a semantic depth model of the present invention;

FIG. 2 is a schematic block diagram of the modules of the video query retrieval system of the present invention.
具体实施方式DETAILED DESCRIPTION
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。The following will be combined with the drawings in the embodiments of the present invention to clearly and completely describe the technical solutions in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the present invention.
为了解决对于视频进行检索的检索结果准确度和相关性较差的技术问题,请参阅图1,本发明提供以下技术方案:In order to solve the technical problem of poor accuracy and relevance of retrieval results for video retrieval, please refer to FIG. 1 , the present invention provides the following technical solutions:
基于语义深度模型的视频查询检索方法,包括以下步骤:The video query retrieval method based on the semantic depth model includes the following steps:
步骤一:利用深度学习技术,从视频中提取出丰富的特征信息,所述特征信息包括颜色、纹理、形状、运动,这些特征信息能够全面地描述视频内容,为后续的语义建模提供基础数据;Step 1: Use deep learning technology to extract rich feature information from the video, including color, texture, shape, and motion. These feature information can fully describe the video content and provide basic data for subsequent semantic modeling;
步骤二:构建一个深度神经网络模型,将提取出的视频特征映射到语义空间中,生成语义向量表示,这一步是将图像特征转化为语义理解的桥梁,对于提高视频检索的准确性和效率至关重要;Step 2: Build a deep neural network model to map the extracted video features into the semantic space and generate a semantic vector representation. This step is the bridge from image features to semantic understanding and is crucial to improving the accuracy and efficiency of video retrieval.
步骤三:通过计算两个语义向量之间的相似度,衡量两个视频在语义层面的相似程度,为后续的视频检索提供精确的匹配依据;Step 3: By calculating the similarity between the two semantic vectors, the similarity between the two videos at the semantic level is measured, providing an accurate matching basis for subsequent video retrieval;
步骤四:根据用户输入的查询条件,在视频库中检索出语义上最相似的视频,首先对用户输入的查询条件进行语义向量表示,随后在视频库中寻找与该语义向量最相似的视频;Step 4: According to the query condition input by the user, the most semantically similar video is retrieved from the video library. First, the query condition input by the user is represented by a semantic vector, and then the video most similar to the semantic vector is searched in the video library;
步骤五:根据相似度计算结果,对检索出的视频进行排序,将最符合用户需求的视频排在前面,这一步能够为用户提供更加友好的查询体验,提高视频检索的实用性。Step 5: Sort the retrieved videos according to the similarity calculation results, and put the videos that best meet the user's needs at the front. This step can provide users with a more friendly query experience and improve the practicality of video retrieval.
在上述实施例中,对用户输入的查询条件的语义向量与视频库中每个视频的语义向量的相似度进行计算,并根据相似度阈值对视频库中每个视频相似度进行比对,得到与用户输入查询条件的语义向量最相似的视频。In the above embodiment, the similarity between the semantic vector of the query condition input by the user and the semantic vector of each video in the video library is calculated, and the similarity of each video in the video library is compared according to the similarity threshold to obtain the video that is most similar to the semantic vector of the query condition input by the user.
具体的,通过深度学习技术能够从视频中提取丰富且全面的特征信息,包括颜色、纹理、形状和运动等,这些特征能够更精确地描述视频内容,为后续的语义建模提供了坚实的基础。通过构建深度神经网络模型,将提取的视频特征映射到语义空间中,生成语义向量表示,有效地实现了从图像特征到语义理解的转化,显著提高了视频检索的准确性和效率。Specifically, deep learning technology can extract rich and comprehensive feature information from videos, including color, texture, shape, and motion. These features can more accurately describe the video content and provide a solid foundation for subsequent semantic modeling. By building a deep neural network model, the extracted video features are mapped into the semantic space and a semantic vector representation is generated, which effectively realizes the transformation from image features to semantic understanding and significantly improves the accuracy and efficiency of video retrieval.
在上述实施例中,通过计算两个语义向量之间的相似度,能够精确衡量两个视频在语义层面的相似程度,为后续的视频检索提供了精确的匹配依据。In the above embodiment, by calculating the similarity between two semantic vectors, the similarity between two videos at the semantic level can be accurately measured, providing an accurate matching basis for subsequent video retrieval.
In the above embodiment, the method can quickly retrieve the most semantically similar videos from the library for a given query condition and sort them by similarity. This personalized ranking puts the videos that best meet the user's needs first, offering a friendlier and more practical query experience. The method applies flexibly to various types of video libraries and query conditions, with good scalability and adaptability.
Please refer to FIG. 2. The video query retrieval system based on a semantic depth model includes:
A feature extraction module, used to:
perform deep analysis of the input video with deep learning techniques and extract color, texture, shape, and motion feature information from it;
A semantic modeling module, used to:
build a deep neural network model that maps video features into a semantic space, captures the temporal information and contextual relationships of the video content, and generates semantic vector representations, providing high-quality semantic data for subsequent video retrieval;
A similarity calculation module, used to:
measure how similar two videos are at the semantic level by computing the similarity between their semantic vectors, using cosine similarity and Euclidean distance, which provides an accurate matching basis for retrieval and ensures that results are highly relevant to the user's query condition;
A video retrieval module, used to:
retrieve the most semantically similar videos from the video library according to the user's query condition, by representing the query condition as a semantic vector and then searching the library for the videos closest to that vector;
A result sorting module, used to:
sort the retrieved videos according to the similarity results, moving the videos that best meet the user's needs to the front, and present the results clearly and in order so that users can quickly locate and select the videos they need.
Specifically, the feature extraction and semantic modeling modules comprehensively extract the color, texture, shape, and motion features of a video and thereby describe its content more accurately. The similarity calculation module precisely measures the semantic similarity between videos by computing the similarity between two semantic vectors, improving retrieval accuracy and efficiency. Based on the user's query condition, the video retrieval module quickly retrieves the most semantically similar videos from the library and sorts them by similarity, and the personalized ranking of the result sorting module gives users a friendlier and more practical query experience.
The feature extraction module includes:
A feature extraction unit, used to:
use specific CNN layers to detect and extract color distribution and color histogram information, identifying the dominant colors and color changes that reflect the video's theme or mood; analyze texture patterns and structures through the CNN to recognize different materials and surface textures, where detailed texture analysis helps in understanding scene details and object quality; perform contour detection and shape recognition on objects in the video, analyzing their basic shape, size, and orientation as a basis for subsequent object recognition; and apply optical flow techniques to detect motion patterns and dynamic changes, analyzing object trajectories, speeds, and directions to understand dynamic events in the video;
A feature fusion unit, used to:
integrate the extracted color, texture, shape, and motion features into a unified feature vector that provides a more comprehensive description of the video content;
A feature standardization unit, used to:
standardize the extracted features to ensure comparability and consistency across different features.
In the above embodiment, by using a CNN to extract color, texture, shape, motion, and other features, the system provides a richer and more comprehensive description of the video content, helping it understand the video's subject, mood, objects, and dynamic events more accurately.
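As an illustrative stand-in for the CNN-based color statistics described above, the following sketch computes a coarse, normalized color histogram directly from RGB pixels; the bin count and the flat list-of-tuples frame representation are assumptions made purely for illustration:

```python
def color_histogram(frame, bins_per_channel=4):
    """Quantize each RGB pixel into a coarse bin and count occurrences.

    `frame` is a list of (r, g, b) tuples with values in 0-255. This
    mimics the color-distribution statistics the patent attributes to
    CNN layers, without any learned filters.
    """
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in frame:
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        hist[idx] += 1
    total = len(frame)
    return [count / total for count in hist]  # normalized to sum to 1

frame = [(255, 0, 0)] * 10       # a toy 10-pixel all-red "frame"
hist = color_histogram(frame)    # all mass lands in the red bin
```

A real system would compute such statistics per frame or per shot and aggregate them over time.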
In the above embodiment, the feature fusion unit combines the different features into a unified feature vector for a more comprehensive description of the video content, while the feature standardization unit ensures comparability and consistency across features, further improving retrieval accuracy and efficiency.
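A minimal sketch of the fusion and standardization steps, assuming features are plain Python lists and using per-dimension z-scoring across a batch; the patent does not specify a particular standardization formula, so z-scoring is an illustrative choice:

```python
def fuse(color, texture, shape, motion):
    """Concatenate the four per-modality feature vectors into one
    unified feature vector, as the feature fusion unit describes."""
    return color + texture + shape + motion

def standardize(batch):
    """Z-score each dimension across a batch of fused vectors so that
    color, texture, shape and motion features are directly comparable."""
    dims = len(batch[0])
    means = [sum(v[d] for v in batch) / len(batch) for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((v[d] - means[d]) ** 2 for v in batch) / len(batch)
        stds.append(var ** 0.5 or 1.0)  # guard against zero variance
    return [[(v[d] - means[d]) / stds[d] for d in range(dims)] for v in batch]

batch = [
    fuse([1.0], [2.0], [3.0], [4.0]),
    fuse([3.0], [4.0], [5.0], [6.0]),
]
standardized = standardize(batch)
```

After standardization every dimension has zero mean and unit variance over the batch, so no single modality dominates the similarity computation.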
In the above embodiment, by combining the extraction and analysis of multiple features, the system measures the similarity between videos more accurately, improving retrieval precision and giving users more accurate and relevant retrieval results.
The semantic modeling module includes:
A temporal information capture unit, used to:
use an RNN to capture temporal information in the video, analyzing the temporal and spatial relationships between consecutive frames and capturing dynamic changes and continuous actions;
A contextual relationship modeling unit, used to:
use the self-attention mechanism of the Transformer architecture to model the contextual relationships of the video content, understanding the overall content and theme by analyzing the associations and dependencies between different video clips;
A feature fusion and semantic conversion unit, used to:
fuse the information obtained from the temporal information capture unit and the contextual relationship modeling unit, and convert the fused features into a high-level semantic vector representation that fully and accurately reflects the semantic content of the video;
A semantic vector standardization unit, used to:
standardize the generated semantic vectors to ensure comparability and consistency across vectors, which improves the accuracy of similarity calculation and the efficiency of retrieval.
In the above embodiment, the RNN and Transformer architectures effectively capture the temporal information and contextual relationships in the video, helping the system understand dynamic changes, continuous actions, and the overall content theme, and deepening its understanding of the video content.
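The self-attention mechanism attributed to the Transformer above can be sketched, in simplified form, as scaled dot-product attention over clip embeddings. Here the clip vectors serve directly as queries, keys, and values, omitting the learned projection matrices a real Transformer layer would apply:

```python
import math

def self_attention(clips):
    """Scaled dot-product self-attention over a list of clip vectors.

    Each clip's new representation is a softmax-weighted mix of all
    clips, which is how attention models cross-clip context.
    """
    d = len(clips[0])
    scale = math.sqrt(d)
    attended = []
    for q in clips:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in clips]
        peak = max(scores)                       # subtract max for stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]      # softmax over clips
        attended.append([
            sum(w * clip[j] for w, clip in zip(weights, clips))
            for j in range(d)
        ])
    return attended

mixed = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Because the outputs are convex combinations of the inputs, each attended vector stays inside the span of the clip embeddings.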
In the above embodiment, the feature fusion and semantic conversion unit integrates the information from the different units and converts it into a high-level semantic vector representation, ensuring that the vector fully and accurately reflects the video's semantic content and strongly supporting subsequent similarity calculation and retrieval.
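The text does not give a formula for semantic vector standardization; one common choice, assumed here for illustration, is L2 normalization, which also makes cosine similarity reduce to a plain dot product:

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale a semantic vector to unit length so vectors from
    different videos are directly comparable and cosine similarity
    becomes a simple dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]

unit = l2_normalize([3.0, 4.0])  # direction preserved, length ~1
```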
The similarity calculation module includes:
A semantic vector input unit, used to:
receive semantic vectors from the semantic modeling module and format and preprocess them so that they are suitable for similarity calculation;
A similarity measurement unit, used to:
select a similarity measure suited to the specific requirements, such as cosine similarity or Euclidean distance, compute the similarity of the two input semantic vectors, and output a single value as the similarity measure;
A threshold setting unit, used to:
set a similarity threshold and dynamically adjust it based on actual retrieval results and user feedback to ensure that the retrieval results remain accurate and useful.
In the above embodiment, the similarity measurement unit accurately measures the semantic vectors from the semantic modeling module, which helps ensure the accuracy and reliability of similarity calculation and improves retrieval precision.
In the above embodiment, the threshold setting unit allows the similarity threshold to be adjusted dynamically according to actual retrieval results and user feedback, helping balance the precision and recall of video retrieval and meet the needs of different users and applications. Accurate similarity calculation and flexible threshold setting together improve the efficiency and precision of retrieval and give users more accurate and relevant results.
Specifically, the threshold setting unit includes:
A threshold initial value extraction module, used to extract the initial value of the similarity threshold;
A cycle duration setting module, used to set the observation cycle duration for the similarity threshold, where the observation cycle comprises 30-90 unit durations and each unit duration is 24 hours;
A comparison result extraction module, used to extract, for each observation cycle, the comparison results between the similarity values of every pair of semantic vectors in that cycle and the initial similarity threshold;
A real similarity value extraction module, used to retrieve, when a comparison result shows that the similarity value of a pair of semantic vectors is below the corresponding threshold, the real similarity value of that pair as contained in the actual retrieval results and user feedback;
A threshold adjustment judgment module, used to judge, from the difference between the real similarity value and the computed similarity value of the pair, whether the initial similarity threshold needs to be adjusted;
An adjustment execution module, used to adjust the initial similarity threshold with a threshold adjustment model when an adjustment is determined to be needed, the model being structured as follows:
where Syt denotes the similarity threshold after the initial threshold is adjusted; Swy denotes the preset judgment threshold; Sq denotes the similarity judgment value; k denotes the total number of observation cycles; Swj denotes the similarity judgment stability parameter of the j-th observation cycle; S01j and S02j denote the first and second similarity stability parameters of the j-th observation cycle, respectively; Sy denotes the similarity threshold; and λ denotes a scaling parameter.
The technical effects of the above solution are as follows. The threshold initial value extraction module extracts the initial similarity threshold, providing the basis for subsequent adjustment. The cycle duration setting module sets the observation cycle duration, ensuring timely and accurate adjustment; with a suitable observation cycle, changes in semantic-vector similarity can be monitored more closely. The comparison result extraction module extracts, for each observation cycle, the comparison results between all pairwise similarity values and the initial threshold, which helps characterize the similarity state of the semantic vectors. When a comparison shows that a pair's similarity is below the initial threshold, the real similarity value extraction module retrieves that pair's real similarity from the actual retrieval results and user feedback; this mechanism captures users' actual needs and feedback and improves the accuracy of semantic similarity judgment.
The threshold adjustment judgment module uses the difference between the real similarity value and the computed similarity value to decide whether the initial threshold needs adjusting, allowing dynamic adjustment driven by actual needs and feedback and improving the accuracy and adaptability of threshold setting. When an adjustment is needed, the adjustment execution module applies the threshold adjustment model, so the threshold is tuned dynamically to the actual situation and remains accurate and reasonable. Through the preset adjustment model and algorithm, this embodiment automates threshold adjustment and optimization, realizing intelligent management of semantic similarity judgment; this automation greatly improves the efficiency and accuracy of similarity judgment and reduces manual intervention and misjudgment.
In summary, the threshold setting unit of the above solution provides a dynamic, accurate, and automated way to judge semantic similarity, helping capture users' actual needs and feedback and improving the accuracy and adaptability of semantic similarity judgment.
Specifically, the threshold adjustment judgment module includes:
A group number extraction module, used to extract, when comparison results show that pairs of semantic vectors have similarity values below the corresponding threshold, the number of such below-threshold pairs in each observation cycle;
A value retrieval module, used to retrieve the similarity value of each below-threshold pair in each observation cycle;
A stability parameter acquisition module, used to obtain the similarity judgment stability parameter of each observation cycle from the similarity values of its below-threshold pairs, the stability parameter being obtained by the following formula:
where Sw denotes the similarity judgment stability parameter; n denotes the number of below-threshold pairs; S01 and S02 denote the first and second similarity stability parameters, respectively; Si denotes the similarity value of the i-th below-threshold pair; Sy denotes the similarity threshold; m denotes the number of pairs not below the threshold; Szi denotes the real similarity value of a below-threshold pair; and Sj denotes the similarity value of the j-th pair not below the threshold;
A comprehensive parameter acquisition module, used to integrate the stability parameters of all observation cycles into a comprehensive similarity judgment value, obtained by the following formula:
where Sq denotes the similarity judgment value; k denotes the total number of observation cycles; Swj denotes the stability parameter of the j-th observation cycle; and S01j and S02j denote the first and second similarity stability parameters of the j-th observation cycle, respectively;
A threshold adjustment execution module, used to determine that the initial similarity threshold needs to be adjusted when the similarity judgment value falls below the preset judgment threshold.
The technical effects of the above solution are as follows. When the similarity of a pair of semantic vectors falls below the threshold, the group number extraction module counts the below-threshold pairs in each observation cycle while the value retrieval module fetches their similarity values; together these modules supply the data needed for the subsequent stability calculation. From the similarity values of the below-threshold pairs in each cycle, the stability parameter acquisition module computes that cycle's similarity judgment stability parameter, which reflects how stable the below-threshold similarities were within the cycle and helps decide whether the threshold needs adjusting. The comprehensive parameter acquisition module then integrates the per-cycle stability parameters into a comprehensive judgment value that condenses the information of all cycles into a single indicator of whether adjustment is needed.
The threshold adjustment execution module compares the comprehensive judgment value with the preset judgment threshold and automatically decides whether the initial similarity threshold must be adjusted, improving the system's intelligence and responsiveness. By analyzing the below-threshold vectors in detail and judging across multiple observation cycles, the module identifies more accurately when the threshold should change, improving the accuracy and stability of similarity judgment. Because semantic-vector similarity can drift with time and context, dynamically monitoring and adjusting the threshold lets the system adapt to this changing semantic environment, improving its adaptability and robustness.
In summary, through careful data processing, comprehensive parameter acquisition, and an automated adjustment judgment and execution mechanism, the threshold adjustment judgment module of the above solution manages the similarity threshold accurately, stably, and dynamically, improving both the accuracy of semantic similarity judgment and the overall performance of the system.
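The exact formulas for the stability parameters (Sw, S01, S02) and the judgment value Sq appear as images in the original publication and are not reproduced in this text, so the sketch below only illustrates the control flow of the adjustment decision, substituting simple averages as placeholders for those formulas:

```python
def needs_adjustment(periods, judgment_threshold):
    """Decide whether the initial similarity threshold should be adjusted.

    `periods` is a list of lists, each holding the below-threshold
    similarity values observed in one observation cycle. The per-cycle
    average stands in for the patent's stability parameter Sw, and the
    average over cycles stands in for the comprehensive value Sq; the
    real formulas are not reproduced here.
    """
    stability = []
    for values in periods:
        if not values:
            stability.append(1.0)  # no below-threshold pairs: fully stable
        else:
            stability.append(sum(values) / len(values))  # placeholder for Sw
    sq = sum(stability) / len(stability)                 # placeholder for Sq
    return sq < judgment_threshold
```

The structure mirrors the patent's flow: per-cycle statistics feed a comprehensive value, which is compared against a preset judgment threshold to trigger adjustment.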
The video retrieval module includes:
A query condition processing unit, used to:
receive the user's query condition, extract information such as keywords, scenes, and people, and convert it into a semantic vector representation;
A video library index unit, used to:
build an index structure over the video library for quickly retrieving videos similar to the query condition, preprocessing the videos in the library by interacting with the feature extraction module and the semantic modeling module so that each video has a corresponding semantic vector representation;
A retrieval matching unit, used to:
compute, by interacting with the similarity calculation module, the similarity between the query's semantic vector and the semantic vector of each video in the library, and filter out the videos most relevant to the user's query according to the similarity threshold.
In the above embodiment, by receiving the user's query condition, the system automatically extracts keywords, scenes, people, and other information and converts them into a semantic vector representation, giving users a more concise and intuitive way to query and reducing the manual feature selection and extraction they would otherwise perform.
In the above embodiment, building the video library index and preprocessing the library's videos let the system quickly retrieve videos similar to the query condition; through interaction with the feature extraction and semantic modeling modules, each video is converted into a corresponding semantic vector representation, improving retrieval efficiency and accuracy.
In the above embodiment, the retrieval matching unit interacts with the similarity calculation module to compute the similarity between the query's semantic vector and each video's semantic vector, and filters the videos most relevant to the user's query according to the similarity threshold, ensuring precise and relevant retrieval results.
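The retrieval matching step can be sketched as scoring every library vector against the query vector and keeping the hits that clear the similarity threshold; the video ids and the flat-dictionary library layout are illustrative assumptions, not the patent's index structure:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, library, threshold):
    """Score every library video against the query vector and keep
    those whose similarity reaches the threshold."""
    hits = {}
    for video_id, vec in library.items():
        sim = cosine(query_vec, vec)
        if sim >= threshold:
            hits[video_id] = sim
    return hits

library = {"clip_a": [1.0, 0.0], "clip_b": [0.0, 1.0]}  # toy library
hits = retrieve([1.0, 0.1], library, threshold=0.5)
```

A production system would replace the linear scan with an approximate nearest-neighbor index built by the video library index unit.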
The result sorting module includes:
A result receiving unit, used to:
receive the similarity values output by the similarity calculation module;
A sorting application unit, used to:
apply an appropriate sorting algorithm to order the videos according to the similarity results;
An output display unit, used to:
return the sorted video retrieval results to the user in a clear, ordered display so that users can conveniently browse and select the videos they need.
In the above embodiment, the result sorting module orders the videos by the similarity results, ensuring accurate and reliable retrieval output. Placing the most relevant and similar videos first improves user satisfaction and gives users a clear, ordered display for browsing and selecting the videos they need, which enhances practicality, strengthens user satisfaction and loyalty, and promotes the wide adoption and acceptance of the video retrieval system.
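In the simplest case, the sorting applied by the result sorting module reduces to ordering (video id, similarity) pairs by descending similarity; the ids below are illustrative:

```python
def rank_results(hits):
    """Sort (video_id, similarity) pairs by similarity, best first,
    so the videos most relevant to the query lead the result list."""
    return sorted(hits.items(), key=lambda kv: kv[1], reverse=True)

ranked = rank_results({"a": 0.42, "b": 0.91, "c": 0.77})
```

More elaborate personalized rankings could combine the similarity score with user-preference signals before sorting.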
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to it. Any equivalent replacement or modification made by a person skilled in the art within the technical scope disclosed herein, according to the technical solution and inventive concept of the present invention, shall fall within the protection scope of the present invention.
Claims (6)

Application CN202410198615.3A, filed 2024-02-22; published as CN118093938A on 2024-05-28 and granted as CN118093938B on 2024-09-13.
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118673180B (en) * | 2024-08-23 | 2024-10-18 | 成都华栖云科技有限公司 | Video content retrieval method based on label retrieval and multi-modal vector |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723692A (en) * | 2020-06-03 | 2020-09-29 | 西安交通大学 | A near-duplicate video detection method based on label features of convolutional neural network semantic classification |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8200602B2 (en) * | 2009-02-02 | 2012-06-12 | Napo Enterprises, Llc | System and method for creating thematic listening experiences in a networked peer media recommendation environment |
US9595264B2 (en) * | 2014-10-06 | 2017-03-14 | Avaya Inc. | Audio search using codec frames |
CN111898416A (en) * | 2020-06-17 | 2020-11-06 | 绍兴埃瓦科技有限公司 | Video stream processing method and device, computer equipment and storage medium |
CN114329013B (en) * | 2021-09-29 | 2025-08-19 | 腾讯科技(深圳)有限公司 | Data processing method, device and computer readable storage medium |
-
2024
- 2024-02-22 CN CN202410198615.3A patent/CN118093938B/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723692A (en) * | 2020-06-03 | 2020-09-29 | 西安交通大学 | A near-duplicate video detection method based on label features of convolutional neural network semantic classification |
Also Published As
Publication number | Publication date |
---|---|
CN118093938A (en) | 2024-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113963315B (en) | A method and system for real-time video multi-person behavior recognition in complex scenes | |
Cong et al. | Towards scalable summarization of consumer videos via sparse dictionary selection | |
Rui et al. | Constructing table-of-content for videos | |
CN101872346B (en) | Method for generating video navigation system automatically | |
CN110070066A (en) | A kind of video pedestrian based on posture key frame recognition methods and system again | |
CN112085072B (en) | Cross-modal retrieval method of sketch retrieval three-dimensional model based on space-time characteristic information | |
CN105205135B (en) | A kind of 3D model retrieval methods and its retrieval device based on topic model | |
CN118093938B (en) | Video query retrieval method and system based on semantic depth model | |
CN114579794B (en) | Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion | |
CN114528762B (en) | Model training method, device, equipment and storage medium | |
CN117392289A (en) | Method and system for automatically generating case field video based on AI (advanced technology attachment) voice | |
CN118799919B (en) | Full-time multi-mode pedestrian re-recognition method based on simulation augmentation and prototype learning | |
CN107688830A (en) | It is a kind of for case string and show survey visual information association figure layer generation method | |
CN119828856B (en) | A face recognition application method and system combined with smart glasses | |
CN109947990A (en) | A kind of wonderful detection method and system | |
CN101876993A (en) | A Texture Feature Extraction and Retrieval Method of Ground-Based Digital Cloud Image | |
CN115131694A (en) | Target tracking method and system based on twin network and YOLO target detection model | |
CN118038494A (en) | A cross-modal person re-identification method robust to damaged scenes | |
CN117851654A (en) | Archives resource retrieval system based on artificial intelligence pronunciation and image recognition | |
CN116824490A (en) | A camera monitoring network target matching method based on camera network topology | |
CN116708941A (en) | Video pushing method and system based on face recognition technology | |
CN114708653A (en) | Specified pedestrian action retrieval method based on pedestrian re-identification algorithm | |
CN109101653A (en) | The search method and its system of a kind of video file and application | |
CN119495125B (en) | A combined sports action detection method | |
CN110879970A (en) | Video interest area face abstraction method and device based on deep learning and storage device thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |