CN118093938B - Video query retrieval method and system based on semantic depth model - Google Patents
- Publication number: CN118093938B (application CN202410198615.3A)
- Authority: CN (China)
- Prior art keywords: similarity, video, semantic, threshold, module
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/785 — Retrieval of video data characterised by metadata automatically derived from the content, using low-level visual features: colour or luminescence
- G06F16/7854 — Retrieval of video data characterised by metadata automatically derived from the content, using low-level visual features: shape
- G06F16/7857 — Retrieval of video data characterised by metadata automatically derived from the content, using low-level visual features: texture
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06V10/454 — Integrating biologically inspired filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/54 — Extraction of image or video features relating to texture
- G06V10/56 — Extraction of image or video features relating to colour
- G06V10/761 — Proximity, similarity or dissimilarity measures
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Description
Technical Field

The present invention relates to the field of information retrieval, and in particular to a video query retrieval method and system based on a semantic depth model.

Background Art

With the explosive growth of video data, video retrieval has become an important technical requirement. Traditional video retrieval methods usually perform similarity matching based on visual features such as color, texture, and shape. However, these methods struggle to understand video content at the semantic level, which leads to inaccurate and irrelevant retrieval results.

The prior art also often suffers from the following defects:

Traditional video retrieval methods usually exploit only some of the features in a video, such as color, texture, and shape, while ignoring other important feature information such as scenes, people, and actions; this incomplete feature extraction can make the retrieval results inaccurate and irrelevant. At the same time, traditional methods usually focus only on the visual features of a video and ignore its semantic information. Lacking a deep understanding of video content, they cannot perform accurate similarity matching at the semantic level.
Summary of the Invention

The purpose of the present invention is to provide a video query retrieval method and system based on a semantic depth model, so as to solve the problems raised in the background art above.

To achieve the above object, the present invention provides the following technical solution: a video query retrieval method based on a semantic depth model, comprising the following steps:

Step 1: Use deep learning to extract rich feature information from the video, the feature information including color, texture, shape, and motion;

Step 2: Build a deep neural network model that maps the extracted video features into a semantic space and generates a semantic vector;

Step 3: Measure how similar two videos are at the semantic level by computing the similarity between their two semantic vectors;

Step 4: According to the query condition input by the user, retrieve the most semantically similar video from the video library: first represent the user's query condition as a semantic vector, then search the video library for the video most similar to that semantic vector;

Step 5: Sort the retrieved videos according to the similarity results, placing the videos that best meet the user's needs first.
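The five steps above can be sketched in a few lines of Python with NumPy. This is an illustrative skeleton only: `extract_features`, `encode_semantics`, and the random projection matrix are hypothetical stand-ins for the deep models (the CNN feature extractor and the semantic mapping network) that the patent leaves unspecified.

```python
import numpy as np

def extract_features(frames: np.ndarray) -> np.ndarray:
    """Step 1 stand-in: collapse raw frames (T, H, W, C) to one feature vector."""
    return frames.reshape(frames.shape[0], -1).mean(axis=0)

def encode_semantics(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Step 2 stand-in: project features into a semantic space and L2-normalize."""
    vec = projection @ features
    return vec / (np.linalg.norm(vec) + 1e-12)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Step 3: similarity of two semantic vectors (already unit-length)."""
    return float(a @ b)

def retrieve(query_vec, library_vecs, top_k=3):
    """Steps 4-5: score every library video, sort best-first, return top_k."""
    scores = [(idx, cosine_similarity(query_vec, v))
              for idx, v in enumerate(library_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```

In this sketch a query video that is already in the library scores (numerically) 1.0 against itself and is ranked first, which is the ordering behaviour step 5 describes.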
Furthermore, in step 4, the similarity between the semantic vector of the user's query condition and the semantic vector of each video in the video library is computed, and each video's similarity is compared against a similarity threshold to obtain the video most similar to the semantic vector of the user's query condition.

Furthermore, the present invention provides a video query retrieval system based on a semantic depth model, applied in the above video query retrieval method, comprising:
a feature extraction module, used to:

analyze the input video in depth with deep learning, extracting color, texture, shape, and motion feature information from it;

a semantic modeling module, used to:

build a deep neural network model that maps video features into a semantic space, captures the temporal information and contextual relationships of the video content, and generates a semantic vector representation;

a similarity calculation module, used to:

measure how similar two videos are at the semantic level by computing the similarity between their two semantic vectors;

a video retrieval module, used to:

retrieve the most semantically similar videos from the video library according to the user's query condition: represent the query condition as a semantic vector, then search the video library for the video most similar to that semantic vector;

a result sorting module, used to:

sort the retrieved videos according to the similarity results, moving the video that best meets the user's needs to the front.
Furthermore, the feature extraction module includes:

a feature extraction unit, used to:

use specific layers of a CNN to detect and extract the color distribution and color histogram information of the video, analyze the texture patterns and structures in the video with the CNN, perform contour detection and shape recognition on objects in the video, and detect motion patterns and dynamic changes in the video in combination with optical flow;

a feature fusion unit, used to:

integrate the extracted color, texture, shape, and motion features, effectively combining them into a unified feature vector;

a feature normalization unit, used to:

normalize the extracted features.
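As an illustrative sketch of the extraction-fusion-normalization pipeline only, the unit's behaviour can be approximated in NumPy. Everything here is a stand-in for what the patent describes: a per-channel histogram in place of CNN colour layers, gradient energy in place of CNN texture analysis, and simple frame differencing in place of optical flow.

```python
import numpy as np

def color_histogram(frame: np.ndarray, bins: int = 8) -> np.ndarray:
    """Colour descriptor: per-channel intensity histogram, normalized to sum to 1."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
             for c in range(frame.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def texture_energy(frame: np.ndarray) -> np.ndarray:
    """Crude texture cue: mean gradient magnitude of the grayscale image."""
    gray = frame.mean(axis=-1)
    gy, gx = np.gradient(gray)
    return np.array([np.sqrt(gx ** 2 + gy ** 2).mean()])

def motion_energy(prev: np.ndarray, cur: np.ndarray) -> np.ndarray:
    """Motion cue (optical-flow stand-in): mean absolute frame difference."""
    return np.array([np.abs(cur.astype(float) - prev.astype(float)).mean()])

def fuse_and_normalize(parts) -> np.ndarray:
    """Feature fusion unit + feature normalization unit:
    concatenate the per-cue descriptors, then L2-normalize."""
    v = np.concatenate(parts)
    return v / (np.linalg.norm(v) + 1e-12)
```

The fusion step mirrors the unit description: each cue is computed independently and then combined into one unified, normalized feature vector.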
Furthermore, the semantic modeling module includes:

a temporal information capture unit, used to:

use an RNN to capture the temporal information in the video, analyzing the temporal and spatial relationships between consecutive frames and capturing dynamic changes and continuous actions;

a contextual relationship modeling unit, used to:

use the self-attention mechanism of the Transformer architecture to model the contextual relationships of the video content;

a feature fusion and semantic conversion unit, used to:

fuse the information obtained from the temporal information capture unit and the contextual relationship modeling unit, and convert the fused features into a high-level semantic vector representation;

a semantic vector normalization unit, used to:

normalize the generated semantic vectors.
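A minimal NumPy sketch of the self-attention step that the contextual relationship modeling unit relies on, followed by the pooling and normalization the later units perform. The randomly initialized projection matrices stand in for trained weights, and mean-pooling is one simple (assumed) way to collapse the per-frame context into a single semantic vector.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X: np.ndarray, Wq: np.ndarray, Wk: np.ndarray,
                   Wv: np.ndarray) -> np.ndarray:
    """Single-head self-attention over T frame features of shape (T, d):
    every output row is a context-weighted mixture of all frames."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)  # (T, T), rows sum to 1
    return weights @ V

def to_semantic_vector(context: np.ndarray) -> np.ndarray:
    """Fusion/conversion + normalization stand-in: mean-pool, then L2-normalize."""
    v = context.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-12)
```

Because each attention row sums to 1, every frame's output is a convex combination of all frames' value vectors, which is how the unit captures cross-frame context.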
Furthermore, the similarity calculation module includes:

a semantic vector input unit, used to:

receive semantic vectors from the semantic modeling module and format and preprocess them;

a similarity measurement unit, used to:

select a suitable similarity measure according to the specific need, such as cosine similarity or Euclidean distance, compute the similarity of the two input semantic vectors, and output a single value as the measure of similarity;

a threshold setting unit, used to:

set the similarity threshold and dynamically adjust it according to the actual retrieval results and user feedback.
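The two measures the unit names can be sketched as follows. Folding Euclidean distance into a (0, 1] score via `1 / (1 + dist)` is one common convention assumed here, not something the patent specifies.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two semantic vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def euclidean_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance folded into a (0, 1] similarity score."""
    return 1.0 / (1.0 + float(np.linalg.norm(a - b)))

def passes_threshold(similarity: float, threshold: float) -> bool:
    """Threshold setting unit: keep only matches at or above the threshold."""
    return similarity >= threshold
```

Cosine similarity ignores vector magnitude (useful when semantic vectors are normalized), while the Euclidean score is sensitive to it; which one fits depends on how the semantic space was trained.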
Furthermore, the threshold setting unit includes:

a threshold initial value extraction module, used to extract the initial value of the similarity threshold;

a cycle duration setting module, used to set the observation cycle duration of the similarity threshold, wherein the observation cycle comprises 30-90 unit durations and the unit duration is 24 h;

a comparison result extraction module, used to extract, for each observation cycle, the comparison results between the similarity values of all pairs of semantic vectors in that cycle and the initial similarity threshold;

a real similarity value extraction module, used to retrieve, when the comparison result indicates that the similarity value of a pair of semantic vectors is below the corresponding similarity threshold, the real similarity value of that pair as contained in the actual retrieval results and user feedback;

a threshold adjustment judgment module, used to judge, from the difference between the real similarity value and the computed similarity value of the pair of semantic vectors, whether the initial similarity threshold needs to be adjusted;

an adjustment execution module, used to adjust the initial similarity threshold with a threshold adjustment model when it is determined that the adjustment is needed, wherein the threshold adjustment model has the following structure:

wherein Syt denotes the similarity threshold obtained after adjusting the initial similarity threshold; Swy denotes the preset judgment threshold; Sq denotes the similarity judgment value; k denotes the total number of observation cycles; Swj denotes the similarity judgment stability parameter of the j-th observation cycle; S01j and S02j denote the first and second similarity stability parameters of the j-th observation cycle, respectively; Sy denotes the similarity threshold; and λ denotes a scaling parameter.
Furthermore, the threshold adjustment judgment module includes:

a group number extraction module, used to extract, when the comparison results indicate that the similarity values of pairs of semantic vectors are below the corresponding similarity threshold, the number of such pairs in each observation cycle;

a value retrieval module, used to retrieve, for each observation cycle, the similarity value of every pair of semantic vectors that is below the corresponding similarity threshold;

a stability parameter acquisition module, used to obtain the similarity judgment stability parameter of each observation cycle from the similarity values of the pairs below the corresponding similarity threshold in that cycle, wherein the similarity judgment stability parameter is obtained by the following formula:

wherein Sw denotes the similarity judgment stability parameter; n denotes the number of pairs of semantic vectors below the corresponding similarity threshold; S01 and S02 denote the first and second similarity stability parameters, respectively; Si denotes the similarity value of the i-th pair below the corresponding similarity threshold; Sy denotes the similarity threshold; m denotes the number of pairs not below the corresponding similarity threshold; Szi denotes the real similarity value of a pair below the corresponding similarity threshold; and Sj denotes the similarity value of the j-th pair not below the corresponding similarity threshold;

a comprehensive parameter acquisition module, used to integrate the similarity judgment stability parameters of all observation cycles into a comprehensive similarity judgment value, wherein the similarity judgment value is obtained by the following formula:

wherein Sq denotes the similarity judgment value; k denotes the total number of observation cycles; Swj denotes the similarity judgment stability parameter of the j-th observation cycle; and S01j and S02j denote the first and second similarity stability parameters of the j-th observation cycle, respectively;

a threshold adjustment execution module, used to determine that the initial similarity threshold needs to be adjusted when the similarity judgment value is below the preset judgment threshold.
Furthermore, the video retrieval module includes:

a query condition processing unit, used to:

receive the query condition input by the user, extract information such as keywords, scenes, and people, and convert it into a semantic vector representation;

a video library index unit, used to:

build the index structure of the video library, preprocess the videos in the library by interacting with the feature extraction module and the semantic modeling module, and generate a corresponding semantic vector representation for each video;

a retrieval matching unit, used to:

compute, by interacting with the similarity calculation module, the similarity between the semantic vector of the query condition and the semantic vector of each video in the library, and filter out the videos most relevant to the user's query condition according to the similarity threshold.
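A minimal in-memory sketch of the index unit and the retrieval matching unit together. The class name and API are illustrative assumptions, not from the patent; a production system would use an approximate nearest-neighbour index instead of a dense matrix product.

```python
import numpy as np

class SemanticVideoIndex:
    """Illustrative in-memory index: one L2-normalized semantic vector per video."""

    def __init__(self):
        self.ids = []
        self.vecs = []

    def add(self, video_id, vec) -> None:
        """Video library index unit: store the video's semantic vector."""
        v = np.asarray(vec, dtype=float)
        self.ids.append(video_id)
        self.vecs.append(v / (np.linalg.norm(v) + 1e-12))

    def search(self, query_vec, threshold: float = -1.0, top_k: int = 10):
        """Retrieval matching unit: cosine-score every video, filter by the
        similarity threshold, and return the best matches first."""
        q = np.asarray(query_vec, dtype=float)
        q = q / (np.linalg.norm(q) + 1e-12)
        sims = np.stack(self.vecs) @ q  # cosine, since rows and q are unit vectors
        order = np.argsort(-sims)
        return [(self.ids[i], float(sims[i]))
                for i in order if sims[i] >= threshold][:top_k]
```

Sorting best-first inside `search` also gives the result sorting module described below its input ordering for free.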
Furthermore, the result sorting module includes:

a result receiving unit, used to:

receive the similarity values output by the similarity calculation module;

a sorting application unit, used to:

sort the videos according to the similarity results;

an output display unit, used to:

return the sorted video retrieval results to the user.
Compared with the prior art, the present invention has the following beneficial effects:

1. Through deep learning, the present invention can extract rich and comprehensive feature information from a video, including color, texture, shape, and motion. These features describe the video content more precisely and provide a solid foundation for the subsequent semantic modeling. By building a deep neural network model that maps the extracted video features into a semantic space and generates a semantic vector representation, the invention effectively realizes the transformation from image features to semantic understanding and significantly improves the accuracy and efficiency of video retrieval.

2. By computing the similarity between two semantic vectors, the present invention can precisely measure how similar two videos are at the semantic level, providing an accurate matching basis for the subsequent retrieval. According to the user's query condition, the most semantically similar videos are quickly retrieved from the video library and sorted by similarity; this personalized sorting places the videos that best meet the user's needs first, offering a friendlier and more practical query experience. The method can be flexibly applied to various types of video libraries and different query conditions, and has good scalability and adaptability.

3. The feature extraction module and semantic modeling module of the present invention can comprehensively extract the color, texture, shape, and motion features of a video and thus describe its content more accurately. The similarity calculation module precisely measures the semantic similarity between videos by computing the similarity of their semantic vectors, improving the accuracy and efficiency of retrieval. The video retrieval module quickly retrieves the most semantically similar videos from the library according to the user's query condition and sorts them by similarity, and the personalized sorting of the result sorting module offers a friendlier and more practical query experience.
Brief Description of the Drawings

FIG. 1 is a schematic flow chart of the video query retrieval method based on a semantic depth model of the present invention;

FIG. 2 is a schematic block diagram of the modules of the video query retrieval system of the present invention.
具体实施方式DETAILED DESCRIPTION
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。The following will be combined with the drawings in the embodiments of the present invention to clearly and completely describe the technical solutions in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the present invention.
为了解决对于视频进行检索的检索结果准确度和相关性较差的技术问题,请参阅图1,本发明提供以下技术方案:In order to solve the technical problem of poor accuracy and relevance of retrieval results for video retrieval, please refer to FIG. 1 , the present invention provides the following technical solutions:
基于语义深度模型的视频查询检索方法,包括以下步骤:The video query retrieval method based on the semantic depth model includes the following steps:
步骤一:利用深度学习技术,从视频中提取出丰富的特征信息,所述特征信息包括颜色、纹理、形状、运动,这些特征信息能够全面地描述视频内容,为后续的语义建模提供基础数据;Step 1: Use deep learning technology to extract rich feature information from the video, including color, texture, shape, and motion. These feature information can fully describe the video content and provide basic data for subsequent semantic modeling;
步骤二:构建一个深度神经网络模型,将提取出的视频特征映射到语义空间中,生成语义向量表示,这一步是将图像特征转化为语义理解的桥梁,对于提高视频检索的准确性和效率至关重要;Step 2: Build a deep neural network model to map the extracted video features into the semantic space and generate a semantic vector representation. This step is the bridge from image features to semantic understanding and is crucial to improving the accuracy and efficiency of video retrieval.
步骤三:通过计算两个语义向量之间的相似度,衡量两个视频在语义层面的相似程度,为后续的视频检索提供精确的匹配依据;Step 3: By calculating the similarity between the two semantic vectors, the similarity between the two videos at the semantic level is measured, providing an accurate matching basis for subsequent video retrieval;
步骤四:根据用户输入的查询条件,在视频库中检索出语义上最相似的视频,首先对用户输入的查询条件进行语义向量表示,随后在视频库中寻找与该语义向量最相似的视频;Step 4: According to the query condition input by the user, the most semantically similar video is retrieved from the video library. First, the query condition input by the user is represented by a semantic vector, and then the video most similar to the semantic vector is searched in the video library;
步骤五:根据相似度计算结果,对检索出的视频进行排序,将最符合用户需求的视频排在前面,这一步能够为用户提供更加友好的查询体验,提高视频检索的实用性。Step 5: Sort the retrieved videos according to the similarity calculation results, and put the videos that best meet the user's needs at the front. This step can provide users with a more friendly query experience and improve the practicality of video retrieval.
在上述实施例中,对用户输入的查询条件的语义向量与视频库中每个视频的语义向量的相似度进行计算,并根据相似度阈值对视频库中每个视频相似度进行比对,得到与用户输入查询条件的语义向量最相似的视频。In the above embodiment, the similarity between the semantic vector of the query condition input by the user and the semantic vector of each video in the video library is calculated, and the similarity of each video in the video library is compared according to the similarity threshold to obtain the video that is most similar to the semantic vector of the query condition input by the user.
具体的,通过深度学习技术能够从视频中提取丰富且全面的特征信息,包括颜色、纹理、形状和运动等,这些特征能够更精确地描述视频内容,为后续的语义建模提供了坚实的基础。通过构建深度神经网络模型,将提取的视频特征映射到语义空间中,生成语义向量表示,有效地实现了从图像特征到语义理解的转化,显著提高了视频检索的准确性和效率。Specifically, deep learning technology can extract rich and comprehensive feature information from videos, including color, texture, shape, and motion. These features can more accurately describe the video content and provide a solid foundation for subsequent semantic modeling. By building a deep neural network model, the extracted video features are mapped into the semantic space and a semantic vector representation is generated, which effectively realizes the transformation from image features to semantic understanding and significantly improves the accuracy and efficiency of video retrieval.
在上述实施例中,通过计算两个语义向量之间的相似度,能够精确衡量两个视频在语义层面的相似程度,为后续的视频检索提供了精确的匹配依据。In the above embodiment, by calculating the similarity between two semantic vectors, the similarity between two videos at the semantic level can be accurately measured, providing an accurate matching basis for subsequent video retrieval.
In the above embodiment, the method can quickly retrieve the most semantically similar videos from the library for a given query condition and sort them by similarity. This personalized ranking puts the videos that best meet the user's needs first, offering a friendlier and more practical query experience. The method applies flexibly to various types of video libraries and query conditions, with good scalability and adaptability.
Please refer to FIG. 2. The video query retrieval system based on a semantic depth model includes:
A feature extraction module, used to:
perform deep analysis of the input video with deep learning techniques and extract color, texture, shape, and motion feature information from it;
A semantic modeling module, used to:
build a deep neural network model that maps video features into a semantic space, captures the temporal information and contextual relationships of the video content, and generates semantic vector representations, providing high-quality semantic data for subsequent video retrieval;
A similarity calculation module, used to:
measure how similar two videos are at the semantic level by computing the similarity between their semantic vectors, using cosine similarity and Euclidean distance, which provides an accurate matching basis for retrieval and ensures that results are highly relevant to the user's query condition;
A video retrieval module, used to:
retrieve the most semantically similar videos from the video library according to the user's query condition, by representing the query condition as a semantic vector and then searching the library for the videos closest to that vector;
A result sorting module, used to:
sort the retrieved videos according to the similarity results, moving the videos that best meet the user's needs to the front, and present the results clearly and in order so that users can quickly locate and select the videos they need.
Specifically, the feature extraction and semantic modeling modules comprehensively extract the color, texture, shape, and motion features of a video and thereby describe its content more accurately. The similarity calculation module precisely measures the semantic similarity between videos by computing the similarity between two semantic vectors, improving retrieval accuracy and efficiency. Based on the user's query condition, the video retrieval module quickly retrieves the most semantically similar videos from the library and sorts them by similarity, and the personalized ranking of the result sorting module gives users a friendlier and more practical query experience.
The feature extraction module includes:
A feature extraction unit, used to:
use specific CNN layers to detect and extract color distribution and color histogram information, identifying the dominant colors and color changes that reflect the video's theme or mood; analyze texture patterns and structures through the CNN to recognize different materials and surface textures, where detailed texture analysis helps in understanding scene details and object quality; perform contour detection and shape recognition on objects in the video, analyzing their basic shape, size, and orientation as a basis for subsequent object recognition; and apply optical flow techniques to detect motion patterns and dynamic changes, analyzing object trajectories, speeds, and directions to understand dynamic events in the video;
A feature fusion unit, used to:
integrate the extracted color, texture, shape, and motion features into a unified feature vector that provides a more comprehensive description of the video content;
A feature standardization unit, used to:
standardize the extracted features to ensure comparability and consistency across different features.
In the above embodiment, by using a CNN to extract color, texture, shape, motion, and other features, the system provides a richer and more comprehensive description of the video content, helping it understand the video's subject, mood, objects, and dynamic events more accurately.
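As an illustrative stand-in for the CNN-based color statistics described above, the following sketch computes a coarse, normalized color histogram directly from RGB pixels; the bin count and the flat list-of-tuples frame representation are assumptions made purely for illustration:

```python
def color_histogram(frame, bins_per_channel=4):
    """Quantize each RGB pixel into a coarse bin and count occurrences.

    `frame` is a list of (r, g, b) tuples with values in 0-255. This
    mimics the color-distribution statistics the patent attributes to
    CNN layers, without any learned filters.
    """
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in frame:
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        hist[idx] += 1
    total = len(frame)
    return [count / total for count in hist]  # normalized to sum to 1

frame = [(255, 0, 0)] * 10       # a toy 10-pixel all-red "frame"
hist = color_histogram(frame)    # all mass lands in the red bin
```

A real system would compute such statistics per frame or per shot and aggregate them over time.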
In the above embodiment, the feature fusion unit combines the different features into a unified feature vector for a more comprehensive description of the video content, while the feature standardization unit ensures comparability and consistency across features, further improving retrieval accuracy and efficiency.
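A minimal sketch of the fusion and standardization steps, assuming features are plain Python lists and using per-dimension z-scoring across a batch; the patent does not specify a particular standardization formula, so z-scoring is an illustrative choice:

```python
def fuse(color, texture, shape, motion):
    """Concatenate the four per-modality feature vectors into one
    unified feature vector, as the feature fusion unit describes."""
    return color + texture + shape + motion

def standardize(batch):
    """Z-score each dimension across a batch of fused vectors so that
    color, texture, shape and motion features are directly comparable."""
    dims = len(batch[0])
    means = [sum(v[d] for v in batch) / len(batch) for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((v[d] - means[d]) ** 2 for v in batch) / len(batch)
        stds.append(var ** 0.5 or 1.0)  # guard against zero variance
    return [[(v[d] - means[d]) / stds[d] for d in range(dims)] for v in batch]

batch = [
    fuse([1.0], [2.0], [3.0], [4.0]),
    fuse([3.0], [4.0], [5.0], [6.0]),
]
standardized = standardize(batch)
```

After standardization every dimension has zero mean and unit variance over the batch, so no single modality dominates the similarity computation.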
In the above embodiment, by combining the extraction and analysis of multiple features, the system measures the similarity between videos more accurately, improving retrieval precision and giving users more accurate and relevant retrieval results.
The semantic modeling module includes:
A temporal information capture unit, used to:
use an RNN to capture temporal information in the video, analyzing the temporal and spatial relationships between consecutive frames and capturing dynamic changes and continuous actions;
A contextual relationship modeling unit, used to:
use the self-attention mechanism of the Transformer architecture to model the contextual relationships of the video content, understanding the overall content and theme by analyzing the associations and dependencies between different video clips;
A feature fusion and semantic conversion unit, used to:
fuse the information obtained from the temporal information capture unit and the contextual relationship modeling unit, and convert the fused features into a high-level semantic vector representation that fully and accurately reflects the semantic content of the video;
A semantic vector standardization unit, used to:
standardize the generated semantic vectors to ensure comparability and consistency across vectors, which improves the accuracy of similarity calculation and the efficiency of retrieval.
In the above embodiment, the RNN and Transformer architectures effectively capture the temporal information and contextual relationships in the video, helping the system understand dynamic changes, continuous actions, and the overall content theme, and deepening its understanding of the video content.
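The self-attention mechanism attributed to the Transformer above can be sketched, in simplified form, as scaled dot-product attention over clip embeddings. Here the clip vectors serve directly as queries, keys, and values, omitting the learned projection matrices a real Transformer layer would apply:

```python
import math

def self_attention(clips):
    """Scaled dot-product self-attention over a list of clip vectors.

    Each clip's new representation is a softmax-weighted mix of all
    clips, which is how attention models cross-clip context.
    """
    d = len(clips[0])
    scale = math.sqrt(d)
    attended = []
    for q in clips:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in clips]
        peak = max(scores)                       # subtract max for stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]      # softmax over clips
        attended.append([
            sum(w * clip[j] for w, clip in zip(weights, clips))
            for j in range(d)
        ])
    return attended

mixed = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Because the outputs are convex combinations of the inputs, each attended vector stays inside the span of the clip embeddings.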
In the above embodiment, the feature fusion and semantic conversion unit integrates the information from the different units and converts it into a high-level semantic vector representation, ensuring that the vector fully and accurately reflects the video's semantic content and strongly supporting subsequent similarity calculation and retrieval.
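The text does not give a formula for semantic vector standardization; one common choice, assumed here for illustration, is L2 normalization, which also makes cosine similarity reduce to a plain dot product:

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale a semantic vector to unit length so vectors from
    different videos are directly comparable and cosine similarity
    becomes a simple dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]

unit = l2_normalize([3.0, 4.0])  # direction preserved, length ~1
```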
The similarity calculation module includes:
A semantic vector input unit, used to:
receive semantic vectors from the semantic modeling module and format and preprocess them so that they are suitable for similarity calculation;
A similarity measurement unit, used to:
select a similarity measure suited to the specific requirements, such as cosine similarity or Euclidean distance, compute the similarity of the two input semantic vectors, and output a single value as the similarity measure;
A threshold setting unit, used to:
set a similarity threshold and dynamically adjust it based on actual retrieval results and user feedback to ensure that the retrieval results remain accurate and useful.
In the above embodiment, the similarity measurement unit accurately measures the semantic vectors from the semantic modeling module, which helps ensure the accuracy and reliability of similarity calculation and improves retrieval precision.
In the above embodiment, the threshold setting unit allows the similarity threshold to be adjusted dynamically according to actual retrieval results and user feedback, helping balance the precision and recall of video retrieval and meet the needs of different users and applications. Accurate similarity calculation and flexible threshold setting together improve the efficiency and precision of retrieval and give users more accurate and relevant results.
Specifically, the threshold setting unit includes:
A threshold initial value extraction module, used to extract the initial value of the similarity threshold;
A cycle duration setting module, used to set the observation cycle duration for the similarity threshold, where the observation cycle comprises 30-90 unit durations and each unit duration is 24 hours;
A comparison result extraction module, used to extract, for each observation cycle, the comparison results between the similarity values of every pair of semantic vectors in that cycle and the initial similarity threshold;
A real similarity value extraction module, used to retrieve, when a comparison result shows that the similarity value of a pair of semantic vectors is below the corresponding threshold, the real similarity value of that pair as contained in the actual retrieval results and user feedback;
A threshold adjustment judgment module, used to judge, from the difference between the real similarity value and the computed similarity value of the pair, whether the initial similarity threshold needs to be adjusted;
An adjustment execution module, used to adjust the initial similarity threshold with a threshold adjustment model when an adjustment is determined to be needed, the model being structured as follows:
where Syt denotes the similarity threshold after the initial threshold is adjusted; Swy denotes the preset judgment threshold; Sq denotes the similarity judgment value; k denotes the total number of observation cycles; Swj denotes the similarity judgment stability parameter of the j-th observation cycle; S01j and S02j denote the first and second similarity stability parameters of the j-th observation cycle, respectively; Sy denotes the similarity threshold; and λ denotes a scaling parameter.
The technical effects of the above solution are as follows. The threshold initial value extraction module extracts the initial similarity threshold, providing the basis for subsequent adjustment. The cycle duration setting module sets the observation cycle duration, ensuring timely and accurate adjustment; with a suitable observation cycle, changes in semantic-vector similarity can be monitored more closely. The comparison result extraction module extracts, for each observation cycle, the comparison results between all pairwise similarity values and the initial threshold, which helps characterize the similarity state of the semantic vectors. When a comparison shows that a pair's similarity is below the initial threshold, the real similarity value extraction module retrieves that pair's real similarity from the actual retrieval results and user feedback; this mechanism captures users' actual needs and feedback and improves the accuracy of semantic similarity judgment.
The threshold adjustment judgment module uses the difference between the real similarity value and the computed similarity value to decide whether the initial threshold needs adjusting, allowing dynamic adjustment driven by actual needs and feedback and improving the accuracy and adaptability of threshold setting. When an adjustment is needed, the adjustment execution module applies the threshold adjustment model, so the threshold is tuned dynamically to the actual situation and remains accurate and reasonable. Through the preset adjustment model and algorithm, this embodiment automates threshold adjustment and optimization, realizing intelligent management of semantic similarity judgment; this automation greatly improves the efficiency and accuracy of similarity judgment and reduces manual intervention and misjudgment.
In summary, the threshold setting unit of the above solution provides a dynamic, accurate, and automated way to judge semantic similarity, helping capture users' actual needs and feedback and improving the accuracy and adaptability of semantic similarity judgment.
Specifically, the threshold adjustment judgment module includes:
A group number extraction module, used to extract, when comparison results show that pairs of semantic vectors have similarity values below the corresponding threshold, the number of such below-threshold pairs in each observation cycle;
A value retrieval module, used to retrieve the similarity value of each below-threshold pair in each observation cycle;
A stability parameter acquisition module, used to obtain the similarity judgment stability parameter of each observation cycle from the similarity values of its below-threshold pairs, the stability parameter being obtained by the following formula:
where Sw denotes the similarity judgment stability parameter; n denotes the number of below-threshold pairs; S01 and S02 denote the first and second similarity stability parameters, respectively; Si denotes the similarity value of the i-th below-threshold pair; Sy denotes the similarity threshold; m denotes the number of pairs not below the threshold; Szi denotes the real similarity value of a below-threshold pair; and Sj denotes the similarity value of the j-th pair not below the threshold;
A comprehensive parameter acquisition module, used to integrate the stability parameters of all observation cycles into a comprehensive similarity judgment value, obtained by the following formula:
where Sq denotes the similarity judgment value; k denotes the total number of observation cycles; Swj denotes the stability parameter of the j-th observation cycle; and S01j and S02j denote the first and second similarity stability parameters of the j-th observation cycle, respectively;
A threshold adjustment execution module, used to determine that the initial similarity threshold needs to be adjusted when the similarity judgment value falls below the preset judgment threshold.
The technical effects of the above solution are as follows. When the similarity of a pair of semantic vectors falls below the threshold, the group number extraction module counts the below-threshold pairs in each observation cycle while the value retrieval module fetches their similarity values; together these modules supply the data needed for the subsequent stability calculation. From the similarity values of the below-threshold pairs in each cycle, the stability parameter acquisition module computes that cycle's similarity judgment stability parameter, which reflects how stable the below-threshold similarities were within the cycle and helps decide whether the threshold needs adjusting. The comprehensive parameter acquisition module then integrates the per-cycle stability parameters into a comprehensive judgment value that condenses the information of all cycles into a single indicator of whether adjustment is needed.
The threshold adjustment execution module compares the comprehensive judgment value with the preset judgment threshold and automatically decides whether the initial similarity threshold must be adjusted, improving the system's intelligence and responsiveness. By analyzing the below-threshold vectors in detail and judging across multiple observation cycles, the module identifies more accurately when the threshold should change, improving the accuracy and stability of similarity judgment. Because semantic-vector similarity can drift with time and context, dynamically monitoring and adjusting the threshold lets the system adapt to this changing semantic environment, improving its adaptability and robustness.
In summary, through careful data processing, comprehensive parameter acquisition, and an automated adjustment judgment and execution mechanism, the threshold adjustment judgment module of the above solution manages the similarity threshold accurately, stably, and dynamically, improving both the accuracy of semantic similarity judgment and the overall performance of the system.
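The exact formulas for the stability parameters (Sw, S01, S02) and the judgment value Sq appear as images in the original publication and are not reproduced in this text, so the sketch below only illustrates the control flow of the adjustment decision, substituting simple averages as placeholders for those formulas:

```python
def needs_adjustment(periods, judgment_threshold):
    """Decide whether the initial similarity threshold should be adjusted.

    `periods` is a list of lists, each holding the below-threshold
    similarity values observed in one observation cycle. The per-cycle
    average stands in for the patent's stability parameter Sw, and the
    average over cycles stands in for the comprehensive value Sq; the
    real formulas are not reproduced here.
    """
    stability = []
    for values in periods:
        if not values:
            stability.append(1.0)  # no below-threshold pairs: fully stable
        else:
            stability.append(sum(values) / len(values))  # placeholder for Sw
    sq = sum(stability) / len(stability)                 # placeholder for Sq
    return sq < judgment_threshold
```

The structure mirrors the patent's flow: per-cycle statistics feed a comprehensive value, which is compared against a preset judgment threshold to trigger adjustment.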
The video retrieval module includes:
A query condition processing unit, used to:
receive the user's query condition, extract information such as keywords, scenes, and people, and convert it into a semantic vector representation;
A video library index unit, used to:
build an index structure over the video library for quickly retrieving videos similar to the query condition, preprocessing the videos in the library by interacting with the feature extraction module and the semantic modeling module so that each video has a corresponding semantic vector representation;
A retrieval matching unit, used to:
compute, by interacting with the similarity calculation module, the similarity between the query's semantic vector and the semantic vector of each video in the library, and filter out the videos most relevant to the user's query according to the similarity threshold.
In the above embodiment, by receiving the user's query condition, the system automatically extracts keywords, scenes, people, and other information and converts them into a semantic vector representation, giving users a more concise and intuitive way to query and reducing the manual feature selection and extraction they would otherwise perform.
In the above embodiment, building the video library index and preprocessing the library's videos let the system quickly retrieve videos similar to the query condition; through interaction with the feature extraction and semantic modeling modules, each video is converted into a corresponding semantic vector representation, improving retrieval efficiency and accuracy.
In the above embodiment, the retrieval matching unit interacts with the similarity calculation module to compute the similarity between the query's semantic vector and each video's semantic vector, and filters the videos most relevant to the user's query according to the similarity threshold, ensuring precise and relevant retrieval results.
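The retrieval matching step can be sketched as scoring every library vector against the query vector and keeping the hits that clear the similarity threshold; the video ids and the flat-dictionary library layout are illustrative assumptions, not the patent's index structure:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, library, threshold):
    """Score every library video against the query vector and keep
    those whose similarity reaches the threshold."""
    hits = {}
    for video_id, vec in library.items():
        sim = cosine(query_vec, vec)
        if sim >= threshold:
            hits[video_id] = sim
    return hits

library = {"clip_a": [1.0, 0.0], "clip_b": [0.0, 1.0]}  # toy library
hits = retrieve([1.0, 0.1], library, threshold=0.5)
```

A production system would replace the linear scan with an approximate nearest-neighbor index built by the video library index unit.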
The result sorting module includes:
A result receiving unit, used to:
receive the similarity values output by the similarity calculation module;
A sorting application unit, used to:
apply an appropriate sorting algorithm to order the videos according to the similarity results;
An output display unit, used to:
return the sorted video retrieval results to the user in a clear, ordered display so that users can conveniently browse and select the videos they need.
In the above embodiment, the result sorting module orders the videos by the similarity results, ensuring accurate and reliable retrieval output. Placing the most relevant and similar videos first improves user satisfaction and gives users a clear, ordered display for browsing and selecting the videos they need, which enhances practicality, strengthens user satisfaction and loyalty, and promotes the wide adoption and acceptance of the video retrieval system.
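In the simplest case, the sorting applied by the result sorting module reduces to ordering (video id, similarity) pairs by descending similarity; the ids below are illustrative:

```python
def rank_results(hits):
    """Sort (video_id, similarity) pairs by similarity, best first,
    so the videos most relevant to the query lead the result list."""
    return sorted(hits.items(), key=lambda kv: kv[1], reverse=True)

ranked = rank_results({"a": 0.42, "b": 0.91, "c": 0.77})
```

More elaborate personalized rankings could combine the similarity score with user-preference signals before sorting.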
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to it. Any equivalent replacement or modification made by a person skilled in the art within the technical scope disclosed herein, according to the technical solution and inventive concept of the present invention, shall fall within the protection scope of the present invention.
Claims (6)

Application CN202410198615.3A, filed 2024-02-22; published as CN118093938A on 2024-05-28 and granted as CN118093938B on 2024-09-13.
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118673180B (en) * | 2024-08-23 | 2024-10-18 | 成都华栖云科技有限公司 | Video content retrieval method based on label retrieval and multi-modal vector |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723692A (en) * | 2020-06-03 | 2020-09-29 | 西安交通大学 | A near-duplicate video detection method based on label features of convolutional neural network semantic classification |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8200602B2 (en) * | 2009-02-02 | 2012-06-12 | Napo Enterprises, Llc | System and method for creating thematic listening experiences in a networked peer media recommendation environment |
US9595264B2 (en) * | 2014-10-06 | 2017-03-14 | Avaya Inc. | Audio search using codec frames |
CN111898416A (en) * | 2020-06-17 | 2020-11-06 | 绍兴埃瓦科技有限公司 | Video stream processing method and device, computer equipment and storage medium |
CN114329013B (en) * | 2021-09-29 | 2025-08-19 | 腾讯科技(深圳)有限公司 | Data processing method, device and computer readable storage medium |
-
2024
- 2024-02-22 CN CN202410198615.3A patent/CN118093938B/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723692A (en) * | 2020-06-03 | 2020-09-29 | 西安交通大学 | A near-duplicate video detection method based on label features of convolutional neural network semantic classification |
Also Published As
Publication number | Publication date |
---|---|
CN118093938A (en) | 2024-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113963315B (en) | A method and system for real-time video multi-person behavior recognition in complex scenes | |
Cong et al. | Towards scalable summarization of consumer videos via sparse dictionary selection | |
Rui et al. | Constructing table-of-content for videos | |
CN101872346B (en) | Method for generating video navigation system automatically | |
CN110070066A (en) | A kind of video pedestrian based on posture key frame recognition methods and system again | |
CN112085072B (en) | Cross-modal retrieval method of sketch retrieval three-dimensional model based on space-time characteristic information | |
CN105205135B (en) | A kind of 3D model retrieval methods and its retrieval device based on topic model | |
CN118093938B (en) | Video query retrieval method and system based on semantic depth model | |
CN114579794B (en) | Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion | |
CN114528762B (en) | Model training method, device, equipment and storage medium | |
CN117392289A (en) | Method and system for automatically generating case field video based on AI (advanced technology attachment) voice | |
CN118799919B (en) | Full-time multi-mode pedestrian re-recognition method based on simulation augmentation and prototype learning | |
CN107688830A (en) | It is a kind of for case string and show survey visual information association figure layer generation method | |
CN119828856B (en) | A face recognition application method and system combined with smart glasses | |
CN109947990A (en) | A kind of wonderful detection method and system | |
CN101876993A (en) | A Texture Feature Extraction and Retrieval Method of Ground-Based Digital Cloud Image | |
CN115131694A (en) | Target tracking method and system based on twin network and YOLO target detection model | |
CN118038494A (en) | A cross-modal person re-identification method robust to damaged scenes | |
CN117851654A (en) | Archives resource retrieval system based on artificial intelligence pronunciation and image recognition | |
CN116824490A (en) | A camera monitoring network target matching method based on camera network topology | |
CN116708941A (en) | Video pushing method and system based on face recognition technology | |
CN114708653A (en) | Specified pedestrian action retrieval method based on pedestrian re-identification algorithm | |
CN109101653A (en) | The search method and its system of a kind of video file and application | |
CN119495125B (en) | A combined sports action detection method | |
CN110879970A (en) | Video interest area face abstraction method and device based on deep learning and storage device thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |