
CN104462489B - A deep model-based cross-modal retrieval method - Google Patents

A deep model-based cross-modal retrieval method

Publication number
CN104462489B
CN104462489B
Authority
CN
China
Prior art keywords
modality
rbm
retrieval
mode
corr
Prior art date
Legal status
Active
Application number
CN201410800393.4A
Other languages
Chinese (zh)
Other versions
CN104462489A (en)
Inventor
李睿凡
张光卫
鲁鹏
芦效峰
冯方向
李蕾
刘咏彬
王小捷
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201410800393.4A
Publication of CN104462489A
Application granted
Publication of CN104462489B

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a deep model-based cross-modal retrieval method. The method includes: obtaining, by a feature extraction method, the low-level expression vectors of a target retrieval modality and of each retrieved modality in a retrieval database; passing the low-level expression vector of the target retrieval modality, paired with the low-level expression vector of each retrieved modality, through a deep model of stacked Correspondence Restricted Boltzmann Machines (Corr-RBMs) to obtain the high-level expression vectors of the target retrieval modality and of each retrieved modality; computing, from these high-level expression vectors, the distance between the target retrieval modality and each retrieved modality; and determining the at least one retrieved modality closest to the target retrieval modality as the object matching the target retrieval modality.

Description

A Deep Model-Based Cross-Modal Retrieval Method

Technical Field

The present invention relates to multimedia retrieval technology, and in particular to a deep model-based cross-modal retrieval method.

Background

In recent years, the growth of the Internet has produced an explosion of multimodal data. For example, products on e-commerce websites typically combine headline text, a short textual description, and related pictures; pictures shared on social networking sites are usually accompanied by descriptive tags; and some online news stories include pictures and video that are more engaging than text alone. This rapid growth of multimodal data has created a huge demand for cross-modal retrieval.

Unlike traditional single-modality retrieval, cross-modal retrieval focuses on the relationships between different modalities. The cross-modal retrieval problem therefore poses two challenges. First, data from different modalities have completely different statistical properties, which makes it difficult to obtain the associations between them directly. Second, the features extracted from different modalities are usually high-dimensional and the datasets are very large, which makes efficient retrieval hard to achieve.

Summary of the Invention

In view of this, the present invention provides a deep model-based cross-modal retrieval method that applies a deep model to the processing of cross-modal data, so that distances between the processed representations can be computed efficiently and better retrieval results obtained. The technical scheme proposed by the present invention is as follows:

A deep model-based cross-modal retrieval method, comprising:

obtaining, by a feature extraction method, the low-level expression vectors of the target retrieval modality and of each retrieved modality in the retrieval database;

passing the low-level expression vector of the target retrieval modality, together with the low-level expression vector of each retrieved modality in the retrieval database, through a deep model of stacked Correspondence Restricted Boltzmann Machines (Corr-RBMs) to obtain the high-level expression vectors of the target retrieval modality and of each retrieved modality in the retrieval database;

calculating the distance between the target retrieval modality and each retrieved modality in the retrieval database from their high-level expression vectors; and

determining the at least one retrieved modality in the retrieval database that is closest to the target retrieval modality as the object matching the target retrieval modality.

In summary, the technical scheme of the present invention proposes a deep model-based cross-modal retrieval method. The low-level expressions obtained by feature extraction from the raw cross-modal data are processed by a deep model of stacked Correspondence Restricted Boltzmann Machines (Corr-RBM, Correspondence Restricted Boltzmann Machine) to obtain low-dimensional high-level expressions of the cross-modal data in a common representation space; distances are then computed over these high-level expressions, and the retrieval results are determined by distance.

Brief Description of the Drawings

Fig. 1 is a flowchart of the technical scheme of the present invention;

Fig. 2 is the neural network structure diagram of the Corr-RBMs deep model of the present invention;

Fig. 3 is the neural network structure diagram of the Corr-RBM model of the present invention;

Fig. 4 is the structure diagram of the Restricted Boltzmann Machine (RBM) model;

Fig. 5 is a flowchart of the method for determining Θ from the objective function F;

Fig. 6 is a flowchart of an embodiment of the present invention.

Detailed Description

To solve the cross-modal retrieval problem, the present invention proposes a cross-modal retrieval method based on the Corr-RBMs deep model. The flowchart of the technical scheme is shown in Fig. 1 and includes the following steps:

Step 101: obtain, by a feature extraction method, the low-level expression vectors of the target retrieval modality and of each retrieved modality in the retrieval database.

In this step, in order to retrieve objects matching the target retrieval modality from the retrieval database, low-level expression vectors must first be obtained for the target retrieval modality and for every retrieved modality in the retrieval database. The low-level expression vectors produced by feature extraction are generally high-dimensional, and their elements differ across modalities, so they generally cannot be used directly in retrieval computations.

Step 102: pass the low-level expression vector of the target retrieval modality, together with the low-level expression vector of each retrieved modality in the retrieval database, through the deep model of stacked Corr-RBMs to obtain the high-level expression vectors of the target retrieval modality and of each retrieved modality.

In this step, the low-level expression vector of the target retrieval modality is paired with the low-level expression vector of each retrieved modality in the retrieval database, and each pair is processed by the stacked Corr-RBMs deep model to obtain the corresponding high-level expression vectors. The high-level expression vectors produced by the Corr-RBMs deep model are low-dimensional and lie in a common representation space, so retrieval computations over them are efficient.

Step 103: calculate the distance between the target retrieval modality and each retrieved modality in the retrieval database from their high-level expression vectors.

Specifically, the Euclidean distance can be used as the distance between the target retrieval modality and each retrieved modality in the retrieval database.

Step 104: determine the at least one retrieved modality in the retrieval database that is closest to the target retrieval modality as the object matching the target retrieval modality.

In this step, the retrieved modalities in the retrieval database are sorted by their distance to the target retrieval modality, and the at least one retrieved modality closest to the target retrieval modality is determined as the matching object.
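
Steps 101-104 can be summarized as a short pipeline. The sketch below is a minimal illustration under stated assumptions, not the patented implementation: `map_query` and `map_db` are hypothetical stand-ins for the trained Corr-RBMs deep model, and the toy 2-D features stand in for real extracted low-level vectors.

```python
import numpy as np

def cross_modal_retrieve(query_vec, db_vecs, map_query, map_db, k=3):
    """Sketch of steps 101-104: map low-level vectors of the query modality
    and every database modality into the shared space, rank by Euclidean
    distance, and return the indices of the k nearest database items."""
    q = map_query(query_vec)                    # high-level query expression
    H = np.stack([map_db(v) for v in db_vecs])  # high-level database expressions
    d = np.linalg.norm(H - q, axis=1)           # Euclidean distances (step 103)
    return np.argsort(d)[:k]                    # k closest items (step 104)

# Toy usage: identity "models" on 2-D features.
query = np.array([0.0, 0.0])
db = [np.array([3.0, 4.0]), np.array([0.1, 0.0]), np.array([1.0, 1.0])]
top = cross_modal_retrieve(query, db, lambda x: x, lambda x: x, k=2)
```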

The present invention proposes a cross-modal retrieval method using a deep model of stacked Corr-RBMs. Fig. 2 shows the neural network structure of this deep model: the Corr-RBMs deep model is formed by stacking at least two Corr-RBM layers and derives the high-level expressions of raw data of two different modalities from their low-level expressions. The neural network structure of each Corr-RBM layer is shown in Fig. 3; the Corr-RBM model is built on the Restricted Boltzmann Machine (RBM), whose neural network structure is shown in Fig. 4. The RBM model, the Corr-RBM model, and the Corr-RBMs deep model are described in detail below.

(1) The RBM model:

Fig. 4 shows the neural network structure of an RBM. As shown in Fig. 4, the visible layer V of the RBM contains m neural units v1 … vm, each unit vi having bias bi, with no connections among visible units; the hidden layer H contains s neural units h1 … hs, each unit hj having bias cj, with no connections among hidden units; the connection weight between visible unit vi and hidden unit hj is wij. For ease of understanding, Fig. 4 draws only some of the connection weights between visible and hidden units.

The RBM has the structure of an undirected graph, with the logistic activation function δ(x) = 1/(1+exp(−x)). The joint probability distribution of the visible-layer V and hidden-layer H units is

p(v, h) = exp(−E(v, h)) / Z

where Z is a normalization constant and E(v, h) is the energy function defined by the particular configuration of the RBM's visible and hidden units. E(v, h) takes different forms for different configurations; once the visible-unit and hidden-unit configurations of the RBM are fixed, the corresponding energy function follows, and it is not detailed here.

The bias bi of each visible unit vi, the bias cj of each hidden unit hj, and the connection weight wij between visible unit vi and hidden unit hj can be learned by the contrastive divergence estimation algorithm, which is a mature prior-art technique and is not detailed here.
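
For concreteness, a single contrastive divergence (CD-1) update for one binary RBM can be sketched as follows; the 4-visible / 2-hidden shapes and the learning rate are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b, c, v0, lr=0.1):
    """One CD-1 step: nudge parameters toward <v h>_data and away from
    <v h>_model, with the model phase approximated by one Gibbs step."""
    h0 = sigmoid(v0 @ W + c)                        # P(h=1 | v0), "data" phase
    h_samp = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h_samp @ W.T + b)                  # mean-field reconstruction
    h1 = sigmoid(v1 @ W + c)                        # "model" phase
    W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
    b += lr * (v0 - v1)                             # visible biases b_i
    c += lr * (h0 - h1)                             # hidden biases c_j
    return W, b, c

# Toy usage: a 4-visible / 2-hidden RBM updated on one binary vector.
W = np.zeros((4, 2)); b = np.zeros(4); c = np.zeros(2)
W, b, c = cd1_update(W, b, c, np.array([1.0, 0.0, 1.0, 0.0]))
```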

(2) The Correspondence Restricted Boltzmann Machine (Corr-RBM) model:

Fig. 3 shows the structure of the Corr-RBM model of the present invention. As shown in Fig. 3, the Corr-RBM model contains a first-modality RBM and a second-modality RBM; the two RBMs have the same number m of visible-layer neural units and the same number s of hidden-layer neural units, and there is a correlation constraint between the hidden layers of the first-modality RBM and the second-modality RBM.

Let Θ denote the parameter set of the Corr-RBM model, i.e. Θ = {W^I, C^I, B^I, W^T, C^T, B^T}, where the superscript I denotes the first modality and the superscript T denotes the second modality. Specifically, W^I is the set of connection weights between the visible-layer and hidden-layer neural units of the first-modality RBM, C^I is the set of visible-layer bias parameters of the first-modality RBM, and B^I is the set of hidden-layer bias parameters of the first-modality RBM; W^T is the set of connection weights between the visible-layer and hidden-layer neural units of the second-modality RBM, C^T is the set of visible-layer bias parameters of the second-modality RBM, and B^T is the set of hidden-layer bias parameters of the second-modality RBM.

The parameter set Θ of the Corr-RBM model is determined by the following parameter learning algorithm.

The objective function F is defined on the following principle: the parameter set Θ of the Corr-RBM model should minimize the distance between the first and second modalities in the shared representation space while also minimizing the negative log-likelihoods of the two modalities. The objective function is F = l_D + α·l_I + β·l_T, and Θ is the parameter set that minimizes F.

where

l_D = Σ_{i=1..m} ||f^I(v_i^I) − f^T(v_i^T)||²,  l_I = −Σ_{i=1..m} log p^I(v_i^I),  l_T = −Σ_{i=1..m} log p^T(v_i^T),

in which l_D is the distance between the first and second modalities in the embedding space, l_I is the negative log-likelihood function of the first modality, and l_T is the negative log-likelihood function of the second modality; α and β are constants with α ∈ (0,1) and β ∈ (0,1); f^I(·) is the visible-to-hidden mapping function of the first-modality RBM and f^T(·) is that of the second-modality RBM; p^I(·) is the joint probability distribution of the visible-layer and hidden-layer units of the first-modality RBM and p^T(·) is that of the second-modality RBM; ||·|| is the two-norm; v denotes a visible configuration, and m is the number of paired modal samples.
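
The objective F = l_D + α·l_I + β·l_T can be sketched numerically for a single sample pair. Note one substitution: because the exact negative log-likelihood of an RBM requires the intractable constant log Z, the sketch below uses the standard binary-RBM free energy as an unnormalized stand-in for l_I and l_T; the values α = β = 0.2 and the tiny shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def free_energy(v, W, b, c):
    # For a binary RBM, -log p(v) = free_energy(v) + log Z; the constant
    # log Z is dropped here, so this is only an unnormalized stand-in.
    return -v @ b - np.sum(np.logaddexp(0.0, v @ W + c))

def objective_F(vI, vT, pI, pT, alpha=0.2, beta=0.2):
    """Sketch of F = l_D + alpha*l_I + beta*l_T: exact squared distance in
    the shared space plus free-energy surrogates for the likelihood terms."""
    WI, bI, cI = pI
    WT, bT, cT = pT
    fI = sigmoid(vI @ WI + cI)          # f^I(v^I): visible-to-hidden mapping
    fT = sigmoid(vT @ WT + cT)          # f^T(v^T)
    lD = float(np.sum((fI - fT) ** 2))  # ||f^I(v^I) - f^T(v^T)||^2
    return lD + alpha * free_energy(vI, WI, bI, cI) + beta * free_energy(vT, WT, bT, cT)

# Toy usage: all-zero parameters make both mappings constant 0.5, so l_D = 0
# and each free-energy term reduces to -s*log(2) for s hidden units.
pI = (np.zeros((3, 2)), np.zeros(3), np.zeros(2))
pT = (np.zeros((3, 2)), np.zeros(3), np.zeros(2))
F = objective_F(np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0]), pI, pT)
```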

To determine Θ from the objective function F, an alternating iterative optimization can be used: the two likelihood terms l_I and l_T are first updated with the contrastive divergence estimation algorithm, and l_D is then updated by gradient descent; convergence can be checked by running cross-modal retrieval on a validation set. Specifically, Fig. 5 shows the flow of determining Θ from the objective function F, with the following steps:

Step 501: update the parameters of the first-modality RBM with the contrastive divergence estimation algorithm.

The connection weights W^I between the visible-layer and hidden-layer neural units of the first-modality RBM, the visible-unit biases C^I, and the hidden-unit biases B^I are denoted collectively by θ^I and updated according to

θ^I ← θ^I + τ·α·Δθ^I

where τ is the learning rate, τ ∈ (0,1), and α ∈ (0,1); the contrastive divergence increments are

ΔW^I = <v^I h^I>_data − <v^I h^I>_model,  ΔC^I = <v^I>_data − <v^I>_model,  ΔB^I = <h^I>_data − <h^I>_model,

where <·>_data is the mathematical expectation under the empirical distribution and <·>_model is the mathematical expectation under the model distribution.

Step 502: update the parameters of the second-modality RBM with the contrastive divergence estimation algorithm.

The connection weights W^T between the visible-layer and hidden-layer neural units of the second-modality RBM, the visible-unit biases C^T, and the hidden-unit biases B^T are denoted collectively by θ^T and updated according to θ^T ← θ^T + τ·β·Δθ^T, where β ∈ (0,1); the increments Δθ^T are given by the same contrastive divergence expressions as in step 501, with the first-modality quantities replaced by their second-modality counterparts.

Step 503: update the distance between the first and second modalities in the embedding space by gradient descent.

Specifically, the distance l_D between the first and second modalities in the embedding space is updated by gradient descent; applying the chain rule element-wise, the gradient of l_D with respect to a first-modality weight is

∂l_D/∂w_ij^I = 2 (f_j^I(v^I) − f_j^T(v^T)) · δ'(w_·j^I·v^I + b_j^I) · v_i^I,

where δ'(·) = δ(·)(1 − δ(·)) and δ(·) is the logistic activation function δ(x) = 1/(1+exp(−x)); the gradients with respect to the remaining parameters follow in the same way.

Step 504: repeat steps 501-503 until the algorithm converges.

The parameter set Θ of the Corr-RBM model is obtained by the above method.
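
The alternating loop of steps 501-504 can be sketched as follows; `cd_step`, `ld_grad_step`, and `converged` are hypothetical callbacks standing in for the contrastive divergence updates, the gradient step on l_D, and the validation-set retrieval check described above.

```python
def train_corr_rbm(pairs, params_I, params_T, cd_step, ld_grad_step,
                   converged, max_iters=100):
    """Alternating optimization sketch: CD updates for each modality's
    likelihood term (steps 501-502), then a gradient step on l_D (step 503),
    repeated until convergence (step 504). Returns the iterations used."""
    for it in range(max_iters):
        for vI, vT in pairs:
            cd_step(params_I, vI)                      # step 501
            cd_step(params_T, vT)                      # step 502
            ld_grad_step(params_I, params_T, vI, vT)   # step 503
        if converged(it):                              # e.g. validation retrieval
            return it + 1
    return max_iters

# Toy usage with counting stubs in place of the real update rules.
calls = {"cd": 0, "ld": 0}
n_iters = train_corr_rbm(
    pairs=[([1, 0], [0, 1]), ([0, 1], [1, 0])],
    params_I={}, params_T={},
    cd_step=lambda p, v: calls.__setitem__("cd", calls["cd"] + 1),
    ld_grad_step=lambda pI, pT, vI, vT: calls.__setitem__("ld", calls["ld"] + 1),
    converged=lambda it: it >= 2,   # pretend convergence after three passes
)
```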

(3) The Corr-RBMs deep model

Fig. 2 shows the neural network structure of the Corr-RBMs deep model. As shown in Fig. 2, the Corr-RBMs deep model is formed by stacking at least two Corr-RBM layers and comprises first-modality Corr-RBMs and second-modality Corr-RBMs: the first-modality Corr-RBMs process the low-level expression of the target retrieval modality, and the second-modality Corr-RBMs process the low-level expression of any retrieved modality in the retrieval database.

The visible-layer units of the first-modality RBM in the bottom Corr-RBM layer take as input the low-level expression of the first modality, obtained by feature extraction from the first-modality raw data; the visible-layer units of the second-modality RBM in the bottom layer take as input the low-level expression of the second modality, obtained by feature extraction from the second-modality raw data. Obtaining low-level expressions from raw data by feature extraction is prior art and is not detailed here.

The hidden layer of the first-modality RBM in the top Corr-RBM layer outputs the high-level expression of the first modality, and the hidden layer of the second-modality RBM in the top layer outputs the high-level expression of the second modality.
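
The layer-by-layer propagation through one modality's stack can be sketched as follows; the zero-initialized weights and the 6 → 4 → 2 dimensions are illustrative assumptions only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def high_level_expression(v, layers):
    """Propagate a low-level vector through one modality's stack: each layer's
    hidden activations become the next layer's visible input, and the top
    hidden layer is returned as the high-level expression."""
    h = np.asarray(v, dtype=float)
    for W, c in layers:            # (visible-to-hidden weights, hidden biases)
        h = sigmoid(h @ W + c)
    return h

# Toy usage: two layers shrinking a 6-dim low-level vector to a 2-dim
# high-level expression.
layers = [(np.zeros((6, 4)), np.zeros(4)), (np.zeros((4, 2)), np.zeros(2))]
top = high_level_expression([1, 0, 1, 0, 1, 0], layers)
```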

To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.

This embodiment assumes that the retrieval database contains N retrieved modalities, and the technical scheme of the present invention is illustrated by retrieving objects related to a picture P from this database. Fig. 6 is the flowchart of this embodiment and includes the following steps:

Step 601: obtain, by feature extraction, the low-level expressions of the retrieved modalities in the retrieval database and the low-level expression of the picture P.

In this step, the type of the retrieved modalities in the database is not limited: they may be image, text, or speech modalities. Mature feature extraction methods exist for the raw data of each modality; for example, MPEG-7 and Gist descriptors can be used for image modalities, and the bag-of-words model can be used for text modalities. The process of obtaining the low-level expressions of the picture P and of the retrieved modalities is therefore not described in detail here.
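
As a toy illustration of the text side of this step, a bag-of-words low-level representation can be computed from raw term counts; the three-word vocabulary here is a hypothetical example, not one used by the patent.

```python
from collections import Counter

def bag_of_words(doc, vocabulary):
    """Toy bag-of-words low-level representation for a text modality:
    one raw count per vocabulary term, a simple stand-in for the text
    feature extraction mentioned above."""
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocabulary]

vocab = ["deep", "model", "retrieval"]
vec = bag_of_words("Deep model for cross-modal retrieval deep", vocab)
```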

Step 602: process the low-level expression of the picture P together with the low-level expression of each retrieved modality through the Corr-RBMs deep model to obtain the high-level expression of the picture P and of each retrieved modality; then compute the Euclidean distance between the picture P and each retrieved modality from these high-level expressions.

In this step, each retrieved modality in the database is paired with the picture P; the Corr-RBMs deep model processes the low-level expressions of the pair to obtain their high-level expressions, and the Euclidean distance between the picture P and that retrieved modality is then computed by the Euclidean distance formula.

In general, for two points t and y in n-dimensional Euclidean space, their distance d is

d(t, y) = sqrt( Σ_{i=1..n} (t_i − y_i)² ),

and the Euclidean distance between the picture P and any retrieved modality is computed accordingly.
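
The distance formula above is directly computable; a minimal sketch:

```python
import math

def euclid(t, y):
    """n-dimensional Euclidean distance d = sqrt(sum_i (t_i - y_i)^2)."""
    return math.sqrt(sum((ti - yi) ** 2 for ti, yi in zip(t, y)))

d = euclid([0.0, 0.0], [3.0, 4.0])   # the classic 3-4-5 right triangle
```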

Step 603: sort the retrieved modalities by their Euclidean distance to the picture P from low to high, and output the top K retrieved modalities as the retrieval result.

In this embodiment, the Corr-RBMs deep model processes the low-level expression of the image modality and the low-level expressions of the retrieved modalities in the database to obtain their high-level expressions; computing Euclidean distances over these high-level expressions then yields the retrieval results efficiently.

The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (3)

1. A cross-modal retrieval method based on a deep model, wherein the deep model is a deep model of stacked Correspondence Restricted Boltzmann Machines (Corr-RBMs), and the method comprises the following steps:
obtaining, by a feature extraction method, the low-level expression vectors of a target retrieval modality and of each retrieved modality in a retrieval database;
passing the low-level expression vector of the target retrieval modality, together with the low-level expression vector of each retrieved modality in the retrieval database, through the stacked Corr-RBMs deep model to obtain the high-level expression vectors of the target retrieval modality and of each retrieved modality in the retrieval database;
calculating the distance between the target retrieval modality and each retrieved modality in the retrieval database by using the high-level expression vector of the target retrieval modality and the high-level expression vector of each retrieved modality;
determining at least one retrieved modality in the retrieval database that is closest to the target retrieval modality as an object matching the target retrieval modality;
wherein,
the Corr-RBMs deep model is formed by stacking at least two Corr-RBM layers and comprises first-modality Corr-RBMs and second-modality Corr-RBMs, the first-modality Corr-RBMs processing the low-level expression vector of the target retrieval modality and the second-modality Corr-RBMs processing the low-level expression vector of any retrieved modality in the retrieval database;
each Corr-RBM includes a first-modality Restricted Boltzmann Machine (RBM) and a second-modality RBM, the first-modality RBM and the second-modality RBM having the same number of visible-layer neural units and the same number of hidden-layer neural units, with a correlation constraint between the hidden layers of the first-modality RBM and the second-modality RBM.
2. The method of claim 1, further comprising:
configuration parameters theta = { W of Corr-RBM I ,C I ,B I ,W T ,C T ,B T Wherein superscript I denotes a first modality, superscript T denotes a second modality, in particular W I A set of connection weight parameters between the visible layer neural units and the hidden layer neural units of the RBM of the first modality, C I Set of visible layer neural unit bias parameters for RBM of first modality, B I Hidden layer neural cell bias parameter set, W, for first modality RBM T A set of connection weight parameters between the visible layer neural units and the hidden layer neural units of the RBM of the second modality, C T Set of visible layer neural unit bias parameters for RBM of second modality, B T A hidden layer neural unit bias parameter set of a second mode RBM;
the configuration parameters Θ of the corresponding restricted Boltzmann machine Corr-RBM are the configuration parameters that minimize the objective function

F(Θ) = (1/M) Σ_{m=1}^{M} ||f^I(v_m^I) − f^T(v_m^T)||² + α·L^I(Θ) + β·L^T(Θ);

wherein the first term is the distance between the first modality and the second modality in the embedding space; L^I(Θ) = −Σ_{m=1}^{M} log p^I(v_m^I) is the negative log-likelihood function of the first modality, and L^T(Θ) = −Σ_{m=1}^{M} log p^T(v_m^T) is the negative log-likelihood function of the second modality; α and β are constants, with α ∈ (0,1) and β ∈ (0,1); f^I(·) is the visible-layer-to-hidden-layer mapping function of the first-modality RBM, and f^T(·) is that of the second-modality RBM; p^I(·) is the joint probability distribution of the visible-layer and hidden-layer neural units of the first-modality RBM, and p^T(·) is that of the second-modality RBM; ||·|| denotes the two-norm; v denotes a visible unit of the RBM, corresponding to a visible variable; and M is the number of modality samples.
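Claim 2's objective — embedding-space distance plus α- and β-weighted negative log-likelihoods — can be written out numerically. Since the RBM log-likelihood is intractable (the partition function is unknown), the sketch below substitutes the free energy for each −log p term; the two differ only by the constant log Z, but this proxy is an assumption of the sketch, not part of the claim.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def free_energy(v, W, b_hid, c_vis):
    """Free energy of a binary RBM; equals -log p(v) up to the constant
    log Z, so it stands in here for the negative log-likelihood terms."""
    return -(v @ c_vis) - np.log1p(np.exp(v @ W + b_hid)).sum(axis=1)

def objective(vI, vT, thetaI, thetaT, alpha=0.5, beta=0.5):
    """F = mean[ ||f_I(v_I) - f_T(v_T)||^2
                 + alpha*NLL_I + beta*NLL_T ], NLLs via free energy."""
    WI, CI, BI = thetaI          # weights, visible biases, hidden biases
    WT, CT, BT = thetaT
    hI = sigmoid(vI @ WI + BI)   # f_I(v_I)
    hT = sigmoid(vT @ WT + BT)   # f_T(v_T)
    dist = np.sum((hI - hT) ** 2, axis=1)
    return np.mean(dist
                   + alpha * free_energy(vI, WI, BI, CI)
                   + beta * free_energy(vT, WT, BT, CT))
```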
3. The method according to claim 2, wherein the algorithm for determining Θ from the objective function F is:
A. the set of connection weight parameters W^I between the visible-layer neural units and the hidden-layer neural units of the first-modality RBM, the visible-layer neural unit bias parameters C^I, and the hidden-layer neural unit bias parameters B^I, denoted uniformly by θ^I, are updated according to the formula θ^I ← θ^I + τ·α·Δθ^I, wherein τ is the learning rate, τ ∈ (0,1) and α ∈ (0,1), and the increments are ΔW^I = ⟨v^I h^I⟩_data − ⟨v^I h^I⟩_model, ΔC^I = ⟨v^I⟩_data − ⟨v^I⟩_model, and ΔB^I = ⟨h^I⟩_data − ⟨h^I⟩_model;
wherein ⟨·⟩_data denotes the mathematical expectation under the empirical distribution, and ⟨·⟩_model denotes the mathematical expectation under the model distribution;
B. the set of connection weight parameters W^T between the visible-layer neural units and the hidden-layer neural units of the second-modality RBM, the visible-layer neural unit bias parameters C^T, and the hidden-layer neural unit bias parameters B^T, denoted uniformly by θ^T, are updated according to the formula θ^T ← θ^T + τ·β·Δθ^T, wherein β ∈ (0,1), and the increments are ΔW^T = ⟨v^T h^T⟩_data − ⟨v^T h^T⟩_model, ΔC^T = ⟨v^T⟩_data − ⟨v^T⟩_model, and ΔB^T = ⟨h^T⟩_data − ⟨h^T⟩_model;
C. updating the parameters of the distance term by the gradient descent method according to the following formulas, in which e_m = f^I(v_m^I) − f^T(v_m^T) and ⊙ denotes the element-wise product:

W^I ← W^I − τ·Σ_{m=1}^{M} v_m^I [e_m ⊙ δ'(W^I v_m^I + B^I)]^T, B^I ← B^I − τ·Σ_{m=1}^{M} e_m ⊙ δ'(W^I v_m^I + B^I),
W^T ← W^T + τ·Σ_{m=1}^{M} v_m^T [e_m ⊙ δ'(W^T v_m^T + B^T)]^T, B^T ← B^T + τ·Σ_{m=1}^{M} e_m ⊙ δ'(W^T v_m^T + B^T),

wherein δ'(·) = δ(·)(1 − δ(·)), and δ(·) is the Logistic activation function δ(x) = 1/(1 + exp(−x));
and repeating the steps A to C until the algorithm converges.
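One iteration of steps A to C of claim 3 can be sketched as below. Steps A/B estimate the ⟨·⟩_data − ⟨·⟩_model increments with one-step contrastive divergence (CD-1), and step C descends the gradient of the distance term using δ'(x) = δ(x)(1 − δ(x)). CD-1 as the estimator of the model expectation, and the array shapes used, are assumptions of this sketch rather than details fixed by the claim.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_delta(v, W, B, C, rng):
    """Steps A/B: CD-1 estimate of <.>_data - <.>_model for one binary RBM.
    W: weights (visible x hidden), B: hidden biases, C: visible biases."""
    h0 = sigmoid(v @ W + B)                       # data-driven hidden probs
    h_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h_sample @ W.T + C)              # one-step reconstruction
    h1 = sigmoid(v1 @ W + B)
    dW = v.T @ h0 - v1.T @ h1
    dC = (v - v1).sum(axis=0)
    dB = (h0 - h1).sum(axis=0)
    return dW, dC, dB

def corr_step(vI, vT, WI, BI, WT, BT, tau):
    """Step C: one gradient-descent step on ||f_I(v_I) - f_T(v_T)||^2,
    updating the weight and hidden-bias arrays in place."""
    hI = sigmoid(vI @ WI + BI)
    hT = sigmoid(vT @ WT + BT)
    e = hI - hT
    gI = e * hI * (1.0 - hI)       # delta'(x) = delta(x)(1 - delta(x))
    gT = -e * hT * (1.0 - hT)
    WI -= tau * (vI.T @ gI); BI -= tau * gI.sum(axis=0)
    WT -= tau * (vT.T @ gT); BT -= tau * gT.sum(axis=0)
```

A full training loop would repeat: apply θ^I ← θ^I + τ·α·Δθ^I and θ^T ← θ^T + τ·β·Δθ^T with the CD-1 increments, then `corr_step`, until convergence.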
CN201410800393.4A 2014-12-18 2014-12-18 A cross-modal retrieval method based on a deep model Active CN104462489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410800393.4A CN104462489B (en) 2014-12-18 2014-12-18 A cross-modal retrieval method based on a deep model

Publications (2)

Publication Number Publication Date
CN104462489A CN104462489A (en) 2015-03-25
CN104462489B true CN104462489B (en) 2018-02-23

Family

ID=52908524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410800393.4A Active CN104462489B (en) A cross-modal retrieval method based on a deep model

Country Status (1)

Country Link
CN (1) CN104462489B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866596B (en) * 2015-05-29 2018-09-14 北京邮电大学 A kind of video classification methods and device based on autocoder
US9984772B2 (en) * 2016-04-07 2018-05-29 Siemens Healthcare Gmbh Image analytics question answering
CN106250878B (en) * 2016-08-19 2019-12-31 中山大学 A Multimodal Target Tracking Method Combining Visible and Infrared Images
CN107832351A (en) 2017-10-21 2018-03-23 桂林电子科技大学 Cross-modal retrieval method based on a deep correlation network
CN109189968B (en) * 2018-08-31 2020-07-03 深圳大学 A cross-modal retrieval method and system
CN109783657B (en) * 2019-01-07 2022-12-30 北京大学深圳研究生院 Multi-step self-attention cross-media retrieval method and system based on limited text space

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693316A (en) * 2012-05-29 2012-09-26 中国科学院自动化研究所 Cross-media retrieval method based on a generalized linear regression model
CN103488713A (en) * 2013-09-10 2014-01-01 浙江大学 Cross-modal search method capable of directly measuring similarity of different modal data
CN103793507A (en) * 2014-01-26 2014-05-14 北京邮电大学 Method for obtaining bimodal similarity measure with deep structure

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8566260B2 (en) * 2010-09-30 2013-10-22 Nippon Telegraph And Telephone Corporation Structured prediction model learning apparatus, method, program, and recording medium

Similar Documents

Publication Publication Date Title
CN104462489B (en) A cross-modal retrieval method based on a deep model
CN110910218B (en) Multi-behavior transfer recommendation method based on deep learning
CN110807154B (en) A recommendation method and system based on a hybrid deep learning model
CN112101190B (en) A remote sensing image classification method, storage medium and computing device
CN109299396B (en) Convolutional neural network collaborative filtering recommendation method and system fusing attention model
Suglia et al. A deep architecture for content-based recommendations exploiting recurrent neural networks
CN103838836B (en) Multi-modal data fusion method and system based on a discriminative multi-modal deep belief network
CN107832351A (en) Cross-modal retrieval method based on a deep correlation network
CN104992184A (en) Multiclass image classification method based on semi-supervised extreme learning machine
TW201324378A (en) Image Classification
Liang et al. A probabilistic rating auto-encoder for personalized recommender systems
CN109325875B (en) Implicit group discovery method based on hidden features of online social users
CN109242030A (en) Draw single generation method and device, electronic equipment, computer readable storage medium
CN111695011B (en) Tensor expression-based dynamic hypergraph structure learning classification method and system
CN112256965A (en) A neural collaborative filtering model recommendation method based on lambdaMart
CN104462485B (en) A cross-modal retrieval method based on a corresponding deep belief network
CN114168784A (en) A Hierarchical Supervised Cross-modal Image and Text Retrieval Method
CN105701225B (en) A cross-media retrieval method based on unified association hypergraph regularization
CN110765363B (en) A Deep Recommender System Based on Gaussian Distribution Representation
CN114037931B (en) A multi-view discrimination method with adaptive weights
CN106777123B (en) A Group Recommendation Method Based on Bidirectional Tensor Decomposition Model
CN116883751A (en) Unsupervised domain-adaptive image recognition method based on prototype network contrastive learning
CN114896438A (en) Image-text retrieval method based on hierarchical alignment and a generalized-pooling graph attention mechanism
CN108920647B (en) A low-rank matrix filling TOP-N recommendation method based on spectral clustering
CN112328908B (en) Personalized recommendation method based on collaborative filtering

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Li Ruifan, Zhang Guangwei, Lu Peng, Lu Xiaofeng, Feng Fangxiang, Li Lei, Liu Yongbin, Wang Xiaojie

Inventor before: Li Ruifan, Lu Peng, Lu Xiaofeng, Feng Fangxiang, Li Lei, Liu Yongbin, Wang Xiaojie

CB03 Change of inventor or designer information
GR01 Patent grant