CN105551503B

CN105551503B - Audio matching pursuit method and system based on atom preselection

Info

Publication number: CN105551503B
Application number: CN201510982266.5A
Authority: CN
Inventors: 胡瑞敏; 姜林; 胡霞; 王晓晨; 涂卫平; 张茂胜; 李登实
Original assignee: Wuhan University WHU
Current assignee: Booslink Suzhou Information Technology Co ltd
Priority date: 2015-12-24
Filing date: 2015-12-24
Publication date: 2019-03-01
Anticipated expiration: 2035-12-24
Also published as: CN105551503A

Abstract

The invention discloses an audio matching pursuit method and system based on atom preselection. The invention first exploits the correlation between signal energy and auditory perception to preprocess the original signal by energy, extracting the part of the signal with higher energy. Matching pursuit is then performed on that part to obtain sparse coefficients, and the signal is reconstructed from the sparse coefficients and the original dictionary. The invention can greatly reduce computational complexity and increase computation speed while ensuring that sound quality does not decline.

Description

Audio matching pursuit method and system based on atom preselection

Technical Field

The invention belongs to the technical field of audio coding, and in particular relates to an audio matching pursuit method and system based on atom preselection.

Background

Sparse representation generally means representing the original signal accurately with as few basis functions as possible, so as to capture the main features of the signal and thereby fundamentally reduce the cost of signal processing. Matching pursuit (MP) is one of the most widely used sparse representation algorithms. Its basic idea is to select the best atom from an overcomplete dictionary at each iteration, so that the approximation of the signal improves step by step. The overcomplete dictionary used by MP can be chosen flexibly and adaptively according to the characteristics of the signal itself, and atom selection is a greedy algorithm of repeated iterative approximation, which keeps the final number of atom coefficients small. MP is therefore widely used in many fields of signal analysis, such as image processing, biomedical signal processing, and audio processing.

As requirements on streaming media quality rise and the number of mobile terminal users keeps growing, the demands on audio and video coding efficiency are also increasing. The traditional matching pursuit algorithm is unsuitable for real-time processing because of its high computational complexity. Several fast matching pursuit algorithms have been proposed, such as the joint dictionary method of Reference [1] and the algorithmic optimization of Reference [2]. However, these algorithms either involve time-consuming optimization or sacrifice the efficiency of the sparse representation, and their speed still cannot meet the needs of large-scale problems. Reference [3] proposed a traversal algorithm based on short-time Gabor atoms, which slides non-complete, fixed-length atoms from the start of the signal to its end and selects the best-matching atom over many iterations to obtain the final sparse coefficients. The dictionary of this algorithm is very small, which lowers the computational complexity and also reduces the storage burden.

Although the computational complexity of this method is somewhat lower than that of other sparse representation algorithms, it is still difficult to use in real-time applications. One of the main ways to reduce the computational complexity of matching pursuit is to reduce the number of iterations. When the sparse dictionary is a short-time dictionary, running MP locally on part of a long signal takes far less time than the traversal MP algorithm.

The following references are cited in the text:

[1] Ravelli E, Richard G, Daudet L. Union of MDCT bases for audio coding[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(8): 1361-1372.

[2] Gharavi-Alkhansari M, Huang T S. A fast orthogonal matching pursuit algorithm[C]//Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 1998, 3: 1389-1392.

[3] Krstulovic S, Gribonval R. MPTK: Matching pursuit made tractable[C]//2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006) Proceedings. IEEE, 2006, 3: III-III.

Summary of the Invention

In view of the shortcomings of the prior art, and based on the influence of energy on auditory perception, the invention provides an audio matching pursuit method and system based on atom preselection. By iteratively applying matching pursuit to the parts of the signal with higher energy, the invention shortens the time spent on traversal without affecting the final reconstruction quality of the signal.

The technical solution adopted by the invention is as follows:

An audio matching pursuit method based on atom preselection, comprising:

signal decomposition and signal reconstruction, wherein the signal decomposition comprises the steps of:

S1: selecting a short-time dictionary according to the type of the original signal, and using the short-time dictionary as the sparse dictionary;

S2: computing, one by one, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} of the original signal for i = 1, 2, ..., length(S)-N+1, and extracting the run of consecutive samples with the highest energy, denoted S_maxenergy, where N is the atom length of the short-time dictionary and length(S) is the length of the original signal;

S3: obtaining the atom weight of each atom of the sparse dictionary on S_maxenergy, and taking the atom weight with the largest absolute value as the maximum atom weight;

S4: computing the signal residual S′_later as the difference between S_maxenergy and the product of the maximum atom weight and its corresponding atom; at the same time, recording the maximum atom weight in row i_optmax, column j_optmax of the current sparse coefficient matrix, where i_optmax is the index of the atom corresponding to the maximum atom weight and j_optmax is the center position of that atom; the initial value of the current sparse coefficient matrix is the zero matrix;

S5: when the signal residual S′_later reaches the target SNR or the number of iterations reaches a preset value, ending the signal decomposition and outputting the current sparse coefficient matrix; otherwise, repeating steps S2 to S5 with the current signal residual S′_later as the original signal;

the signal reconstruction comprises:

S7: extracting the atom weights in the current sparse coefficient matrix together with their row numbers and column numbers;

S8: multiplying each atom weight by its corresponding atom to obtain a restored signal, and assigning each restored signal to a zero vector M_i of the same length as the original signal of step S1, with the j_optmax-th point of M_i as the center point of the restored signal, where j_optmax is the column number of the atom weight corresponding to the current restored signal; the assigned vectors are accumulated in turn to obtain the reconstructed signal.

In step S2, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} of the original signal may be the sum of the squares of the amplitudes of all samples in that run.

Alternatively, in step S2, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} of the original signal may be the sum of the absolute values of the amplitudes of all samples in that run.

Alternatively, in step S2, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} of the original signal may be the maximum of the amplitudes of all samples in that run.

The system corresponding to the above audio matching pursuit method based on atom preselection comprises:

a signal decomposition unit and a signal reconstruction unit, wherein the signal decomposition unit further comprises:

a dictionary building module 101, used to select a short-time dictionary according to the type of the original signal and to use the short-time dictionary as the sparse dictionary;

a preprocessing module 102, used to compute, one by one, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} of the original signal for i = 1, 2, ..., length(S)-N+1, and to extract the run of consecutive samples with the highest energy, denoted S_maxenergy, where N is the atom length of the short-time dictionary and length(S) is the length of the original signal;

a weight comparison module 103, used to obtain the atom weight of each atom of the sparse dictionary on S_maxenergy and to take the atom weight with the largest absolute value as the maximum atom weight;

a residual calculation module 104, used to compute the signal residual S′_later as the difference between S_maxenergy and the product of the maximum atom weight and its corresponding atom, and at the same time to record the maximum atom weight in row i_optmax, column j_optmax of the current sparse coefficient matrix, where i_optmax is the index of the atom corresponding to the maximum atom weight and j_optmax is the center position of that atom; the initial value of the current sparse coefficient matrix is the zero matrix;

a threshold control module 105, used to end the signal decomposition and output the current sparse coefficient matrix when the signal residual S′_later reaches the target SNR or the number of iterations reaches a preset value, and otherwise to feed the current signal residual S′_later to the preprocessing module 102 as the original signal;

the signal reconstruction unit further comprises:

a reconstruction coefficient extraction module 201, used to extract the atom weights in the current sparse coefficient matrix together with their row numbers and column numbers;

a signal synthesis module 202, used to multiply each atom weight by its corresponding atom to obtain a restored signal, and to assign each restored signal to a zero vector M_i of the same length as the original signal, with the j_optmax-th point of M_i as the center point of the restored signal, where j_optmax is the column number of the atom weight corresponding to the current restored signal; the assigned vectors are accumulated in turn to obtain the reconstructed signal.

Compared with the prior art, the invention has the following characteristics:

By applying MP with a non-complete dictionary to the parts of the signal with higher short-time energy, the invention reduces the number of traversal computations and lowers the computational complexity. In constructing the dictionary, the frequency span of the atoms is increased, which loosens the constraint the dictionary places on frequency components. The sparse representation algorithm is not limited by the length of the signal to be processed, and the dictionary is small. Compared with other fast matching pursuit algorithms, the invention produces the reconstructed signal at a higher computation speed with no loss of sound quality.

Description of the Drawings

Fig. 1 is a detailed flow chart of the signal decomposition part of an embodiment of the invention;

Fig. 2 is a detailed flow chart of the signal reconstruction part of an embodiment of the invention;

Fig. 3 is a structural block diagram of the signal decomposition subsystem of an embodiment of the invention;

Fig. 4 is a structural block diagram of the signal reconstruction subsystem of an embodiment of the invention;

Fig. 5 is a schematic diagram of the atom center position.

Detailed Description

To help those skilled in the art understand and implement the invention, the technical solution is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments described here serve only to illustrate and explain the invention and are not intended to limit it.

Figs. 1 and 2 show the detailed flow of the method of the invention, which comprises two parts: signal decomposition and signal reconstruction.

Signal decomposition is implemented by the following steps:

Step 1: select a short-time dictionary according to the type of the original signal.

This is a routine step in audio matching pursuit methods. For a speech processing system, a short-time dictionary with speech characteristics is selected; for a transient signal processing system, a short-time dictionary suited to transients is selected. For systems whose signals have no distinctive features, or which must handle several types of signal at once, a general-purpose short-time dictionary is selected.

In this embodiment, the test samples include speech signals, music signals, and other types, and the highly scalable Gabor dictionary is chosen as the short-time dictionary. The atoms of the Gabor dictionary are constructed as follows:

In formula (1), w is the frequency scale; μ is the time offset; σ is the time scale; λ_{w,μ,σ} is the atom energy for the given w, μ, and σ; n is the time-domain sample index of the Gabor atom; and g_{w,μ,σ}(n) is the atom amplitude at time-domain sample n.

When choosing the time offset μ, traditional Gabor-dictionary matching pursuit takes as many different values of μ as the number of atoms in the dictionary allows. In this embodiment μ = 0, so that every atom in the dictionary is aligned with the higher-energy segment selected in preprocessing, with its energy concentrated at its center. If n ranges from 1 to N and the frequency scale w, time scale σ, and atom energy λ give M combinations in total, the dictionary size is M×N. In this embodiment, M = 20 and N = 1001.
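A dictionary of this shape can be sketched in Python as follows. Formula (1) is not reproduced in this text, so the sketch assumes a common real Gabor form (a Gaussian envelope multiplied by a cosine) with each atom normalized to unit norm. The function name `gabor_dictionary`, the particular frequency and scale grids, and the sample axis centered on zero are illustrative assumptions, not values from the patent.

```python
import numpy as np

def gabor_dictionary(n_len=1001,
                     freqs=(0.01, 0.05, 0.1, 0.2, 0.4),
                     scales=(50.0, 100.0, 200.0, 400.0)):
    """Build an M x N matrix of real Gabor atoms with time offset mu = 0."""
    # Assumption: n is centred on 0 so that mu = 0 puts the envelope peak
    # at the middle sample of the atom.
    n = np.arange(n_len) - (n_len - 1) / 2.0
    atoms = []
    for w in freqs:            # frequency scale (radians per sample, assumed)
        for sigma in scales:   # time scale (samples, assumed)
            g = np.exp(-np.pi * (n / sigma) ** 2) * np.cos(w * n)
            g /= np.linalg.norm(g)   # assumed role of lambda: unit-norm atoms
            atoms.append(g)
    return np.vstack(atoms)

D = gabor_dictionary()
print(D.shape)  # (20, 1001): M = 5 frequencies x 4 scales, N = 1001
```

Keeping μ fixed at 0 is what keeps the dictionary at M rows; the position of each atom in the signal is supplied later by the energy-based segment selection rather than by extra shifted atoms.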

Step 2: preprocess the original signal. Compute, one by one, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} of the original signal, and denote the run with the highest energy S_maxenergy. The length of each run is the atom length N of the short-time dictionary selected in step 1.

Several ways of computing the energy are given below.

(1) Compute the energy value Energy of the consecutive samples from the definition of energy:

$$\mathrm{Energy}(m) = \sum_{i=1}^{N} S_{i+m}^{2} \tag{2}$$

In formula (2), S_i is the i-th sample of the original signal S and also denotes the amplitude of that sample; m is the shift, taking the values 0, 1, ..., length(S)-N in turn; and length(S) is the length of the original signal S.

(2) The sum of squared sample amplitudes is roughly proportional to the sum of absolute sample amplitudes, and the latter is much cheaper to compute. Formula (3) can therefore be used to approximate the energy value Energy of the consecutive samples:

$$\mathrm{Energy}(m) = \sum_{i=1}^{N} \left| S_{i+m} \right| \tag{3}$$

In formula (3), S_i is the i-th sample of the original signal S and also denotes the amplitude of that sample; m is the shift, taking the values 0, 1, ..., length(S)-N in turn; and length(S) is the length of the original signal S.

(3) Different energy measures can be chosen according to the characteristics of the original signal. If most of the original signals have relatively continuous amplitudes, the maximum amplitude among all samples of a run of consecutive samples is used as the energy of that run. This further reduces the computational complexity compared with (1) and (2).
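The three energy measures and the selection of S_maxenergy can be sketched as below. This is a minimal illustration under stated assumptions: the function names and the `mode` strings are hypothetical, indices are 0-based (so the shift m is also the start index of the window), and measure (3) is taken to mean the largest absolute amplitude in the window.

```python
import numpy as np

def window_energies(s, n_len, mode="square"):
    """Energy of every window {s[m], ..., s[m+N-1]}, m = 0 .. len(s)-N."""
    s = np.asarray(s, dtype=float)
    if mode == "square":      # measure (1): sum of squared amplitudes
        v = s ** 2
    elif mode == "abs":       # measure (2): sum of absolute amplitudes
        v = np.abs(s)
    elif mode == "max":       # measure (3): largest absolute amplitude (assumed)
        windows = np.lib.stride_tricks.sliding_window_view(np.abs(s), n_len)
        return windows.max(axis=1)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Window sums from a cumulative sum: one pass over the signal instead of
    # re-adding N samples for every shift m.
    c = np.concatenate(([0.0], np.cumsum(v)))
    return c[n_len:] - c[:-n_len]

def max_energy_segment(s, n_len, mode="square"):
    """Return the start index m and the N samples of the highest-energy window."""
    m = int(np.argmax(window_energies(s, n_len, mode)))
    return m, np.asarray(s, dtype=float)[m:m + n_len]

m, seg = max_energy_segment([0.0, 1.0, -3.0, 2.0, 0.0, 0.0], 2)
print(m, seg)  # window starting at index 2: samples -3 and 2
```

The cumulative-sum trick is a design choice of this sketch, not something the patent prescribes; a direct loop over m gives the same energies.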

Step 3: use the short-time dictionary selected in step 1 as the sparse dictionary, take the inner product of each atom of the sparse dictionary with S_maxenergy in turn to obtain the atom weight of each atom on S_maxenergy, and record the atom weight with the largest absolute value as the maximum atom weight.

The maximum atom weight is computed by formula (4).

In formula (4), i_opt is the index of an atom in the sparse dictionary, i_opt = 1, 2, ..., M, where M is the number of atoms in the sparse dictionary; the quantity maximized over i_opt is the absolute value of the inner product of the i_opt-th atom and S′.

Step 4: compute the component of S_maxenergy on the maximum atom of the sparse dictionary. The signal residual S′_later is the vector difference between S_maxenergy and this component, see formula (5). At the same time, update the current sparse coefficient matrix.

Here the maximum atom is the atom corresponding to the maximum atom weight.
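Formulas (4) and (5) are not reproduced in this text. One way to write steps 3 and 4 that is consistent with the definitions above is given below. The symbols g_{i_opt} for the i_opt-th atom and c_max for the maximum atom weight are notation introduced here for illustration, and the atoms are assumed to have unit norm so that the component on an atom is the inner product times the atom.

```latex
\begin{aligned}
i_{opt\,max} &= \arg\max_{1 \le i_{opt} \le M}
  \left| \langle g_{i_{opt}},\, S_{maxenergy} \rangle \right|, \\
c_{max} &= \langle g_{i_{opt\,max}},\, S_{maxenergy} \rangle, \\
S'_{later} &= S_{maxenergy} - c_{max}\, g_{i_{opt\,max}}.
\end{aligned}
```

Note that the selection uses the absolute value of the inner product, while the subtraction uses the signed weight.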

The current sparse coefficient matrix is updated as follows:

The initial value of the sparse coefficient matrix is the zero matrix. Its row number is the atom index, its column number is the atom center position, and its elements are atom weights. The atom center position is the position of the center sample of the consecutive samples S_maxenergy relative to the initial point of the original signal, as shown in Fig. 5: taking the position of the initial point of the original signal as 0, the atom center position is m.

The updated sparse coefficient matrix is obtained from the sparse coefficient matrix before the update by adding a weight matrix c of the same size. The weight matrix c is obtained by assigning the maximum atom weight from step 3 to row i_optmax, column j_optmax of c, where i_optmax is the index of the maximum atom and j_optmax is its atom center position, that is, the position of the center sample of S_maxenergy.
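Steps 2 to 4 together make up one iteration, which can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation. The function name `mp_step`, the dense M×length(S) coefficient matrix, unit-norm atoms, the 0-based center index `m + N // 2`, and subtracting the component in place inside the full-length signal are all assumptions made for illustration.

```python
import numpy as np

def mp_step(residual, dictionary, coeffs):
    """One preselected matching pursuit iteration.

    residual   : 1-D float array, modified in place
    dictionary : M x N array of unit-norm atoms (assumed)
    coeffs     : M x len(residual) sparse coefficient matrix, modified in place
    """
    n_atoms, n_len = dictionary.shape
    # Step 2: start index m of the length-N window with the largest
    # sum of squared amplitudes.
    c = np.concatenate(([0.0], np.cumsum(residual ** 2)))
    m = int(np.argmax(c[n_len:] - c[:-n_len]))
    segment = residual[m:m + n_len]
    # Step 3: inner product of every atom with the segment; pick the atom
    # whose weight has the largest absolute value.
    weights = dictionary @ segment
    i_max = int(np.argmax(np.abs(weights)))
    w_max = float(weights[i_max])
    # Step 4: subtract the component on that atom and record the weight at
    # (atom index, centre position).
    residual[m:m + n_len] -= w_max * dictionary[i_max]
    j_max = m + n_len // 2      # centre sample of the segment (assumed indexing)
    coeffs[i_max, j_max] += w_max
    return i_max, j_max, w_max

# Tiny usage example with two hypothetical length-3 atoms.
g0 = np.ones(3) / np.sqrt(3.0)
g1 = np.array([1.0, 0.0, -1.0]) / np.sqrt(2.0)
D = np.vstack([g0, g1])
s = np.zeros(7)
s[2:5] = 2.0 * g1               # signal is exactly 2 * g1 centred at index 3
C = np.zeros((2, 7))
print(mp_step(s, D, C))         # selects atom 1 at centre 3 with weight close to 2
```

Because only one window is searched per iteration, each step costs M inner products of length N, instead of M inner products at every shift of the signal.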

Step 5: when the signal residual S′_later reaches the target SNR or the number of iterations reaches the preset value, end the signal decomposition and output the current sparse coefficient matrix; otherwise, take the signal residual S′_later as the original signal of step 2 and repeat steps 2 to 5.

Matching pursuit processes a signal by cumulative iteration, expressing the original signal as the sum of the weighted atoms (each atom weight multiplied by its corresponding atom) plus the signal residual. Step 4 yields the signal residual S′_later; the iteration terminates when S′_later reaches the target SNR or the number of iterations reaches the preset value, and the current sparse coefficient matrix is output. The target SNR and the preset number of iterations are set manually according to experience and practical needs.

The signal-to-noise ratio SNR is defined as follows:

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{i} S_i^{2}}{\sum_{i} \left( S_i - S'_i \right)^{2}} \tag{7}$$

In formula (7), S is the original signal and S′ is the signal restored from the current sparse representation.

In this embodiment, for a signal segment sampled at 48 kHz with a length of 500,000 samples (about 10 s), the preset number of iterations is 20,000 and the target SNR is 20 dB.
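The stopping rule of step 5 can be sketched as below, assuming the usual decibel definition of SNR as signal energy over error energy. The function names are hypothetical, and the default thresholds are simply the values of this embodiment.

```python
import numpy as np

def snr_db(original, reconstructed):
    """SNR in dB: energy of the original over energy of the reconstruction error."""
    original = np.asarray(original, dtype=float)
    error = original - np.asarray(reconstructed, dtype=float)
    error_energy = np.sum(error ** 2)
    if error_energy == 0.0:
        return np.inf               # perfect reconstruction
    return 10.0 * np.log10(np.sum(original ** 2) / error_energy)

def should_stop(original, residual, iteration,
                target_snr_db=20.0, max_iter=20000):
    """True when the target SNR or the iteration limit has been reached."""
    original = np.asarray(original, dtype=float)
    residual = np.asarray(residual, dtype=float)
    # The reconstruction so far is original - residual, so the error energy
    # equals the residual energy.
    return (iteration >= max_iter
            or snr_db(original, original - residual) >= target_snr_db)

print(snr_db([10.0, 0.0], [9.0, 0.0]))  # energy ratio 100 to 1, i.e. about 20 dB
```

Since the error is the residual itself, an implementation can track the residual energy incrementally rather than rebuilding S′ at every iteration.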

Signal reconstruction is implemented by the following steps:

Step 6: extract from the current sparse coefficient matrix the atom weights needed to reconstruct the signal, together with the atom index and atom center position corresponding to each atom weight.

Step 7: multiply each atom weight by its corresponding atom to obtain a restored signal of length N. Assign each restored signal to a zero vector M_i of the same length as the original signal of step 1, placing the center point of the restored signal at the j_optmax-th point of M_i, where j_optmax is the column number of the atom weight in the current sparse coefficient matrix. Accumulate the assigned vectors M_i in turn to obtain the reconstructed signal S′.

The reconstructed signal is synthesized as follows:

$$S' = \sum_{i=1}^{k} M_i$$

where k is the number of atom weights in the current sparse coefficient matrix.
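Steps 6 and 7 can be sketched in Python as follows. The function name `reconstruct`, the dense coefficient matrix, and the convention that a stored column index equals the segment start plus N // 2 (0-based) are assumptions of this sketch; it also assumes every atom fits entirely inside the signal, which holds when the centers come from windows taken within the signal.

```python
import numpy as np

def reconstruct(coeffs, dictionary, signal_len):
    """Sum the weighted atoms, each centred on the column index of its weight."""
    n_len = dictionary.shape[1]
    half = n_len // 2
    out = np.zeros(signal_len)
    # Step 6: non-zero entries give (atom index, centre position) pairs.
    rows, cols = np.nonzero(coeffs)
    for i, j in zip(rows, cols):
        m_i = np.zeros(signal_len)        # zero vector M_i of the signal length
        start = j - half                  # assumed >= 0 and start + N <= signal_len
        # Step 7: place weight * atom so that its centre lands on sample j.
        m_i[start:start + n_len] = coeffs[i, j] * dictionary[i]
        out += m_i                        # accumulate the M_i
    return out

# Usage with one hypothetical length-3 atom stored at centre position 3.
g1 = np.array([1.0, 0.0, -1.0]) / np.sqrt(2.0)
D = np.vstack([np.ones(3) / np.sqrt(3.0), g1])
C = np.zeros((2, 7))
C[1, 3] = 2.0
print(reconstruct(C, D, 7))   # non-zero only at samples 2..4, equal to 2 * g1
```

Allocating a full-length M_i per atom mirrors the text; adding the weighted atom directly into `out[start:start + n_len]` gives the same result with less memory.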

Referring to Figs. 3 and 4, the invention also provides an audio matching pursuit system based on atom preselection, comprising a signal decomposition unit and a signal reconstruction unit. The signal decomposition unit further comprises a dictionary building module 101, a preprocessing module 102, a weight comparison module 103, a residual calculation module 104, and a threshold control module 105; the signal reconstruction unit further comprises a reconstruction coefficient extraction module 201 and a signal synthesis module 202. Specifically:

The dictionary building module 101 is used to select a short-time dictionary according to the type of the original signal and to use the short-time dictionary as the sparse dictionary.

The preprocessing module 102 is used to compute, one by one, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} of the original signal for i = 1, 2, ..., length(S)-N+1, and to extract the run of consecutive samples with the highest energy, denoted S_maxenergy, where N is the atom length of the short-time dictionary and length(S) is the length of the original signal.

In the preprocessing module 102, the energy of the consecutive samples can be computed in any of the following ways:

(1) The sum of the squares of the amplitudes of all samples in the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} is taken as the energy of that run, see formula (2).

(2) The sum of the absolute values of the amplitudes of all samples in the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} is taken as the energy of that run, see formula (3).

(3) The maximum of the amplitudes of all samples in the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} is taken as the energy of that run.

The weight comparison module 103 is used to obtain the atom weight of each atom of the sparse dictionary on S_maxenergy and to take the atom weight with the largest absolute value as the maximum atom weight.

The residual calculation module 104 is used to compute the signal residual S′_later as the difference between S_maxenergy and the product of the maximum atom weight and its corresponding atom, and at the same time to record the maximum atom weight in row i_optmax, column j_optmax of the current sparse coefficient matrix, where i_optmax is the index of the atom corresponding to the maximum atom weight and j_optmax is the center position of that atom. The initial value of the current sparse coefficient matrix is the zero matrix.

The threshold control module 105 is used to end the signal decomposition and output the current sparse coefficient matrix when the signal residual S′_later reaches the target SNR or the number of iterations reaches a preset value, and otherwise to feed the current signal residual S′_later to the preprocessing module 102 as the original signal.

The reconstruction coefficient extraction module 201 is used to extract the atom weights in the current sparse coefficient matrix together with their row numbers and column numbers.

It should be understood that the parts not described in detail in this specification belong to the prior art.

It should be understood that the above description of the preferred embodiments is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the invention. Under the teaching of the invention, and without departing from the scope protected by the claims, those of ordinary skill in the art may make substitutions or modifications, all of which fall within the protection scope of the invention. The claimed scope of the invention is defined by the appended claims.

Claims

1. An audio matching pursuit method based on atom preselection, characterized by comprising:

signal decomposition and signal reconstruction, wherein the signal decomposition comprises the steps of:

S1: selecting a short-time dictionary according to the type of the original signal, and using the short-time dictionary as the sparse dictionary;

S2: computing, one by one, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} of the original signal for i = 1, 2, ..., length(S)-N+1, and extracting the run of consecutive samples with the highest energy, denoted S_maxenergy, where N is the atom length of the short-time dictionary and length(S) is the length of the original signal;

S3: obtaining the atom weight of each atom of the sparse dictionary on S_maxenergy, and taking the atom weight with the largest absolute value as the maximum atom weight; S4: computing the signal residual S′_later as the difference between S_maxenergy and the product of the maximum atom weight and its corresponding atom; at the same time, recording the maximum atom weight in row i_optmax, column j_optmax of the current sparse coefficient matrix, where i_optmax is the index of the atom corresponding to the maximum atom weight and j_optmax is the center position of that atom; the initial value of the current sparse coefficient matrix is the zero matrix;

S5: when the signal residual S′_later reaches the target SNR or the number of iterations reaches a preset value, ending the signal decomposition and outputting the current sparse coefficient matrix; otherwise, repeating steps S2 to S5 with the current signal residual S′_later as the original signal;

the signal reconstruction comprises:

S7: extracting the atom weights in the current sparse coefficient matrix together with their row numbers and column numbers;

S8: multiplying each atom weight by its corresponding atom to obtain a restored signal, and assigning each restored signal to a zero vector M_i of the same length as the original signal of step S1, with the j_optmax-th point of M_i as the center point of the restored signal, where j_optmax is the column number of the atom weight corresponding to the current restored signal; the assigned vectors are accumulated in turn to obtain the reconstructed signal.

2. The audio matching pursuit method based on atom preselection according to claim 1, characterized in that:

in step S2, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} of the original signal is the sum of the squares of the amplitudes of all samples in that run.

3. The audio matching pursuit method based on atom preselection according to claim 1, characterized in that:

in step S2, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} of the original signal is the sum of the absolute values of the amplitudes of all samples in that run.

4. The audio matching pursuit method based on atom preselection according to claim 1, characterized in that:

in step S2, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} of the original signal is the maximum of the amplitudes of all samples in that run.

5. An audio matching pursuit system based on atom preselection, characterized by comprising:

a signal decomposition unit and a signal reconstruction unit, wherein the signal decomposition unit further comprises:

a dictionary building module 101, used to select a short-time dictionary according to the type of the original signal and to use the short-time dictionary as the sparse dictionary;

a preprocessing module 102, used to compute, one by one, the energy of the consecutive samples {S_i, S_{i+1}, ..., S_{i+N-1}} of the original signal for i = 1, 2, ..., length(S)-N+1, and to extract the run of consecutive samples with the highest energy, denoted S_maxenergy, where N is the atom length of the short-time dictionary and length(S) is the length of the original signal;

a weight comparison module 103, used to obtain the atom weight of each atom of the sparse dictionary on S_maxenergy and to take the atom weight with the largest absolute value as the maximum atom weight;

a residual calculation module 104, used to compute the signal residual S′_later as the difference between S_maxenergy and the product of the maximum atom weight and its corresponding atom, and at the same time to record the maximum atom weight in row i_optmax, column j_optmax of the current sparse coefficient matrix, where i_optmax is the index of the atom corresponding to the maximum atom weight and j_optmax is the center position of that atom; the initial value of the current sparse coefficient matrix is the zero matrix;

a threshold control module 105, used to end the signal decomposition and output the current sparse coefficient matrix when the signal residual S′_later reaches the target SNR or the number of iterations reaches a preset value, and otherwise to feed the current signal residual S′_later to the preprocessing module 102 as the original signal;

the signal reconstruction unit further comprises:

a reconstruction coefficient extraction module 201, used to extract the atom weights in the current sparse coefficient matrix together with their row numbers and column numbers;

a signal synthesis module 202, used to multiply each atom weight by its corresponding atom to obtain a restored signal, and to assign each restored signal to a zero vector M_i of the same length as the original signal, with the j_optmax-th point of M_i as the center point of the restored signal, where j_optmax is the column number of the atom weight corresponding to the current restored signal; the assigned vectors are accumulated in turn to obtain the reconstructed signal.