HK1241135B

HK1241135B - Audio coding method and related device

Info

Publication number: HK1241135B
Application number: HK18100590.4A
Authority: HK
Inventors: 王喆
Original assignee: 华为技术有限公司
Filing date: 2016-04-18
Publication date: 2021-05-14

Description

Audio coding method and related device

Technical Field

The present invention relates to audio coding technology, and in particular to an audio coding method and a related device.

Background Art

For a long time, speech signals and non-speech signals (such as music) were encoded relatively independently: speech signals were encoded by dedicated speech encoders, and non-speech signals were encoded by dedicated non-speech encoders (also called general audio encoders).

Speech encoders are generally not used to encode non-speech signals, and non-speech encoders are generally not used to encode speech signals. This is not only because speech coding and non-speech coding are relatively independent in coding theory, but also because the two types of signals are usually relatively independent in practical applications. For example, in voice communication networks, voice was for a long time the sole or primary signal source and bandwidth was strictly limited, so various low-bit-rate speech encoders were widely used. In audio-visual and entertainment applications, non-speech signals make up most of the source material, and these applications demand relatively high audio quality while tolerating relatively relaxed bit rates, so non-speech encoders are widely used in those scenarios.

In recent years, more and more multimedia sources, such as customized ring-back tones, have appeared in traditional voice communication networks. This places higher demands on encoding quality. Dedicated speech encoders can no longer provide the relatively high encoding quality that these multimedia signals require, and new coding technologies such as hybrid audio encoders have emerged in response.

A hybrid audio encoder is an audio encoder that contains both sub-encoders suited to encoding speech signals and sub-encoders suited to encoding non-speech signals. A hybrid audio encoder always attempts to dynamically select, from among all its sub-encoders, the one most suitable for encoding the input audio signal. How to select the most suitable sub-encoder for the current input audio frame is an important function and requirement of a hybrid encoder. This sub-encoder selection, also called mode selection, directly determines the encoding quality of the hybrid encoder.

The prior art generally selects a sub-encoder in closed-loop mode: each sub-encoder encodes the current input audio frame once, and the optimal sub-encoder is selected by directly comparing the quality of the encoded results. The drawback of closed-loop selection is relatively high computational complexity, because every sub-encoder must encode the current frame once, which in turn makes the actual overhead of audio encoding relatively large.

Summary of the Invention

Embodiments of the present invention provide an audio coding method and a related device, with the aim of reducing the overhead of audio coding.

A first aspect of the embodiments of the present invention provides an audio coding method, including:

estimating a reference linear prediction efficiency of a current audio frame;

determining an audio coding mode that matches the reference linear prediction efficiency of the current audio frame; and

performing audio coding on the current audio frame according to the audio coding mode that matches the reference linear prediction efficiency of the current audio frame.
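The three steps above can be sketched in Python as an open-loop selection in which only the chosen encoder runs. This is a minimal illustration, not the patented implementation: the function name `encode_open_loop`, the mode labels `"lp"` and `"non_lp"`, the 0.5 threshold, and the placeholder encoders are all assumptions introduced here.

```python
from typing import Callable, Dict, Sequence, Tuple

Frame = Sequence[float]


def encode_open_loop(
    frame: Frame,
    estimate_efficiency: Callable[[Frame], float],
    choose_mode: Callable[[float], str],
    encoders: Dict[str, Callable[[Frame], bytes]],
) -> Tuple[str, bytes]:
    """Estimate, pick one coding mode, then run only that mode's encoder."""
    efficiency = estimate_efficiency(frame)  # step 1: estimate the efficiency
    mode = choose_mode(efficiency)           # step 2: match a coding mode
    return mode, encoders[mode](frame)       # step 3: encode exactly once


# Placeholder components (hypothetical), used only to show the call pattern.
calls = []


def fake_lp_encoder(frame: Frame) -> bytes:
    calls.append("lp")
    return b"LP"


def fake_non_lp_encoder(frame: Frame) -> bytes:
    calls.append("non_lp")
    return b"TX"


mode, payload = encode_open_loop(
    [0.1, 0.2, 0.3],
    estimate_efficiency=lambda f: 0.8,
    choose_mode=lambda e: "lp" if e >= 0.5 else "non_lp",
    encoders={"lp": fake_lp_encoder, "non_lp": fake_non_lp_encoder},
)
```

In contrast to closed-loop selection, the `calls` list records a single encoder invocation per frame.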

With reference to the first aspect, in a first possible implementation of the first aspect,

the reference linear prediction efficiency includes at least one of the following linear prediction efficiencies: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference integrated linear prediction efficiency.

With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the reference integrated linear prediction efficiency is a sum, a weighted sum, or an average of the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency.
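The three combinations named above can be written as one small Python helper. This is a sketch under assumptions: the function name, the `method` strings, and the default weights of 0.5 are illustrative choices, since the text does not specify weight values.

```python
def integrated_efficiency(
    long_term: float,
    short_term: float,
    method: str = "average",
    w_long: float = 0.5,   # assumed weight, not given in the text
    w_short: float = 0.5,  # assumed weight, not given in the text
) -> float:
    """Combine the two reference efficiencies into an integrated value."""
    if method == "sum":
        return long_term + short_term
    if method == "weighted_sum":
        return w_long * long_term + w_short * short_term
    if method == "average":
        return (long_term + short_term) / 2.0
    raise ValueError(f"unknown method: {method}")


total = integrated_efficiency(0.5, 0.25, "sum")
weighted = integrated_efficiency(0.5, 0.25, "weighted_sum", w_long=0.75, w_short=0.25)
mean = integrated_efficiency(0.5, 0.25, "average")
```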

With reference to the first possible implementation of the first aspect, in a third possible implementation of the first aspect, if the reference linear prediction efficiency of the current audio frame includes a reference long-term linear prediction efficiency of the current audio frame and a reference short-term linear prediction efficiency of the current audio frame, the determining an audio coding mode that matches the reference linear prediction efficiency of the current audio frame includes:

if the reference long-term linear prediction efficiency of the current audio frame is less than a first threshold, and/or the reference short-term linear prediction efficiency of the current audio frame is less than a second threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction;

and/or,

if the reference long-term linear prediction efficiency of the current audio frame is greater than or equal to the first threshold, and/or the reference short-term linear prediction efficiency of the current audio frame is greater than or equal to the second threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction.
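The "and/or" wording above admits several concrete decision rules. The Python sketch below shows one possible reading only: the frame is routed to a non-linear-prediction mode when either efficiency falls below its threshold, and to a linear-prediction mode otherwise. The function name, the mode labels, and the threshold values in the usage lines are assumptions.

```python
def choose_mode_two_thresholds(
    long_term: float,
    short_term: float,
    first_threshold: float,
    second_threshold: float,
) -> str:
    """One reading of the rule: either efficiency being low selects non-LP coding."""
    if long_term < first_threshold or short_term < second_threshold:
        return "non_lp"
    # Both efficiencies are at or above their thresholds.
    return "lp"


both_high = choose_mode_two_thresholds(0.8, 0.9, first_threshold=0.5, second_threshold=0.5)
one_low = choose_mode_two_thresholds(0.8, 0.2, first_threshold=0.5, second_threshold=0.5)
on_boundary = choose_mode_two_thresholds(0.5, 0.5, first_threshold=0.5, second_threshold=0.5)
```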

With reference to the first possible implementation of the first aspect, in a fourth possible implementation of the first aspect, if the reference linear prediction efficiency of the current audio frame includes a reference long-term linear prediction efficiency of the current audio frame, the determining an audio coding mode that matches the reference linear prediction efficiency of the current audio frame includes:

if the reference long-term linear prediction efficiency of the current audio frame is greater than or equal to a third threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction;

and/or, if the reference long-term linear prediction efficiency of the current audio frame is less than a fourth threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.
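Because the third and fourth thresholds may differ, a value can fall between them without matching either condition. The Python sketch below makes that gap explicit. Returning the previous frame's mode in the gap is an assumption added for illustration; the text above does not say what happens there. The function name and mode labels are likewise hypothetical.

```python
from typing import Optional


def choose_mode_long_term(
    long_term: float,
    third_threshold: float,
    fourth_threshold: float,
    previous_mode: Optional[str] = None,
) -> Optional[str]:
    """Apply the third/fourth-threshold rule to the reference long-term efficiency."""
    if long_term >= third_threshold:
        return "lp"
    if long_term < fourth_threshold:
        return "non_lp"
    # Assumption: between the thresholds, keep whatever mode was used before.
    return previous_mode


high = choose_mode_long_term(0.9, third_threshold=0.7, fourth_threshold=0.4)
low = choose_mode_long_term(0.2, third_threshold=0.7, fourth_threshold=0.4)
held = choose_mode_long_term(0.5, third_threshold=0.7, fourth_threshold=0.4, previous_mode="lp")
undecided = choose_mode_long_term(0.5, third_threshold=0.7, fourth_threshold=0.4)
```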

With reference to the first possible implementation of the first aspect, in a fifth possible implementation of the first aspect, if the reference linear prediction efficiency of the current audio frame includes a reference long-term linear prediction efficiency of the current audio frame, the determining an audio coding mode that matches the reference linear prediction efficiency of the current audio frame includes: determining a first linear prediction efficiency interval into which the reference long-term linear prediction efficiency of the current audio frame falls, and determining, according to a mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction, a first audio coding mode that has a mapping relationship with the first linear prediction efficiency interval, wherein the first audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and the first audio coding mode is an audio coding mode based on linear prediction or an audio coding mode not based on linear prediction.
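An interval-to-mode mapping of this kind can be sketched in Python as a sorted list of interval boundaries with one mode per interval. The boundaries (0.3 and 0.7) and the mode labels below are hypothetical; the text does not define the intervals or name the modes.

```python
from bisect import bisect_right
from typing import Sequence


def mode_for_efficiency(
    efficiency: float,
    boundaries: Sequence[float],
    modes: Sequence[str],
) -> str:
    """Map an efficiency to the mode of the interval it falls into.

    Interval i covers [boundaries[i - 1], boundaries[i]); the first interval is
    open below and the last is open above, so len(modes) == len(boundaries) + 1.
    """
    if len(modes) != len(boundaries) + 1:
        raise ValueError("need exactly one mode per interval")
    return modes[bisect_right(boundaries, efficiency)]


BOUNDARIES = [0.3, 0.7]                       # assumed interval edges
MODES = ["non_lp", "lp_mode_a", "lp_mode_b"]  # assumed mode labels

low = mode_for_efficiency(0.1, BOUNDARIES, MODES)
edge = mode_for_efficiency(0.3, BOUNDARIES, MODES)
high = mode_for_efficiency(0.9, BOUNDARIES, MODES)
```

The same lookup would serve the short-term and integrated variants by swapping in a different boundary list.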

With reference to the first possible implementation of the first aspect, in a sixth possible implementation of the first aspect, if the reference linear prediction efficiency of the current audio frame includes a reference short-term linear prediction efficiency of the current audio frame, the determining an audio coding mode that matches the reference linear prediction efficiency of the current audio frame includes:

if the reference short-term linear prediction efficiency of the current audio frame is greater than or equal to a fifth threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction;

and/or, if the reference short-term linear prediction efficiency of the current audio frame is less than the fifth threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

With reference to the first possible implementation of the first aspect, in a seventh possible implementation of the first aspect, if the reference linear prediction efficiency of the current audio frame includes a reference short-term linear prediction efficiency of the current audio frame, the determining an audio coding mode that matches the reference linear prediction efficiency of the current audio frame includes: determining a second linear prediction efficiency interval into which the reference short-term linear prediction efficiency of the current audio frame falls, and determining, according to a mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction, a second audio coding mode that has a mapping relationship with the second linear prediction efficiency interval, wherein the second audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and the second audio coding mode is an audio coding mode based on linear prediction or an audio coding mode not based on linear prediction.

With reference to the first possible implementation of the first aspect or the second possible implementation of the first aspect, in an eighth possible implementation of the first aspect,

if the reference linear prediction efficiency of the current audio frame includes a reference integrated linear prediction efficiency of the current audio frame, the determining an audio coding mode that matches the reference linear prediction efficiency of the current audio frame includes:

if the reference integrated linear prediction efficiency of the current audio frame is greater than or equal to a sixth threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction;

and/or, if the reference integrated linear prediction efficiency of the current audio frame is less than the sixth threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

With reference to the first possible implementation of the first aspect or the second possible implementation of the first aspect, in a ninth possible implementation of the first aspect,

if the reference linear prediction efficiency of the current audio frame includes a reference integrated linear prediction efficiency of the current audio frame, the determining an audio coding mode that matches the reference linear prediction efficiency of the current audio frame includes: determining a third linear prediction efficiency interval into which the reference integrated linear prediction efficiency of the current audio frame falls, and determining, according to a mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction, a third audio coding mode that has a mapping relationship with the third linear prediction efficiency interval, wherein the third audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and the third audio coding mode is an audio coding mode based on linear prediction or an audio coding mode not based on linear prediction.

With reference to any one of the first to ninth possible implementations of the first aspect, in a tenth possible implementation of the first aspect, the reference long-term linear prediction efficiency of the current audio frame is estimated as follows: estimating a long-term linear prediction efficiency of the current audio frame, wherein the long-term linear prediction efficiency of the current audio frame is the reference long-term linear prediction efficiency of the current audio frame; or

the reference long-term linear prediction efficiency of the current audio frame is estimated as follows: estimating a long-term linear prediction efficiency of the current audio frame; obtaining linear prediction efficiencies of N1 historical audio frames of the current audio frame; and calculating a first statistical value of the linear prediction efficiencies of the N1 historical audio frames and the long-term linear prediction efficiency of the current audio frame, wherein N1 is a positive integer, the first statistical value is the reference long-term linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of N11 historical audio frames is at least one of the following linear prediction efficiencies: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and an integrated linear prediction efficiency, and the N11 historical audio frames are a subset of the N1 historical audio frames; or

the reference long-term linear prediction efficiency of the current audio frame is estimated as follows: estimating a long-term linear prediction efficiency of the current audio frame; obtaining reference linear prediction efficiencies of N2 historical audio frames of the current audio frame; and calculating a second statistical value of the reference linear prediction efficiencies of the N2 historical audio frames and the long-term linear prediction efficiency of the current audio frame, wherein N2 is a positive integer, the second statistical value is the reference long-term linear prediction efficiency of the current audio frame, the reference linear prediction efficiency of each of N21 historical audio frames is at least one of the following linear prediction efficiencies: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference integrated linear prediction efficiency, and the N21 historical audio frames are a subset of the N2 historical audio frames; or

the reference long-term linear prediction efficiency of the current audio frame is estimated as follows: estimating a long-term linear prediction efficiency of the current audio frame; obtaining reference linear prediction efficiencies of N4 historical audio frames of the current audio frame; obtaining linear prediction efficiencies of N3 historical audio frames of the current audio frame; and calculating a third statistical value of the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-term linear prediction efficiency of the current audio frame, wherein N3 and N4 are positive integers, the third statistical value is the reference long-term linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of N31 historical audio frames is at least one of the following linear prediction efficiencies: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and an integrated linear prediction efficiency, the reference linear prediction efficiency of each of N41 historical audio frames is at least one of the following linear prediction efficiencies: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference integrated linear prediction efficiency, the N31 historical audio frames are a subset of the N3 historical audio frames, and the N41 historical audio frames are a subset of the N4 historical audio frames.
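The history-based variants above reduce to computing a statistic over the current frame's efficiency and stored values from earlier frames. The Python sketch below follows the N1 variant and keeps the last N1 raw long-term efficiencies. Using the arithmetic mean as the "statistical value" is an assumption, as the passage does not name a particular statistic; the class name is also hypothetical.

```python
from collections import deque
from statistics import fmean


class ReferenceEfficiencyTracker:
    """Derive a reference efficiency from the current frame plus up to N1 past frames."""

    def __init__(self, history_length: int) -> None:
        if history_length < 1:
            raise ValueError("history_length (N1) must be a positive integer")
        # The deque drops the oldest value once N1 values are stored.
        self._history = deque(maxlen=history_length)

    def update(self, current_efficiency: float) -> float:
        # Assumed statistic: arithmetic mean of history and current value.
        reference = fmean([*self._history, current_efficiency])
        self._history.append(current_efficiency)
        return reference


tracker = ReferenceEfficiencyTracker(history_length=2)
first = tracker.update(0.5)   # no history yet: reference equals the current value
second = tracker.update(1.0)  # mean of 0.5 and 1.0
third = tracker.update(0.0)   # mean of 0.5, 1.0 and 0.0
```

Storing the returned reference values instead of the raw ones would correspond to the N2 variant.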

With reference to any one of the first to ninth possible implementations of the first aspect, in an eleventh possible implementation of the first aspect,

the reference short-term linear prediction efficiency of the current audio frame is estimated as follows: estimating a short-term linear prediction efficiency of the current audio frame, wherein the short-term linear prediction efficiency of the current audio frame is the reference short-term linear prediction efficiency of the current audio frame; or

the reference short-term linear prediction efficiency of the current audio frame is estimated as follows: estimating a short-term linear prediction efficiency of the current audio frame; obtaining linear prediction efficiencies of N5 historical audio frames of the current audio frame; and calculating a fourth statistical value of the linear prediction efficiencies of the N5 historical audio frames and the short-term linear prediction efficiency of the current audio frame, wherein N5 is a positive integer, the fourth statistical value is the reference short-term linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of N51 historical audio frames is at least one of the following linear prediction efficiencies: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and an integrated linear prediction efficiency, and the N51 historical audio frames are a subset of the N5 historical audio frames; or

the reference short-term linear prediction efficiency of the current audio frame is estimated as follows: estimating a short-term linear prediction efficiency of the current audio frame; obtaining reference linear prediction efficiencies of N6 historical audio frames of the current audio frame; and calculating a fifth statistical value of the reference linear prediction efficiencies of the N6 historical audio frames and the short-term linear prediction efficiency of the current audio frame, wherein N6 is a positive integer, the fifth statistical value is the reference short-term linear prediction efficiency of the current audio frame, the reference linear prediction efficiency of each of N61 historical audio frames is at least one of the following linear prediction efficiencies: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference integrated linear prediction efficiency, and the N61 historical audio frames are a subset of the N6 historical audio frames; or

the reference short-term linear prediction efficiency of the current audio frame is estimated as follows: estimating a short-term linear prediction efficiency of the current audio frame; obtaining reference linear prediction efficiencies of N8 historical audio frames of the current audio frame; obtaining linear prediction efficiencies of N7 historical audio frames of the current audio frame; and calculating a sixth statistical value of the linear prediction efficiencies of the N7 historical audio frames, the reference linear prediction efficiencies of the N8 historical audio frames, and the short-term linear prediction efficiency of the current audio frame, wherein N7 and N8 are positive integers, the sixth statistical value is the reference short-term linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of N71 historical audio frames is at least one of the following linear prediction efficiencies: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and an integrated linear prediction efficiency, the reference linear prediction efficiency of each of N81 historical audio frames is at least one of the following linear prediction efficiencies: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference integrated linear prediction efficiency, the N71 historical audio frames are a subset of the N7 historical audio frames, and the N81 historical audio frames are a subset of the N8 historical audio frames.

With reference to the eleventh possible implementation of the first aspect, in a twelfth possible implementation of the first aspect, the estimating a short-term linear prediction efficiency of the current audio frame includes: obtaining the short-term linear prediction efficiency of the current audio frame based on a linear prediction residual of the current audio frame.

With reference to the twelfth possible implementation of the first aspect, in a thirteenth possible implementation of the first aspect, the obtaining the short-term linear prediction efficiency of the current audio frame based on a linear prediction residual of the current audio frame includes:

calculating an energy change rate of the current audio frame before and after short-term linear prediction is performed, wherein the energy change rate is the short-term linear prediction efficiency of the current audio frame, or the short-term linear prediction efficiency of the current audio frame is obtained by transforming the energy change rate, and the energy of the current audio frame after short-term linear prediction is the energy of the linear prediction residual of the current audio frame.

With reference to the thirteenth possible implementation of the first aspect, in a fourteenth possible implementation of the first aspect, the energy change rate of the current audio frame before and after short-term linear prediction is the ratio of the energy of the current audio frame before short-term linear prediction to the energy of the linear prediction residual of the current audio frame.
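The energy ratio described above can be sketched in pure Python as follows. The text does not say how the linear prediction coefficients are obtained, so this sketch assumes the textbook autocorrelation method with the Levinson-Durbin recursion, a prediction order of 2, and zero-valued samples before the frame. All function names and the test signals are illustrative.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Autocorrelation sums r[0..max_lag] over one frame."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Return a[1..order] such that x_hat[n] = sum(a[k] * x[n - k])."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # silent or already perfectly predicted frame
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:]


def lp_residual(x, coeffs):
    """e[n] = x[n] - sum(a[k] * x[n - k]); samples before the frame count as zero."""
    return [
        x[n] - sum(a * x[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        for n in range(len(x))
    ]


def energy(x):
    return sum(v * v for v in x)


def short_term_efficiency(frame, order=2):
    """Energy before prediction divided by the energy of the LP residual."""
    coeffs = levinson_durbin(autocorrelation(frame, order), order)
    residual_energy = energy(lp_residual(frame, coeffs))
    if residual_energy == 0.0:
        return math.inf  # avoid division by zero
    return energy(frame) / residual_energy


# A sine wave is highly predictable; uniform noise is not.
tone = [math.sin(2.0 * math.pi * 0.05 * n) for n in range(160)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(160)]
tone_efficiency = short_term_efficiency(tone)
noise_efficiency = short_term_efficiency(noise)
```

A larger ratio means the short-term predictor removed more of the frame's energy, which is why the tone scores far higher than the noise.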

With reference to the tenth possible implementation of the first aspect, in a fifteenth possible implementation of the first aspect,

the estimating a long-term linear prediction efficiency of the current audio frame includes: obtaining, according to a linear prediction residual of the current audio frame and a first historical linear prediction signal, a correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal, wherein the correlation is the long-term linear prediction efficiency of the current audio frame, or the long-term linear prediction efficiency of the current audio frame is obtained based on the correlation; the first historical linear prediction signal is a first historical linear prediction excitation or a first historical linear prediction residual; the first historical linear prediction residual is a linear prediction residual of a historical audio frame of the current audio frame; and the first historical linear prediction excitation is a linear prediction excitation of a historical audio frame of the current audio frame.

With reference to the fifteenth possible implementation of the first aspect, in a sixteenth possible implementation of the first aspect, the obtaining, according to a linear prediction residual of the current audio frame and a first historical linear prediction signal, a correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal includes:

calculating the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal;

or,

multiplying the linear prediction residual of the current audio frame by a gain factor to obtain a gained linear prediction residual of the current audio frame, and calculating a correlation between the gained linear prediction residual of the current audio frame and the first historical linear prediction signal, wherein the calculated correlation between the gained linear prediction residual and the first historical linear prediction signal is taken as the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal;

or, multiplying the first historical linear prediction signal by a gain factor to obtain a gained first historical linear prediction signal, and calculating a correlation between the linear prediction residual of the current audio frame and the gained first historical linear prediction signal, wherein the calculated correlation between the linear prediction residual of the current audio frame and the gained first historical linear prediction signal is taken as the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal.
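The three alternatives above can be sketched with one Python function that optionally scales either signal before correlating. The passage does not fix a correlation measure, so the normalized cross-correlation used here is an assumption, as are the function and parameter names. With this normalized measure a positive gain cancels out; a gain would change the result only for an unnormalized measure.

```python
import math
from typing import Sequence


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Cross-correlation at zero lag, normalized to the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def long_term_efficiency(
    residual: Sequence[float],
    history_signal: Sequence[float],
    residual_gain: float = 1.0,  # gain applied to the current residual
    history_gain: float = 1.0,   # gain applied to the historical signal
) -> float:
    """Correlate the (optionally gained) residual with the (optionally gained) history."""
    scaled_residual = [residual_gain * v for v in residual]
    scaled_history = [history_gain * v for v in history_signal]
    return normalized_correlation(scaled_residual, scaled_history)


plain = long_term_efficiency([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
gained_residual = long_term_efficiency([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], residual_gain=0.5)
gained_history = long_term_efficiency([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], history_gain=2.0)
unrelated = long_term_efficiency([1.0, 0.0], [0.0, 1.0])
```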

With reference to the fifteenth possible implementation of the first aspect or the sixteenth possible implementation of the first aspect, in a seventeenth possible implementation of the first aspect, the first historical linear prediction excitation or the first historical linear prediction residual is determined based on the pitch of the current audio frame.

With reference to any one of the fifteenth to seventeenth possible implementations of the first aspect, in an eighteenth possible implementation of the first aspect, the time-domain correlation between the first historical linear prediction excitation and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between any other historical linear prediction excitation and the linear prediction residual of the current audio frame;

or, the time-domain correlation between the first historical linear prediction residual and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between any other historical linear prediction residual and the linear prediction residual of the current audio frame.
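The selection rule above amounts to picking, among candidate historical segments, the one whose time-domain correlation with the current residual is largest. The following Python sketch shows one way this might be done. The normalized cross-correlation, the lag-based indexing of the history buffer, and the names `normalized_xcorr` and `best_history_segment` are assumptions for illustration; in practice the lag range might be placed around a pitch estimate.

```python
import math


def normalized_xcorr(a, b):
    """Normalized time-domain cross-correlation of two equal-length segments."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def best_history_segment(residual, past, min_lag, max_lag):
    """Return (lag, correlation) of the best-matching historical segment.

    `past` holds historical excitation or residual samples, most recent last.
    A segment at lag L starts L samples before the current frame; only lags
    of at least len(residual) are tried so each segment lies fully in the past.
    """
    n = len(residual)
    best_lag, best_corr = None, float("-inf")
    for lag in range(max(min_lag, n), min(max_lag, len(past)) + 1):
        start = len(past) - lag
        corr = normalized_xcorr(residual, past[start:start + n])
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```

The returned correlation could then serve directly as the long-term linear prediction efficiency, or as the input to a further transformation.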

With reference to any one of the fifteenth to eighteenth possible implementations of the first aspect, in a nineteenth possible implementation of the first aspect, the first historical linear prediction excitation is a linear prediction excitation generated by encoding a historical audio frame of the current audio frame using a linear-prediction-based coding mode.

With reference to any one of the fifteenth to nineteenth possible implementations of the first aspect, in a twentieth possible implementation of the first aspect, the first historical linear prediction residual is obtained based on a time-domain signal of a first historical audio frame of the current audio frame and linear prediction coefficients of the first historical audio frame, wherein the linear prediction coefficients of the first historical audio frame are quantized linear prediction coefficients or unquantized linear prediction coefficients.

With reference to any one of the fifteenth to twentieth possible implementations of the first aspect, in a twenty-first possible implementation of the first aspect, the linear prediction residual of the current audio frame is obtained based on a time-domain signal of the current audio frame and linear prediction coefficients of the current audio frame, wherein the linear prediction coefficients of the current audio frame are quantized linear prediction coefficients or unquantized linear prediction coefficients.
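As an illustration of how a linear prediction residual can be derived from a time-domain signal and linear prediction coefficients, the sketch below applies the conventional analysis filter e[n] = s[n] - sum over k of a[k]·s[n-k]. The function name `lp_residual`, the sign convention of the coefficients, and the zero-filled history default are assumptions; the text itself does not fix these details.

```python
def lp_residual(signal, lpc, history=None):
    """Return the residual e[n] = s[n] - sum_{k=1..p} a[k] * s[n-k].

    `lpc` holds the predictor coefficients a[1..p] (quantized or not).
    `history` optionally supplies the p samples preceding the frame,
    oldest first; zeros are assumed when it is omitted.
    """
    p = len(lpc)
    past = list(history) if history is not None else [0.0] * p
    if len(past) != p:
        raise ValueError("history must contain exactly len(lpc) samples")
    extended = past + list(signal)
    residual = []
    for n in range(len(signal)):
        # extended[p + n] is s[n]; extended[p + n - 1 - k] is s[n - 1 - k].
        prediction = sum(lpc[k] * extended[p + n - 1 - k] for k in range(p))
        residual.append(extended[p + n] - prediction)
    return residual
```

For a signal that decays by exactly one half per sample, a first-order predictor with coefficient 0.5 removes everything except the first sample, which is the behaviour one would expect from a well-matched short-term predictor.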

With reference to any one of the fifteenth to twenty-first possible implementations of the first aspect, in a twenty-second possible implementation of the first aspect, the first historical linear prediction excitation is a superposition of an adaptive codebook excitation and a fixed codebook excitation, or the first historical linear prediction excitation is an adaptive codebook excitation.

With reference to any one of the fifteenth to twenty-second possible implementations of the first aspect, in a twenty-third possible implementation of the first aspect, the correlation is a cross-correlation function value in the time domain and/or a cross-correlation function value in the frequency domain, or the correlation is a distortion in the time domain and/or a distortion in the frequency domain.

With reference to the twenty-third possible implementation of the first aspect, in a twenty-fourth possible implementation of the first aspect, the distortion in the frequency domain is a sum or weighted sum of the distortions at K1 frequency bins, or the distortion in the frequency domain is a sum or weighted sum of the distortions in K2 subbands, where K1 and K2 are positive integers.

With reference to the twenty-fourth possible implementation of the first aspect, in a twenty-fifth possible implementation of the first aspect, the weighting coefficients used for the weighted sum of the distortions are perceptual weighting coefficients that reflect a psychoacoustic model.
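A frequency-domain distortion of this kind could be computed roughly as follows. This is a minimal sketch: the per-bin distortion is assumed to be the squared magnitude of the spectral difference, the plain DFT stands in for whatever transform an encoder would actually use, and the weights are simply supplied by the caller rather than derived from a psychoacoustic model. The names `dft` and `freq_distortion` are hypothetical, and the subband variant is not shown.

```python
import cmath


def dft(x):
    """Plain O(n^2) discrete Fourier transform, adequate for a sketch."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def freq_distortion(residual, history, weights=None, k1=None):
    """Weighted sum of per-bin distortions over the first `k1` bins.

    Assumption: the distortion at a bin is |R[k] - H[k]|^2. With
    `weights=None` every bin has weight 1, giving the plain sum.
    """
    spec_r, spec_h = dft(residual), dft(history)
    if k1 is None:
        k1 = len(spec_r) // 2 + 1  # non-redundant bins of a real signal
    if weights is None:
        weights = [1.0] * k1
    if len(weights) != k1:
        raise ValueError("need exactly one weight per bin")
    return sum(
        w * abs(r - h) ** 2
        for w, r, h in zip(weights, spec_r[:k1], spec_h[:k1])
    )
```

Presumably a smaller distortion corresponds to a stronger long-term similarity, so an implementation using distortion would invert or otherwise map it when forming an efficiency value.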

A second aspect of the embodiments of the present invention provides an audio encoder, including:

an estimating unit, configured to estimate a reference linear prediction efficiency of a current audio frame;

a determining unit, configured to determine an audio coding mode that matches the reference linear prediction efficiency of the current audio frame estimated by the estimating unit; and

an encoding unit, configured to perform audio encoding on the current audio frame according to the audio coding mode that is determined by the determining unit and that matches the reference linear prediction efficiency of the current audio frame.
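The three units can be pictured as a simple pipeline. The Python sketch below is only a structural illustration under stated assumptions: the class name `AudioEncoder`, the callable-based interfaces, and the string mode labels are all hypothetical and are not taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Frame = Sequence[float]


@dataclass
class AudioEncoder:
    # Estimating unit: frame -> reference linear prediction efficiency.
    estimate: Callable[[Frame], float]
    # Determining unit: efficiency -> label of the matching coding mode.
    determine: Callable[[float], str]
    # Encoding unit back ends, keyed by coding mode label.
    coders: Dict[str, Callable[[Frame], bytes]]

    def encode(self, frame: Frame) -> Tuple[str, bytes]:
        """Estimate, select a mode, then encode the frame once with that mode."""
        efficiency = self.estimate(frame)
        mode = self.determine(efficiency)
        return mode, self.coders[mode](frame)
```

The point of the arrangement is that each frame is encoded only once, by the selected back end, rather than once per candidate mode.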

With reference to the second aspect, in a first possible implementation of the second aspect, the reference linear prediction efficiency includes at least one of the following linear prediction efficiencies: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference integrated linear prediction efficiency.

With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the reference integrated linear prediction efficiency is a sum, a weighted sum, or an average of the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency.
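The three ways of combining the long-term and short-term values can be written out directly. In this sketch the function name, the `how` selector, and the default weights of 0.5 are assumptions chosen for illustration; the text does not specify weight values.

```python
def integrated_efficiency(long_term, short_term, how="average",
                          w_long=0.5, w_short=0.5):
    """Combine reference long-term and short-term efficiencies.

    `how` selects the plain sum, a weighted sum, or the average.
    """
    if how == "sum":
        return long_term + short_term
    if how == "weighted":
        return w_long * long_term + w_short * short_term
    if how == "average":
        return (long_term + short_term) / 2.0
    raise ValueError("how must be 'sum', 'weighted' or 'average'")
```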

With reference to the first possible implementation of the second aspect, in a third possible implementation of the second aspect, if the reference linear prediction efficiency of the current audio frame includes the reference long-term linear prediction efficiency of the current audio frame and the reference short-term linear prediction efficiency of the current audio frame, the determining unit is specifically configured to:

and/or,

With reference to the first possible implementation of the second aspect, in a fourth possible implementation of the second aspect, if the reference linear prediction efficiency of the current audio frame includes the reference long-term linear prediction efficiency of the current audio frame, the determining unit is specifically configured to:

With reference to the first possible implementation of the second aspect, in a fifth possible implementation of the second aspect, if the reference linear prediction efficiency of the current audio frame includes the reference long-term linear prediction efficiency of the current audio frame, the determining unit is specifically configured to: determine a first linear prediction efficiency interval into which the reference long-term linear prediction efficiency of the current audio frame falls, and determine, according to a mapping relationship between linear prediction efficiency intervals and linear-prediction-based audio coding modes, a first audio coding mode that has a mapping relationship with the first linear prediction efficiency interval, wherein the first audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and the first audio coding mode is a linear-prediction-based audio coding mode or a non-linear-prediction-based audio coding mode.
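The interval lookup described above can be sketched as a sorted-boundary search. The boundary values, the half-open interval convention, and the mode labels in the usage note are placeholders invented for illustration; the text does not give concrete intervals here.

```python
import bisect


def select_mode(efficiency, boundaries, modes):
    """Map an efficiency value to the coding mode of the interval it falls in.

    `boundaries` is an ascending list of interval edges and `modes` has
    len(boundaries) + 1 entries. Interval i covers
    [boundaries[i - 1], boundaries[i]), with the first and last intervals
    open-ended (an assumed convention).
    """
    if len(modes) != len(boundaries) + 1:
        raise ValueError("need exactly one mode per interval")
    return modes[bisect.bisect_right(boundaries, efficiency)]
```

For example, with boundaries `[0.3, 0.7]` and the hypothetical labels `["non_lp", "lp_a", "lp_b"]`, a value of 0.1 selects `"non_lp"` and a value of 0.9 selects `"lp_b"`. The same lookup would apply to the short-term and integrated efficiencies with their own interval tables.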

With reference to the first possible implementation of the second aspect, in a sixth possible implementation of the second aspect, if the reference linear prediction efficiency of the current audio frame includes the reference short-term linear prediction efficiency of the current audio frame, the determining unit is specifically configured to:

With reference to the first possible implementation of the second aspect, in a seventh possible implementation of the second aspect, if the reference linear prediction efficiency of the current audio frame includes the reference short-term linear prediction efficiency of the current audio frame, the determining unit is specifically configured to: determine a second linear prediction efficiency interval into which the reference short-term linear prediction efficiency of the current audio frame falls, and determine, according to a mapping relationship between linear prediction efficiency intervals and linear-prediction-based audio coding modes, a second audio coding mode that has a mapping relationship with the second linear prediction efficiency interval, wherein the second audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and the second audio coding mode is a linear-prediction-based audio coding mode or a non-linear-prediction-based audio coding mode.

With reference to the first or the second possible implementation of the second aspect, in an eighth possible implementation of the second aspect,

if the reference linear prediction efficiency of the current audio frame includes the reference integrated linear prediction efficiency of the current audio frame, the determining unit is specifically configured to:

With reference to the first or the second possible implementation of the second aspect, in a ninth possible implementation of the second aspect,

if the reference linear prediction efficiency of the current audio frame includes the reference integrated linear prediction efficiency of the current audio frame, the determining unit is specifically configured to: determine a third linear prediction efficiency interval into which the reference integrated linear prediction efficiency of the current audio frame falls, and determine, according to a mapping relationship between linear prediction efficiency intervals and linear-prediction-based audio coding modes, a third audio coding mode that has a mapping relationship with the third linear prediction efficiency interval, wherein the third audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and the third audio coding mode is a linear-prediction-based audio coding mode or a non-linear-prediction-based audio coding mode.

With reference to any one of the first to ninth possible implementations of the second aspect, in a tenth possible implementation of the second aspect, in the aspect of estimating the reference long-term linear prediction efficiency of the current audio frame, the estimating unit is specifically configured to: estimate a long-term linear prediction efficiency of the current audio frame, wherein the long-term linear prediction efficiency of the current audio frame is the reference long-term linear prediction efficiency of the current audio frame; or,

in the aspect of estimating the reference long-term linear prediction efficiency of the current audio frame, the estimating unit is specifically configured to: estimate a long-term linear prediction efficiency of the current audio frame; obtain linear prediction efficiencies of N1 historical audio frames of the current audio frame; and calculate a first statistical value of the linear prediction efficiencies of the N1 historical audio frames and the long-term linear prediction efficiency of the current audio frame, wherein N1 is a positive integer, the first statistical value is the reference long-term linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of N11 historical audio frames is at least one of the following linear prediction efficiencies: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and an integrated linear prediction efficiency, and the N11 historical audio frames are a subset of the N1 historical audio frames; or,

in the aspect of estimating the reference long-term linear prediction efficiency of the current audio frame, the estimating unit is specifically configured to: estimate a long-term linear prediction efficiency of the current audio frame; obtain reference linear prediction efficiencies of N2 historical audio frames of the current audio frame; and calculate a second statistical value of the reference linear prediction efficiencies of the N2 historical audio frames and the long-term linear prediction efficiency of the current audio frame, wherein N2 is a positive integer, the second statistical value is the reference long-term linear prediction efficiency of the current audio frame, the reference linear prediction efficiency of each of N21 historical audio frames is at least one of the following linear prediction efficiencies: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference integrated linear prediction efficiency, and the N21 historical audio frames are a subset of the N2 historical audio frames; or,

in the aspect of estimating the reference long-term linear prediction efficiency of the current audio frame, the estimating unit is specifically configured to: estimate a long-term linear prediction efficiency of the current audio frame; obtain reference linear prediction efficiencies of N4 historical audio frames of the current audio frame; obtain linear prediction efficiencies of N3 historical audio frames of the current audio frame; and calculate a third statistical value of the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-term linear prediction efficiency of the current audio frame, wherein N3 and N4 are positive integers, the third statistical value is the reference long-term linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of N31 historical audio frames is at least one of the following linear prediction efficiencies: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and an integrated linear prediction efficiency, the reference linear prediction efficiency of each of N41 historical audio frames is at least one of the following linear prediction efficiencies: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference integrated linear prediction efficiency, the N31 historical audio frames are a subset of the N3 historical audio frames, and the N41 historical audio frames are a subset of the N4 historical audio frames.
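The statistical-value variants can be illustrated with a small history tracker. The sketch below corresponds most closely to the variant that combines the current long-term efficiency with the linear prediction efficiencies of N1 historical frames. The passage does not name the statistic, so the arithmetic mean is used purely as one example, and the class name `ReferenceEfficiencyTracker` is hypothetical.

```python
from collections import deque


class ReferenceEfficiencyTracker:
    """Keeps the efficiencies of the most recent historical frames."""

    def __init__(self, n_history):
        # Efficiencies of up to `n_history` previous frames, oldest first.
        self._history = deque(maxlen=n_history)

    def update(self, current_efficiency):
        """Return the reference efficiency for the current frame.

        Assumption: the statistic is the arithmetic mean of the stored
        historical values and the current value. The current value is then
        stored as history for subsequent frames.
        """
        values = list(self._history) + [current_efficiency]
        reference = sum(values) / len(values)
        self._history.append(current_efficiency)
        return reference
```

Smoothing over recent frames in this way would tend to make the selected coding mode less sensitive to a single atypical frame, at the cost of reacting more slowly to genuine signal changes.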

With reference to any one of the first to ninth possible implementations of the second aspect, in an eleventh possible implementation of the second aspect,

in the aspect of estimating the reference short-term linear prediction efficiency of the current audio frame, the estimating unit is specifically configured to: estimate a short-term linear prediction efficiency of the current audio frame, wherein the short-term linear prediction efficiency of the current audio frame is the reference short-term linear prediction efficiency of the current audio frame;

or,

in the aspect of estimating the reference short-term linear prediction efficiency of the current audio frame, the estimating unit is specifically configured to: estimate a short-term linear prediction efficiency of the current audio frame; obtain linear prediction efficiencies of N5 historical audio frames of the current audio frame; and calculate a fourth statistical value of the linear prediction efficiencies of the N5 historical audio frames and the short-term linear prediction efficiency of the current audio frame, wherein N5 is a positive integer, the fourth statistical value is the reference short-term linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of N51 historical audio frames is at least one of the following linear prediction efficiencies: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and an integrated linear prediction efficiency, and the N51 historical audio frames are a subset of the N5 historical audio frames; or,

in the aspect of estimating the reference short-term linear prediction efficiency of the current audio frame, the estimating unit is specifically configured to: estimate a short-term linear prediction efficiency of the current audio frame; obtain reference linear prediction efficiencies of N6 historical audio frames of the current audio frame; and calculate a fifth statistical value of the reference linear prediction efficiencies of the N6 historical audio frames and the short-term linear prediction efficiency of the current audio frame, wherein N6 is a positive integer, the fifth statistical value is the reference short-term linear prediction efficiency of the current audio frame, the reference linear prediction efficiency of each of N61 historical audio frames is at least one of the following linear prediction efficiencies: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference integrated linear prediction efficiency, and the N61 historical audio frames are a subset of the N6 historical audio frames; or,

in the aspect of estimating the reference short-term linear prediction efficiency of the current audio frame, the estimating unit is specifically configured to: estimate a short-term linear prediction efficiency of the current audio frame; obtain reference linear prediction efficiencies of N8 historical audio frames of the current audio frame; obtain linear prediction efficiencies of N7 historical audio frames of the current audio frame; and calculate a sixth statistical value of the linear prediction efficiencies of the N7 historical audio frames, the reference linear prediction efficiencies of the N8 historical audio frames, and the short-term linear prediction efficiency of the current audio frame, wherein N7 and N8 are positive integers, the sixth statistical value is the reference short-term linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of N71 historical audio frames is at least one of the following linear prediction efficiencies: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and an integrated linear prediction efficiency, the reference linear prediction efficiency of each of N81 historical audio frames is at least one of the following linear prediction efficiencies: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference integrated linear prediction efficiency, the N71 historical audio frames are a subset of the N7 historical audio frames, and the N81 historical audio frames are a subset of the N8 historical audio frames.

With reference to the eleventh possible implementation of the second aspect, in a twelfth possible implementation of the second aspect, in the aspect of estimating the short-term linear prediction efficiency of the current audio frame, the estimating unit is specifically configured to: obtain the short-term linear prediction efficiency of the current audio frame based on the linear prediction residual of the current audio frame.

With reference to the twelfth possible implementation of the second aspect, in a thirteenth possible implementation of the second aspect, in the aspect of obtaining the short-term linear prediction efficiency of the current audio frame based on the linear prediction residual of the current audio frame, the estimating unit is specifically configured to: calculate an energy change rate of the current audio frame before and after short-term linear prediction, wherein the energy change rate is the short-term linear prediction efficiency of the current audio frame, or the short-term linear prediction efficiency of the current audio frame is obtained by transforming the energy change rate, and wherein the energy of the current audio frame after short-term linear prediction is the energy of the linear prediction residual of the current audio frame.

With reference to the thirteenth possible implementation of the second aspect, in a fourteenth possible implementation of the second aspect, the energy change rate of the current audio frame before and after short-term linear prediction is the ratio of the energy of the current audio frame before short-term linear prediction to the energy of the linear prediction residual of the current audio frame.
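The energy ratio described above can be computed in a few lines. In this sketch the function names and the small `eps` guard against division by zero are assumptions added for illustration; the ratio itself follows the definition in the text.

```python
def energy(samples):
    """Sum of squared samples."""
    return sum(v * v for v in samples)


def short_term_lp_efficiency(frame, residual, eps=1e-12):
    """Ratio of frame energy before prediction to residual energy.

    `eps` (an assumption, not from the text) avoids dividing by zero
    when the residual is all zeros.
    """
    return energy(frame) / (energy(residual) + eps)
```

A larger ratio means the short-term predictor removed more of the frame's energy, which is why the ratio, or a transformation of it, can serve as the short-term linear prediction efficiency.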

With reference to the tenth possible implementation of the second aspect, in a fifteenth possible implementation of the second aspect,

in the aspect of estimating the long-term linear prediction efficiency of the current audio frame, the estimating unit is specifically configured to: obtain, based on the linear prediction residual of the current audio frame and a first historical linear prediction signal, a correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal, wherein the correlation is the long-term linear prediction efficiency of the current audio frame, or the long-term linear prediction efficiency of the current audio frame is obtained based on the correlation, wherein the first historical linear prediction signal is a first historical linear prediction excitation or a first historical linear prediction residual, the first historical linear prediction residual is a linear prediction residual of a historical audio frame of the current audio frame, and the first historical linear prediction excitation is a linear prediction excitation of a historical audio frame of the current audio frame.

With reference to the fifteenth possible implementation of the second aspect, in a sixteenth possible implementation of the second aspect, in the aspect of obtaining, based on the linear prediction residual of the current audio frame and the first historical linear prediction signal, the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal, the estimating unit is specifically configured to: calculate the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal;

or, multiply the linear prediction residual of the current audio frame by a gain factor to obtain a gained linear prediction residual of the current audio frame, and calculate a correlation between the gained linear prediction residual and the first historical linear prediction signal, wherein the calculated correlation is taken as the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal;

With reference to the fifteenth or the sixteenth possible implementation of the second aspect, in a seventeenth possible implementation of the second aspect, the first historical linear prediction excitation or the first historical linear prediction residual is determined based on the pitch of the current audio frame.

With reference to any one of the fifteenth to seventeenth possible implementations of the second aspect, in an eighteenth possible implementation of the second aspect, the time-domain correlation between the first historical linear prediction excitation and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between any other historical linear prediction excitation and the linear prediction residual of the current audio frame;

With reference to any one of the fifteenth to eighteenth possible implementations of the second aspect, in a nineteenth possible implementation of the second aspect, the first historical linear prediction excitation is a linear prediction excitation generated by encoding a historical audio frame of the current audio frame using a linear-prediction-based coding mode.

With reference to any one of the fifteenth to nineteenth possible implementations of the second aspect, in a twentieth possible implementation of the second aspect, the first historical linear prediction residual is obtained based on a time-domain signal of a first historical audio frame of the current audio frame and linear prediction coefficients of the first historical audio frame, wherein the linear prediction coefficients of the first historical audio frame are quantized linear prediction coefficients or unquantized linear prediction coefficients.

With reference to any one of the fifteenth to twentieth possible implementations of the second aspect, in a twenty-first possible implementation of the second aspect, the linear prediction residual of the current audio frame is obtained based on a time-domain signal of the current audio frame and linear prediction coefficients of the current audio frame, wherein the linear prediction coefficients of the current audio frame are quantized linear prediction coefficients or unquantized linear prediction coefficients.

With reference to any one of the fifteenth to twenty-first possible implementations of the second aspect, in a twenty-second possible implementation of the second aspect, the first historical linear prediction excitation is a superposition of an adaptive codebook excitation and a fixed codebook excitation, or the first historical linear prediction excitation is an adaptive codebook excitation.

结合第二方面的第十五至二十二种可能的实施方式，在第二方面的第二十三种可能的实施方式中，所述相关性为时域上的互相关函数值和/或频域上的互相关函数值，或者所述相关性为时域上的失真和/或频域上的失真。In combination with the fifteenth to twenty-second possible implementations of the second aspect, in the twenty-third possible implementation of the second aspect, the correlation is the cross-correlation function value in the time domain and/or the cross-correlation function value in the frequency domain, or the correlation is the distortion in the time domain and/or the distortion in the frequency domain.
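The two time-domain measures named above can be sketched as follows. This is a minimal example under stated assumptions: the cross-correlation is evaluated at lag zero and normalized by the signal energies, and the distortion is taken as the sum of squared differences. Neither choice is prescribed by the text, and the function names are hypothetical.

```python
import math


def time_domain_cross_correlation(residual, excitation):
    """Normalized cross-correlation at lag 0; returns 0.0 if either input has zero energy."""
    num = sum(r * e for r, e in zip(residual, excitation))
    den = math.sqrt(sum(r * r for r in residual) * sum(e * e for e in excitation))
    return num / den if den > 0 else 0.0


def time_domain_distortion(residual, excitation):
    """Sum of squared sample differences; a smaller value means the two are more alike."""
    return sum((r - e) ** 2 for r, e in zip(residual, excitation))
```

Note that the two measures point in opposite directions: a larger cross-correlation value and a smaller distortion both indicate a stronger correlation between the residual and the historical excitation.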

结合第二方面的第二十三种可能的实施方式，在第二方面的第二十四种可能的实施方式中，所述频域上的失真为在频域上的K1个频点的失真的和值或加权和值，或者所述频域上的失真为在频域上的K2个子带上的失真的和值或加权和值，所述K1和所述K2为正整数。In combination with the twenty-third possible implementation of the second aspect, in the twenty-fourth possible implementation of the second aspect, the distortion in the frequency domain is the sum or weighted sum of the distortions at K1 frequency points in the frequency domain, or the distortion in the frequency domain is the sum or weighted sum of the distortions on K2 subbands in the frequency domain, where K1 and K2 are positive integers.

结合第二方面的第二十四种可能的实施方式，在第二方面的第二十五种可能的实施方式中，所述失真的加权和值所对应的加权系数为反映心理声学模型的感知加权系数。In combination with the twenty-fourth possible implementation of the second aspect, in the twenty-fifth possible implementation of the second aspect, the weighting coefficients corresponding to the weighted sum of the distortions are perceptual weighting coefficients reflecting a psychoacoustic model.
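The frequency-domain distortion described above, taken either over K1 frequency points or over K2 subbands, can be sketched as follows. This is only an illustration: the use of real-valued magnitude spectra, the squared-difference distortion per point, the `(start, end)` representation of subbands, and the function names are all assumptions. The weights passed in stand in for perceptual weighting coefficients, whose actual values would come from a psychoacoustic model that is not shown here.

```python
def frequency_point_distortion(spec_a, spec_b, weights=None):
    """Sum (or weighted sum) of squared differences over K1 = len(spec_a) frequency points."""
    if weights is None:
        weights = [1.0] * len(spec_a)  # plain sum
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, spec_a, spec_b))


def subband_distortion(spec_a, spec_b, band_edges, weights=None):
    """Sum (or weighted sum) of per-subband distortions over K2 = len(band_edges) subbands.

    band_edges is a list of (start, end) index pairs, end exclusive.
    """
    per_band = [
        sum((spec_a[i] - spec_b[i]) ** 2 for i in range(start, end))
        for start, end in band_edges
    ]
    if weights is None:
        weights = [1.0] * len(per_band)  # plain sum
    return sum(w * d for w, d in zip(weights, per_band))
```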

可以看出，在本发明一些实施例的技术方案中，由于是先估计当前音频帧的参考线性预测效率；通过估计出的上述当前音频帧的参考线性预测效率来确定与之匹配的音频编码方式，并按照确定出的与之匹配音频编码方式对上述当前音频帧进行音频编码，由于上述方案在确定音频编码方式的过程中，无需执行现有闭环选择模式所需要执行的利用每种音频编码方式分别将当前音频帧进行完整编码的操作，而是通过当前音频帧的参考线性预测效率来确定需选择的音频编码方式，而估计当前音频帧的参考线性预测效率的计算复杂度，通常是远远小于利用每种音频编码方式分别将当前音频帧进行完整编码的计算复杂度的，因此相对于现有机制而言，本发明实施例的上述技术方案有利于降低音频编码运算复杂度，进而降低音频编码的开销。It can be seen that in the technical solutions of some embodiments of the present invention, the reference linear prediction efficiency of the current audio frame is first estimated; the audio coding method that matches the current audio frame is determined by the estimated reference linear prediction efficiency of the current audio frame, and the current audio frame is audio-encoded according to the determined audio coding method that matches the current audio frame. Since the above-mentioned solution does not need to perform the operation of completely encoding the current audio frame using each audio coding method required by the existing closed-loop selection mode in the process of determining the audio coding method, but instead determines the audio coding method to be selected based on the reference linear prediction efficiency of the current audio frame, and the computational complexity of estimating the reference linear prediction efficiency of the current audio frame is usually much smaller than the computational complexity of completely encoding the current audio frame using each audio coding method, compared with the existing mechanism, the above-mentioned technical solution of the embodiment of the present invention is conducive to reducing the computational complexity of audio coding, thereby reducing the overhead of audio coding.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

为了更清楚地说明本发明实施例中的技术方案，下面将对实施例描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.

图1为本发明一个实施例提供的一种音频编码方法的流程示意图；FIG1 is a schematic flow chart of an audio encoding method provided by one embodiment of the present invention;

图2为本发明另一个实施例提供的另一种音频编码方法的流程示意图；FIG2 is a schematic flow chart of another audio encoding method provided by another embodiment of the present invention;

图3-a为本发明一个实施例提供的一种音频编码器的结构示意图；FIG3-a is a schematic structural diagram of an audio encoder provided by an embodiment of the present invention;

图3-b为本发明另一个实施例提供的另一种音频编码器的结构示意图；FIG3-b is a schematic structural diagram of another audio encoder provided by another embodiment of the present invention;

图3-c为本发明另一个实施例提供的另一种音频编码器的结构示意图；FIG3-c is a schematic structural diagram of another audio encoder provided by another embodiment of the present invention;

图3-d为本发明另一个实施例提供的另一种音频编码器的结构示意图；FIG3-d is a schematic structural diagram of another audio encoder provided by another embodiment of the present invention;

图3-e为本发明另一个实施例提供的另一种音频编码器的结构示意图；FIG3-e is a schematic structural diagram of another audio encoder provided by another embodiment of the present invention;

图3-f为本发明另一个实施例提供的另一种音频编码器的结构示意图；FIG3-f is a schematic structural diagram of another audio encoder provided by another embodiment of the present invention;

图3-g为本发明另一个实施例提供的另一种音频编码器的结构示意图；FIG3-g is a schematic structural diagram of another audio encoder provided by another embodiment of the present invention;

图3-h为本发明另一个实施例提供的另一种音频编码器的结构示意图；FIG3-h is a schematic structural diagram of another audio encoder provided by another embodiment of the present invention;

图3-i为本发明另一个实施例提供的另一种音频编码器的结构示意图；FIG3-i is a schematic structural diagram of another audio encoder provided by another embodiment of the present invention;

图4为本发明另一个实施例提供的另一种音频编码器的结构示意图；FIG4 is a schematic structural diagram of another audio encoder provided by another embodiment of the present invention;

图5为本发明另一个实施例提供的另一种音频编码器的结构示意图；FIG5 is a schematic structural diagram of another audio encoder provided by another embodiment of the present invention;

图6为本发明另一个实施例提供的另一种音频编码器的结构示意图。FIG6 is a schematic structural diagram of another audio encoder provided by another embodiment of the present invention.

具体实施方式DETAILED DESCRIPTION

为了使本技术领域的人员更好地理解本发明方案，下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分的实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都应当属于本发明保护的范围。To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

以下分别进行详细说明。Detailed descriptions are provided below.

本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”“第四”等是用于区别不同的对象，而不是用于描述特定顺序。此外，术语“包括”和“具有”以及它们任何变形，意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元，而是可选地还包括没有列出的步骤或单元，或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。The terms "first," "second," "third," "fourth," and so on, in the description and claims of the present invention and the accompanying drawings are used to distinguish between different items, not to describe a particular order. Furthermore, the terms "including," "having," and any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, product, or apparatus comprising a series of steps or elements is not limited to the listed steps or elements, but may optionally include steps or elements not listed, or may optionally include other steps or elements inherent to the process, method, product, or apparatus.

下面先介绍本发明实施例提供的音频编码方法，本发明实施例提供的音频编码方法的执行主体可为音频编码器，该音频编码器可为任何需要采集、存储或者向外传输音频信号的装置，例如手机、平板电脑、个人电脑、笔记本电脑等等。The following first introduces the audio encoding method provided by an embodiment of the present invention. The executor of the audio encoding method provided by an embodiment of the present invention can be an audio encoder, and the audio encoder can be any device that needs to collect, store or transmit audio signals to the outside, such as a mobile phone, a tablet computer, a personal computer, a laptop computer, etc.

本发明音频编码方法的一实施例，其中，一种音频编码方法可包括：估计当前音频帧的参考线性预测效率；确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式；按照与上述当前音频帧的参考线性预测效率匹配的音频编码方式，对上述当前音频帧进行音频编码。In an embodiment of the audio encoding method of the present invention, the audio encoding method may include: estimating a reference linear prediction efficiency of a current audio frame; determining an audio encoding method that matches the reference linear prediction efficiency of the current audio frame; and performing audio encoding on the current audio frame according to the audio encoding method that matches the reference linear prediction efficiency of the current audio frame.

首先请参见图1，图1为本发明的一个实施例提供的一种音频编码方法的流程示意图。其中，如图1所示，本发明实施例提供的一种音频编码方法可包括以下内容：First, please refer to Figure 1, which is a flow chart of an audio encoding method provided by an embodiment of the present invention. As shown in Figure 1, an audio encoding method provided by an embodiment of the present invention may include the following contents:

101、估计当前音频帧的参考线性预测效率。101. Estimate a reference linear prediction efficiency of the current audio frame.

在实际应用中，可以采用多种可用算法来估计当前音频帧的参考线性预测效率。In practical applications, a variety of available algorithms can be used to estimate the reference linear prediction efficiency of the current audio frame.

其中，在本发明的各实施例中，音频帧(如当前音频帧或当前音频帧的历史音频帧)的参考线性预测效率可用于表示该音频帧能够被进行线性预测的程度。其中，音频帧(如当前音频帧或者当前音频帧的历史音频帧)的线性预测结果指该音频帧的线性预测值。其中，音频帧(如当前音频帧或当前音频帧的历史音频帧)的参考线性预测效率越高，则表示该音频帧能够被进行线性预测的程度越高。In various embodiments of the present invention, a reference linear prediction efficiency of an audio frame (such as a current audio frame or a historical audio frame of the current audio frame) may be used to indicate the degree to which the audio frame can be linearly predicted. A linear prediction result of an audio frame (such as a current audio frame or a historical audio frame of the current audio frame) refers to a linear prediction value of the audio frame. A higher reference linear prediction efficiency of an audio frame (such as a current audio frame or a historical audio frame of the current audio frame) indicates a higher degree to which the audio frame can be linearly predicted.

在本发明的一些实施例中，上述参考线性预测效率包括如下线性预测效率的至少一种：参考长时线性预测效率、参考短时线性预测效率和参考综合线性预测效率，其中，上述参考综合线性预测效率基于上述参考长时线性预测效率和上述参考短时线性预测效率得到。In some embodiments of the present invention, the reference linear prediction efficiency includes at least one of the following linear prediction efficiencies: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference comprehensive linear prediction efficiency, wherein the reference comprehensive linear prediction efficiency is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency.

其中，当前音频帧的参考长时线性预测效率可基于当前音频帧的长时线性预测效率得到。当前音频帧的参考短时线性预测效率可基于当前音频帧的短时线性预测效率得到。当前音频帧的参考综合线性预测效率例如可基于当前音频帧的长时线性预测效率和短时线性预测效率得到。The reference long-term linear prediction efficiency of the current audio frame may be obtained based on the long-term linear prediction efficiency of the current audio frame. The reference short-term linear prediction efficiency of the current audio frame may be obtained based on the short-term linear prediction efficiency of the current audio frame. The reference comprehensive linear prediction efficiency of the current audio frame may be obtained based on, for example, the long-term linear prediction efficiency and the short-term linear prediction efficiency of the current audio frame.

可以理解，参考线性预测效率的取值范围可为0～1(即0％～100％)，或者取值范围也可以是0～x1(x1为正数)。其中，参考长时线性预测效率的取值范围可为0～1(即0％～100％)，或者取值范围也可以是0～x2(x2为正数)。参考短时线性预测效率的取值范围可为0～1(即0％～100％)，或者取值范围也可以是0～x3(x3为正数)。其中，参考综合线性预测效率的取值范围可为0～1(即0％～100％)，或者取值范围也可以是0～x4(x4为正数)。其中，长时线性预测效率的取值范围可为0～1(即0％～100％)，或者取值范围也可以是0～x5(x5为正数)。短时线性预测效率的取值范围可为0～1(即0％～100％)，或者取值范围也可以是0～x6(x6为正数)。其中，x1、x2、x3、x4、x5或x6例如可为0.5、0.8或1.5、2、5、10、50、100或其它正数。为便于描述，下面举例中主要以各线性预测效率的取值范围为0～1(即0％～100％)为例，而其它取值范围可以据此类推。It can be understood that the value range of the reference linear prediction efficiency can be 0-1 (i.e., 0%-100%), or the value range can also be 0-x1 (x1 is a positive number). Among them, the value range of the reference long-term linear prediction efficiency can be 0-1 (i.e., 0%-100%), or the value range can also be 0-x2 (x2 is a positive number). The value range of the reference short-term linear prediction efficiency can be 0-1 (i.e., 0%-100%), or the value range can also be 0-x3 (x3 is a positive number). Among them, the value range of the reference comprehensive linear prediction efficiency can be 0-1 (i.e., 0%-100%), or the value range can also be 0-x4 (x4 is a positive number). Among them, the value range of the long-term linear prediction efficiency can be 0-1 (i.e., 0%-100%), or the value range can also be 0-x5 (x5 is a positive number). The short-term linear prediction efficiency can range from 0 to 1 (i.e., 0% to 100%), or can also range from 0 to x6 (x6 is a positive number). Here, x1, x2, x3, x4, x5, or x6 can be, for example, 0.5, 0.8, 1.5, 2, 5, 10, 50, 100, or other positive numbers. For ease of description, the following examples mainly use the range of 0 to 1 (i.e., 0% to 100%) for each linear prediction efficiency, and other value ranges can be deduced accordingly.

102、确定与估计出的上述当前音频帧的参考线性预测效率匹配的音频编码方式。102. Determine an audio coding mode that matches the estimated reference linear prediction efficiency of the current audio frame.

在本发明的一些实施例中，音频编码方式与音频帧的参考线性预测效率之间可以具有设定的映射关系，例如，不同的音频编码方式可以对应不同的参考线性预测效率，或者，不同的音频编码方式可以对应不同的参考线性预测效率区间等。例如可在至少两个音频编码方式中，确定与估计出的上述当前音频帧的参考线性预测效率匹配的音频编码方式。In some embodiments of the present invention, a predetermined mapping relationship may be established between the audio coding mode and the reference linear prediction efficiency of the audio frame. For example, different audio coding modes may correspond to different reference linear prediction efficiencies, or different audio coding modes may correspond to different reference linear prediction efficiency ranges. For example, an audio coding mode that matches the estimated reference linear prediction efficiency of the current audio frame may be determined from at least two audio coding modes.

103、按照与上述当前音频帧的参考线性预测效率匹配的音频编码方式对上述当前音频帧进行音频编码。103. Perform audio encoding on the current audio frame according to an audio encoding method that matches a reference linear prediction efficiency of the current audio frame.

在本发明的一些实施例中，在估计当前音频帧的参考线性预测效率之前可以先判断当前音频帧是否为语音音频帧。例如，上述估计当前音频帧的参考线性预测效率可以包括：当当前音频帧为非语音音频帧，估计上述当前音频帧的参考线性预测效率。此外，也可在上述估计当前音频帧的参考线性预测效率之前不区分当前音频帧是否为语音音频帧，即，无论当前音频帧为语音音频帧还是非语音音频帧，均执行步骤101～步骤103。In some embodiments of the present invention, before the reference linear prediction efficiency of the current audio frame is estimated, it may first be determined whether the current audio frame is a speech audio frame. For example, estimating the reference linear prediction efficiency of the current audio frame may include: when the current audio frame is a non-speech audio frame, estimating the reference linear prediction efficiency of the current audio frame. Alternatively, no distinction need be made as to whether the current audio frame is a speech audio frame before its reference linear prediction efficiency is estimated; that is, steps 101 to 103 are performed regardless of whether the current audio frame is a speech audio frame or a non-speech audio frame.

可以看出，本实施例的技术方案中，由于是先估计当前音频帧的参考线性预测效率；通过估计出的上述当前音频帧的参考线性预测效率来确定与之匹配的音频编码方式，并按照确定出的与之匹配音频编码方式对上述当前音频帧进行音频编码，由于上述方案在确定音频编码方式的过程中，无需执行现有闭环选择模式所需要执行的利用每种音频编码方式分别将当前音频帧进行完整编码的操作，而是通过当前音频帧的参考线性预测效率来确定需选择的音频编码方式，而估计当前音频帧的参考线性预测效率的计算复杂度，通常是远远小于利用每种音频编码方式分别将当前音频帧进行完整编码的计算复杂度的，因此相对于现有机制而言，本发明实施例的上述方案有利于降低音频编码运算复杂度，进而降低音频编码的开销。It can be seen that in the technical solution of this embodiment, the reference linear prediction efficiency of the current audio frame is first estimated; the audio coding method that matches it is determined by the estimated reference linear prediction efficiency of the above-mentioned current audio frame, and the audio encoding of the above-mentioned current audio frame is performed according to the determined audio coding method that matches it. Since the above-mentioned scheme does not need to perform the operation of fully encoding the current audio frame using each audio coding method required by the existing closed-loop selection mode in the process of determining the audio coding method, but instead determines the audio coding method to be selected based on the reference linear prediction efficiency of the current audio frame, and the computational complexity of estimating the reference linear prediction efficiency of the current audio frame is usually much smaller than the computational complexity of fully encoding the current audio frame using each audio coding method, compared with the existing mechanism, the above-mentioned scheme of the embodiment of the present invention is conducive to reducing the computational complexity of audio coding, thereby reducing the overhead of audio coding.
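Steps 101 to 103 and the complexity argument above can be sketched as a small open-loop selection routine. This is an illustration under assumptions: `encode_frame`, the callables passed to it, and the dictionary of encoders are hypothetical names, and the actual estimator and encoders are not shown. The point of the sketch is that only the selected encoder is run on the frame, whereas a closed-loop selection would run every encoder and compare the results.

```python
def encode_frame(frame, estimate_efficiency, select_mode, encoders):
    """Open-loop mode selection: estimate, pick one mode, run only that encoder."""
    efficiency = estimate_efficiency(frame)  # step 101: estimate reference LP efficiency
    mode = select_mode(efficiency)           # step 102: determine the matching coding mode
    payload = encoders[mode](frame)          # step 103: encode with the matching mode only
    return mode, payload
```

Here `encoders` maps a mode name to an encoding function, so that adding a coding mode does not change the selection logic.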

在本发明的一些实施例中，音频帧(例如当前音频帧或其它音频帧)的参考综合线性预测效率基于该音频帧的参考长时线性预测效率和该音频帧的参考短时线性预测效率得到。例如，上述当前音频帧的参考综合线性预测效率例如可为上述当前音频帧的参考长时线性预测效率和当前音频帧的参考短时线性预测效率的和值、加权和值(其中，此处加权和值所对应的权值可以根据实际需要进行设定，其中1个权值例如可为0.5、1.、2、3、5、10或者其它值)或者平均值。当然，也可能通过其它算法，基于上述当前音频帧的参考长时线性预测效率和当前音频帧的参考短时线性预测效率得到上述当前音频帧的参考综合线性预测效率。In some embodiments of the present invention, a reference comprehensive linear prediction efficiency for an audio frame (e.g., a current audio frame or another audio frame) is obtained based on a reference long-term linear prediction efficiency and a reference short-term linear prediction efficiency for the audio frame. For example, the reference comprehensive linear prediction efficiency for the current audio frame may be the sum, weighted sum (wherein the weight corresponding to the weighted sum may be set as needed, such as 0.5, 1.0, 2.0, 3.0, 5.0, 10.0, or other values), or average value of the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency for the current audio frame. Of course, the reference comprehensive linear prediction efficiency for the current audio frame may also be obtained using other algorithms based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency for the current audio frame.
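The three combinations named above (sum, weighted sum, and average) can be sketched as follows. The function name `combined_efficiency`, the `method` parameter, and the default weights of 0.5 are assumptions chosen for illustration; the text leaves the weights to be set as needed.

```python
def combined_efficiency(long_term, short_term, method="sum", w_long=0.5, w_short=0.5):
    """Combine reference long-term and short-term LP efficiencies into one value."""
    if method == "sum":
        return long_term + short_term
    if method == "weighted_sum":
        return w_long * long_term + w_short * short_term
    if method == "average":
        return (long_term + short_term) / 2
    raise ValueError(f"unknown method: {method}")
```

Note that a plain sum of two efficiencies in the range 0 to 1 lies in the range 0 to 2, which is consistent with the earlier remark that the value range of the reference comprehensive linear prediction efficiency need not be 0 to 1.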

在本发明一些实施例中，基于线性预测的音频编码方式可包括代数码激励线性预测(ACELP，Algebraic Code Excited Linear Prediction)编码、变换激励编码(TCX，Transform Coded Excitation)等。非基于线性预测的音频编码方式可包括一般音频编码(GAC，Generic Audio Coding)，GAC例如可包括修正离散余弦变换(MDCT，Modified Discrete Cosine Transform)编码或离散余弦变换(DCT，Discrete Cosine Transform)编码等。In some embodiments of the present invention, audio coding methods based on linear prediction may include Algebraic Code Excited Linear Prediction (ACELP) coding, Transform Coded Excitation (TCX) coding, and the like. Audio coding methods not based on linear prediction may include Generic Audio Coding (GAC), which may include, for example, Modified Discrete Cosine Transform (MDCT) coding or Discrete Cosine Transform (DCT) coding.

可以理解的是，上述当前音频帧的参考线性预测效率所包括的线性预测效率的种类不同，确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式的具体方式也就可能不同。下面举例一些可能的实施例方式。It is understandable that the specific method for determining the audio coding method that matches the reference linear prediction efficiency of the current audio frame may vary depending on the type of linear prediction efficiency included in the reference linear prediction efficiency of the current audio frame. Some possible embodiments are exemplified below.

举例来说，在本发明的一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率和上述当前音频帧的参考短时线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式可以包括：若上述当前音频帧的参考长时线性预测效率小于第一阈值，和/或上述当前音频帧的参考短时线性预测效率小于第二阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为非基于线性预测的音频编码方式。For example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame and the reference short-time linear prediction efficiency of the current audio frame, then the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame may include: if the reference long-time linear prediction efficiency of the current audio frame is less than a first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is less than a second threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method that is not based on linear prediction.

又举例来说，在本发明的另一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率和上述当前音频帧的参考短时线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式可以包括：若上述当前音频帧的参考长时线性预测效率大于或等于第一阈值，和/或上述当前音频帧的参考短时线性预测效率大于或等于第二阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为基于线性预测的音频编码方式。For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame and the reference short-time linear prediction efficiency of the current audio frame, then the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame may include: if the reference long-time linear prediction efficiency of the current audio frame is greater than or equal to a first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is greater than or equal to a second threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method based on linear prediction.

又举例来说，在本发明的又一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率和上述当前音频帧的参考短时线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式可包括：若上述当前音频帧的参考长时线性预测效率小于第一阈值，和/或上述当前音频帧的参考短时线性预测效率小于第二阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为非基于线性预测的音频编码方式；若上述当前音频帧的参考长时线性预测效率大于或等于第一阈值，和/或上述当前音频帧的参考短时线性预测效率大于或等于第二阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为基于线性预测的音频编码方式。For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame and the reference short-time linear prediction efficiency of the current audio frame, then the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame may include: if the reference long-time linear prediction efficiency of the current audio frame is less than a first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is less than a second threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method not based on linear prediction; if the reference long-time linear prediction efficiency of the current audio frame is greater than or equal to the first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is greater than or equal to the second threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method based on linear prediction.
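The two-threshold decision above can be sketched as follows. Because the text joins the conditions with "and/or", the two branches can overlap; this sketch assumes one particular reading in which the non-linear-prediction branch uses "or", so that a linear-prediction-based mode is chosen only when both efficiencies reach their thresholds. The function name, the mode labels, and the threshold values of 0.5 are hypothetical.

```python
def select_by_long_and_short(long_term, short_term, first_threshold=0.5, second_threshold=0.5):
    """Return "non_lp" if either efficiency is below its threshold, else "lp".

    This is the "or" reading of the and/or condition (an assumption).
    """
    if long_term < first_threshold or short_term < second_threshold:
        return "non_lp"
    return "lp"
```

Under the alternative "and" reading, the condition would instead require both efficiencies to fall below their thresholds before a non-linear-prediction mode is selected.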

又举例来说，在本发明的一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式，包括：若上述当前音频帧的参考长时线性预测效率大于或等于第三阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为基于线性预测的音频编码方式。For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame, then the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame includes: if the reference long-time linear prediction efficiency of the current audio frame is greater than or equal to a third threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method based on linear prediction.

又举例来说，在本发明的另一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式，包括：若上述当前音频帧的参考长时线性预测效率小于第四阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为非基于线性预测的音频编码方式。For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame, then the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame includes: if the reference long-time linear prediction efficiency of the current audio frame is less than a fourth threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method that is not based on linear prediction.

又举例来说，在本发明的另一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式，包括：若上述当前音频帧的参考长时线性预测效率大于或等于第三阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为基于线性预测的音频编码方式；若上述当前音频帧的参考长时线性预测效率小于第四阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为非基于线性预测的音频编码方式。For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame, then the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame includes: if the reference long-time linear prediction efficiency of the current audio frame is greater than or equal to a third threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method based on linear prediction; if the reference long-time linear prediction efficiency of the current audio frame is less than a fourth threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method not based on linear prediction.
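The decision using the third and fourth thresholds can be sketched as follows. The text does not say what happens when the efficiency lies between the fourth and third thresholds, so this sketch returns `None` in that case and leaves the choice to the caller; that behavior, the requirement that the fourth threshold not exceed the third, the function name, and the threshold values are all assumptions.

```python
def select_by_long_term(long_term, third_threshold=0.6, fourth_threshold=0.4):
    """Return "lp", "non_lp", or None when neither stated condition applies."""
    if fourth_threshold > third_threshold:
        raise ValueError("this sketch assumes fourth_threshold <= third_threshold")
    if long_term >= third_threshold:
        return "lp"
    if long_term < fourth_threshold:
        return "non_lp"
    return None  # between the thresholds: not covered by the text above
```

When the two thresholds are equal, the two conditions partition the whole range and the function never returns `None`.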

又举例来说，在本发明的一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式，包括：确定上述当前音频帧的参考长时线性预测效率所落入的第一线性预测效率区间，根据线性预测效率区间和基于线性预测的音频编码方式之间的映射关系，确定出与上述第一线性预测效率区间具有映射关系的第一音频编码方式，其中，上述第一音频编码方式为与上述当前音频帧的参考线性预测效率匹配的音频编码方式，上述第一音频编码方式为基于线性预测的音频编码方式或为非基于线性预测的音频编码方式。其中，不同的线性预测效率区间对应于不同的音频编码方式。例如假设存着3个线性预测效率区间，分别可为0～30％GAC、30％～70％TCX和70％～100％，若上述当前音频帧的参考长时线性预测效率落入线性预测效率区间0～30％(即第一线性预测效率区间为线性预测效率区间0～30％)，可确定线性预测效率区间0～30％对应的音频编码方式，为与上述当前音频帧的参考线性预测效率匹配的音频编码方式(例如GAC)。若上述当前音频帧的参考长时线性预测效率落入线性预测效率区间30％～70％(即第一线性预测效率区间为线性预测效率区间30％～70％)，可以确定线性预测效率区间30％～70％对应的音频编码方式(例如TCX)，为与上述当前音频帧的参考线性预测效率匹配的音频编码方式。若上述当前音频帧的参考长时线性预测效率落入线性预测效率区间70％～100％(即第一线性预测效率区间为线性预测效率区间70％～100％)，可确定线性预测效率区间70％～100％对应的音频编码方式(如ACELP编码)，为与上述当前音频帧的参考线性预测效率匹配的音频编码方式，其它场景可以以此类推。可以根据不同应用场景的需要，来设定线性预测效率区间和基于线性预测的音频编码方式之间的映射关系。For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes a reference long-term linear prediction efficiency of the current audio frame, then determining an audio coding mode that matches the reference linear prediction efficiency of the current audio frame includes: determining a first linear prediction efficiency interval within which the reference long-term linear prediction efficiency of the current audio frame falls, and determining a first audio coding mode that has a mapping relationship with the first linear prediction efficiency interval based on a mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction, wherein the first audio coding mode is an audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and the first audio coding mode is an audio coding mode based on linear prediction or an audio coding mode not based on linear prediction. Different linear prediction efficiency intervals correspond to different audio coding modes. 
For example, assume there are three linear prediction efficiency intervals, namely 0-30%, 30%-70%, and 70%-100%. If the reference long-term linear prediction efficiency of the current audio frame falls within the interval 0-30% (i.e., the first linear prediction efficiency interval is 0-30%), the audio coding mode corresponding to the interval 0-30% (e.g., GAC) can be determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference long-term linear prediction efficiency of the current audio frame falls within the interval 30%-70% (i.e., the first linear prediction efficiency interval is 30%-70%), the audio coding mode corresponding to the interval 30%-70% (e.g., TCX) can be determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference long-term linear prediction efficiency of the current audio frame falls within the interval 70%-100% (i.e., the first linear prediction efficiency interval is 70%-100%), the audio coding mode corresponding to the interval 70%-100% (e.g., ACELP coding) can be determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. Other scenarios follow by analogy. The mapping relationship between linear prediction efficiency intervals and linear prediction-based audio coding modes can be set according to the needs of different application scenarios.
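The interval mapping in the example above (0-30% to GAC, 30%-70% to TCX, 70%-100% to ACELP) can be sketched as a table lookup. The text does not state which interval a boundary value such as exactly 30% belongs to; this sketch assigns it to the upper interval, which is an assumption, as are the names `INTERVAL_BOUNDS`, `INTERVAL_MODES`, and `select_by_interval`.

```python
from bisect import bisect_right

# Upper bounds of the first two intervals; the last interval ends at 1.0.
INTERVAL_BOUNDS = [0.3, 0.7]
INTERVAL_MODES = ["GAC", "TCX", "ACELP"]


def select_by_interval(efficiency):
    """Map an efficiency in [0, 1] to the coding mode of the interval it falls in."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in the range 0 to 1")
    # bisect_right places a boundary value in the upper interval.
    return INTERVAL_MODES[bisect_right(INTERVAL_BOUNDS, efficiency)]
```

Changing the mapping for a different application scenario then only requires editing the two tables.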

又举例来说，在本发明的一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考短时线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式，可包括：若上述当前音频帧的参考短时线性预测效率大于或等于第五阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为基于线性预测的音频编码方式。For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-time linear prediction efficiency of the current audio frame, then the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame may include: if the reference short-time linear prediction efficiency of the current audio frame is greater than or equal to a fifth threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method based on linear prediction.

又举例来说，在本发明的又一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考短时线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式，可包括：若上述当前音频帧的参考短时线性预测效率小于第五阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为非基于线性预测的音频编码方式。For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the above-mentioned current audio frame includes the reference short-time linear prediction efficiency of the above-mentioned current audio frame, then the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the above-mentioned current audio frame may include: if the reference short-time linear prediction efficiency of the above-mentioned current audio frame is less than the fifth threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the above-mentioned current audio frame is an audio encoding method that is not based on linear prediction.

As another example, in other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-term linear prediction efficiency of the current audio frame, determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame may include: if the reference short-term linear prediction efficiency of the current audio frame is greater than or equal to the fifth threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is a linear prediction-based audio coding mode; and if the reference short-term linear prediction efficiency of the current audio frame is less than the fifth threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.
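The two-way decision against the fifth threshold might be sketched as follows. The function name `is_lp_based` and the value 0.6 are illustrative assumptions only; the text leaves the threshold value to the application.

```python
FIFTH_THRESHOLD = 0.6  # hypothetical value; the text does not fix it


def is_lp_based(ref_short_term_lpe: float,
                threshold: float = FIFTH_THRESHOLD) -> bool:
    """Return True for a linear prediction-based coding mode, False otherwise.

    The comparison is "greater than or equal to", as in the text, so a value
    exactly at the threshold selects the linear prediction-based mode.
    """
    return ref_short_term_lpe >= threshold
```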

As another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-term linear prediction efficiency of the current audio frame, determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame includes: determining a second linear prediction efficiency interval into which the reference short-term linear prediction efficiency of the current audio frame falls; and determining, according to the mapping relationship between linear prediction efficiency intervals and linear prediction-based audio coding modes, either a second audio coding mode that has a mapping relationship with the second linear prediction efficiency interval or an audio coding mode not based on linear prediction, where the second audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame and is a linear prediction-based audio coding mode. For example, assume there are three linear prediction efficiency intervals: 0-40%, 40%-60%, and 60%-100%. If the reference short-term linear prediction efficiency of the current audio frame falls within the interval 0-40% (that is, the second linear prediction efficiency interval is 0-40%), the audio coding mode corresponding to the interval 0-40% (e.g., GAC) can be determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference short-term linear prediction efficiency of the current audio frame falls within the interval 40%-60% (that is, the second linear prediction efficiency interval is 40%-60%), the audio coding mode corresponding to the interval 40%-60% (e.g., TCX) is determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference short-term linear prediction efficiency of the current audio frame falls within the interval 60%-100% (that is, the second linear prediction efficiency interval is 60%-100%), the audio coding mode corresponding to the interval 60%-100% (e.g., ACELP coding) is determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. Other scenarios can be handled by analogy. The mapping relationship between linear prediction efficiency intervals and linear prediction-based audio coding modes can be set according to the needs of different application scenarios.

As another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame may include: if the reference comprehensive linear prediction efficiency of the current audio frame is greater than or equal to a sixth threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is a linear prediction-based audio coding mode.

As another example, in other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame may include: if the reference comprehensive linear prediction efficiency of the current audio frame is less than the sixth threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

As another example, in still other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame may include: if the reference comprehensive linear prediction efficiency of the current audio frame is greater than or equal to the sixth threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is a linear prediction-based audio coding mode; and if the reference comprehensive linear prediction efficiency of the current audio frame is less than the sixth threshold, determining that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

As another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame includes: determining a third linear prediction efficiency interval into which the reference comprehensive linear prediction efficiency of the current audio frame falls; and determining, according to the mapping relationship between linear prediction efficiency intervals and linear prediction-based audio coding modes, either a third audio coding mode that has a mapping relationship with the third linear prediction efficiency interval or an audio coding mode not based on linear prediction, where the third audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame and is a linear prediction-based audio coding mode. For example, assume there are three linear prediction efficiency intervals: 0-50%, 50%-80%, and 80%-100%. If the reference comprehensive linear prediction efficiency of the current audio frame falls within the interval 0-50% (that is, the third linear prediction efficiency interval is 0-50%), the audio coding mode corresponding to the interval 0-50% (e.g., GAC) can be determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference comprehensive linear prediction efficiency of the current audio frame falls within the interval 50%-80% (that is, the third linear prediction efficiency interval is 50%-80%), the audio coding mode corresponding to the interval 50%-80% (e.g., TCX) is determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference comprehensive linear prediction efficiency of the current audio frame falls within the interval 80%-100% (that is, the third linear prediction efficiency interval is 80%-100%), the audio coding mode corresponding to the interval 80%-100% (e.g., ACELP coding) is determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. Other scenarios can be handled by analogy. The mapping relationship between linear prediction efficiency intervals and linear prediction-based audio coding modes can be set according to the needs of different application scenarios.

It can be understood that the specific values of the thresholds mentioned in the above examples (for example, the first threshold, the second threshold, the third threshold, the fourth threshold, the fifth threshold, and the sixth threshold) can be set as needed or according to the application environment and scenario. For example, if the reference long-term linear prediction efficiency of the current audio frame ranges from 0 to 1, the first threshold may be 0.2, 0.5, 0.6, 0.8, 0.9, or the like; if the reference short-term linear prediction efficiency of the current audio frame ranges from 0 to 1, the second threshold may be 0.3, 0.6, 0.8, 0.9, or the like. Other scenarios can be handled by analogy. Furthermore, the values of the thresholds can be adjusted dynamically and adaptively as needed. For example, if linear prediction-based audio coding modes (such as TCX or ACELP coding) are preferred for encoding audio frames, the corresponding thresholds (for example, the first through sixth thresholds) can be set relatively small. If audio coding modes not based on linear prediction (such as GAC coding) are preferred for encoding audio frames, the corresponding thresholds (for example, the first through sixth thresholds) can be set relatively large, and so on.
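The effect of the threshold on the choice of coding mode can be illustrated with a small sketch. The helper `lp_based_share` and the sample efficiencies are made up for illustration and do not come from the text; the sketch only shows that, for the same frames, a smaller threshold selects a linear prediction-based mode more often than a larger one.

```python
def lp_based_share(efficiencies, threshold):
    """Fraction of frames for which a linear prediction-based mode is chosen,
    that is, frames whose reference efficiency is >= threshold."""
    chosen = [e >= threshold for e in efficiencies]
    return sum(chosen) / len(chosen)


# Made-up reference efficiencies for five frames, each in the range 0 to 1.
frames = [0.15, 0.35, 0.55, 0.75, 0.95]

low_threshold_share = lp_based_share(frames, 0.3)   # favors LP-based modes
high_threshold_share = lp_based_share(frames, 0.8)  # favors non-LP-based modes
```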

It can be understood that the specific manner of estimating each type of linear prediction efficiency included in the reference linear prediction efficiency of the current audio frame may differ. Some possible implementations are described below by way of example.

For example, in some embodiments of the present invention, the reference long-term linear prediction efficiency of the current audio frame can be estimated as follows: estimate the long-term linear prediction efficiency of the current audio frame, where the long-term linear prediction efficiency of the current audio frame is the reference long-term linear prediction efficiency of the current audio frame.

Alternatively,

The reference long-term linear prediction efficiency of the current audio frame is estimated as follows: estimate the long-term linear prediction efficiency of the current audio frame; obtain the linear prediction efficiencies of N1 historical audio frames of the current audio frame; and calculate a first statistical value of the linear prediction efficiencies of the N1 historical audio frames and the long-term linear prediction efficiency of the current audio frame. Here N1 is a positive integer (for example, N1 may be 1, 2, 3, or another value), and the first statistical value is the reference long-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N11 historical audio frames is at least one of the following: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and a comprehensive linear prediction efficiency, where the comprehensive linear prediction efficiency of each historical audio frame can be obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that frame. For example, if the N11 historical audio frames are audio frames F1, F2, and F3, then the linear prediction efficiency of each of F1, F2, and F3 is at least one of that frame's long-term linear prediction efficiency, short-term linear prediction efficiency, and comprehensive linear prediction efficiency, and the comprehensive linear prediction efficiency of each of F1, F2, and F3 can be obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that frame; scenarios in which N11 takes other values can be handled by analogy. The N11 historical audio frames are a subset of the N1 historical audio frames (N11 is less than or equal to N1). The N1 historical audio frames may be any N1 historical audio frames of the current audio frame, or may be the N1 historical audio frames adjacent to the current audio frame in the time domain. The linear prediction efficiencies of the remaining historical audio frames among the N1 historical audio frames, other than the N11 historical audio frames, may be of other types different from the linear prediction efficiencies of the N11 historical audio frames, which are not detailed here. The first statistical value of the linear prediction efficiencies of the N1 historical audio frames and the long-term linear prediction efficiency of the current audio frame may be, for example, the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted mean of the linear prediction efficiencies of the N1 historical audio frames and the long-term linear prediction efficiency of the current audio frame.
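A few of the statistics listed above can be sketched as follows. The function name `first_statistic`, the `kind` selector, and the convention that the last weight applies to the current frame are assumptions made for illustration; the text only names the candidate statistics and does not prescribe how they are computed or weighted.

```python
import math


def first_statistic(history_lpes, current_long_term_lpe,
                    kind="arithmetic_mean", weights=None):
    """Combine N1 historical efficiencies with the current frame's
    long-term efficiency into a single statistical value."""
    # The current frame's value is appended last (hypothetical convention).
    values = list(history_lpes) + [current_long_term_lpe]
    if kind == "sum":
        return sum(values)
    if kind == "arithmetic_mean":
        return sum(values) / len(values)
    if kind == "geometric_mean":
        return math.prod(values) ** (1.0 / len(values))
    if kind == "weighted_mean":
        if weights is None or len(weights) != len(values):
            raise ValueError("one weight is needed per value")
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    raise ValueError(f"unknown statistic: {kind}")
```

For instance, with historical efficiencies 0.4 and 0.6 and a current long-term efficiency of 0.8, the arithmetic mean is about 0.6, while a weighted mean that gives the current frame a larger weight moves the result closer to 0.8.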

Alternatively, the reference long-term linear prediction efficiency of the current audio frame can be estimated, for example, as follows: estimate the long-term linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N2 historical audio frames of the current audio frame; and calculate a second statistical value of the reference linear prediction efficiencies of the N2 historical audio frames and the long-term linear prediction efficiency of the current audio frame. Here N2 is a positive integer (for example, N2 may be 1, 2, 3, or another value), and the second statistical value is the reference long-term linear prediction efficiency of the current audio frame. The reference linear prediction efficiency of each of N21 historical audio frames is at least one of the following: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference comprehensive linear prediction efficiency, where the reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of that frame. The N21 historical audio frames are a subset of the N2 historical audio frames (N21 is less than or equal to N2). The N2 historical audio frames may be any N2 historical audio frames of the current audio frame, or may be the N2 historical audio frames adjacent to the current audio frame in the time domain. The linear prediction efficiencies of the remaining historical audio frames among the N2 historical audio frames, other than the N21 historical audio frames, may be of other types different from the linear prediction efficiencies of the N21 historical audio frames, which are not detailed here. The second statistical value of the reference linear prediction efficiencies of the N2 historical audio frames and the long-term linear prediction efficiency of the current audio frame may be, for example, the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted mean of the reference linear prediction efficiencies of the N2 historical audio frames and the long-term linear prediction efficiency of the current audio frame.
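Because this variant feeds the reference values of earlier frames back into the estimate, it behaves like recursive smoothing. The following sketch illustrates the special case N2 = 1 with a weighted mean; the function name `update_reference_lpe`, the weight 0.5, and the sample values are assumptions chosen for illustration, not values from the text.

```python
def update_reference_lpe(prev_reference_lpe, current_lpe, weight_current=0.5):
    """Weighted mean of the previous frame's reference efficiency and the
    current frame's own long-term efficiency (the case N2 = 1)."""
    if not 0.0 <= weight_current <= 1.0:
        raise ValueError("weight_current must lie in [0, 1]")
    return (weight_current * current_lpe
            + (1.0 - weight_current) * prev_reference_lpe)


# A signal whose per-frame efficiency drops abruptly from 0.8 to 0.2:
# the reference value follows the drop gradually instead of jumping.
reference = 0.8
trace = []
for lpe in [0.8, 0.2, 0.2, 0.2]:
    reference = update_reference_lpe(reference, lpe)
    trace.append(reference)
```

In this sketch the reference value decays step by step toward 0.2, so a decision based on it changes less abruptly than one based on the per-frame value alone.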

Alternatively, the reference long-term linear prediction efficiency of the current audio frame can be estimated, for example, as follows: estimate the long-term linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N4 historical audio frames of the current audio frame; obtain the linear prediction efficiencies of N3 historical audio frames of the current audio frame; and calculate a third statistical value of the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-term linear prediction efficiency of the current audio frame. Here N3 and N4 are positive integers (for example, N3 and N4 may each be 1, 2, 3, or another value), and the third statistical value is the reference long-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N31 historical audio frames is at least one of the following: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and a comprehensive linear prediction efficiency. The reference linear prediction efficiency of each of N41 historical audio frames is at least one of the following: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference comprehensive linear prediction efficiency. The N31 historical audio frames are a subset of the N3 historical audio frames (N31 is less than or equal to N3). The N3 historical audio frames may be any N3 historical audio frames of the current audio frame, or may be the N3 historical audio frames adjacent to the current audio frame in the time domain. The linear prediction efficiencies of the remaining historical audio frames among the N3 historical audio frames, other than the N31 historical audio frames, may be of other types different from the linear prediction efficiencies of the N31 historical audio frames, which are not detailed here. The N41 historical audio frames are a subset of the N4 historical audio frames (N41 is less than or equal to N4). The N4 historical audio frames may be any N4 historical audio frames of the current audio frame, or may be the N4 historical audio frames adjacent to the current audio frame in the time domain. The linear prediction efficiencies of the remaining historical audio frames among the N4 historical audio frames, other than the N41 historical audio frames, may be of other types different from the linear prediction efficiencies of the N41 historical audio frames, which are not detailed here. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that frame, and the reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of that frame. The intersection of the N3 historical audio frames and the N4 historical audio frames may be empty or non-empty. The third statistical value of the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-term linear prediction efficiency of the current audio frame may be, for example, the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted mean of the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-term linear prediction efficiency of the current audio frame.

For example, in some embodiments of the present invention, the reference short-term linear prediction efficiency of the current audio frame can be estimated as follows: estimate the short-term linear prediction efficiency of the current audio frame, where the short-term linear prediction efficiency of the current audio frame is the reference short-term linear prediction efficiency of the current audio frame.

Alternatively,

The reference short-term linear prediction efficiency of the current audio frame can be estimated as follows: estimate the short-term linear prediction efficiency of the current audio frame; obtain the linear prediction efficiencies of N5 historical audio frames of the current audio frame; and calculate a fourth statistical value of the linear prediction efficiencies of the N5 historical audio frames and the short-term linear prediction efficiency of the current audio frame. Here N5 is a positive integer (for example, N5 may be 1, 2, 3, or another value), and the fourth statistical value is the reference short-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N51 historical audio frames is at least one of the following: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and a comprehensive linear prediction efficiency, where the comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that frame. The N51 historical audio frames are a subset of the N5 historical audio frames (N51 is less than or equal to N5). The N5 historical audio frames may be any N5 historical audio frames of the current audio frame, or may be the N5 historical audio frames adjacent to the current audio frame in the time domain. The linear prediction efficiencies of the remaining historical audio frames among the N5 historical audio frames, other than the N51 historical audio frames, may be of other types different from the linear prediction efficiencies of the N51 historical audio frames, which are not detailed here. The fourth statistical value of the linear prediction efficiencies of the N5 historical audio frames and the short-term linear prediction efficiency of the current audio frame may be the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted mean of the linear prediction efficiencies of the N5 historical audio frames and the short-term linear prediction efficiency of the current audio frame.

Alternatively,

上述当前音频帧的参考短时线性预测效率可通过如下方式估计得到：估计得到当前音频帧的短时线性预测效率；获取上述当前音频帧的N6个历史音频帧的参考线性预测效率；计算上述N6个历史音频帧的参考线性预测效率和上述当前音频帧的短时线性预测效率的第五统计值，上述N6为正整数(例如N6可等于1、2、3或其它值)，上述第五统计值为上述当前音频帧的参考短时线性预测效率，其中，N61个历史音频帧中的每个历史音频帧的参考线性预测效率为如下线性预测效率中的至少一种：参考长时线性预测效率、参考短时间线性预测效率和参考综合线性预测效率，其中，上述每个历史音频帧的参考综合线性预测效率基于上述每个历史音频帧的参考长时线性预测效率和参考短时线性预测效率得到，上述N61个历史音频帧为上述N6个历史音频帧的子集(上述N61小于或等于上述N6)。其中，上述N6个历史音频帧可以是上述当前音频帧的任意N6个历史音频帧，或可以是时间域上与上述当前音频帧相邻的N6个历史音频帧。上述N6个历史音频帧中除上述N61个历史音频帧中之外的剩余历史音频帧的线性预测效率可为不同于上述N61个历史音频帧的线性预测效率的其它类型线性预测效率，此处不再详举。其中，计算得到的上述N6个历史音频帧的参考线性预测效率和上述当前音频帧的短时线性预测效率的第五统计值可为，上述N6个历史音频帧的参考线性预测效率和上述当前音频帧的短时线性预测效率的和值、加权和值、几何平均值、算术平均值、滑动平均值或加权平均值。The reference short-time linear prediction efficiency of the current audio frame can be estimated in the following manner: estimating the short-time linear prediction efficiency of the current audio frame; obtaining the reference linear prediction efficiencies of N6 historical audio frames of the current audio frame; calculating a fifth statistical value of the reference linear prediction efficiencies of the N6 historical audio frames and the short-time linear prediction efficiency of the current audio frame, where N6 is a positive integer (for example, N6 can be equal to 1, 2, 3 or other values), and the fifth statistical value is the reference short-time linear prediction efficiency of the current audio frame, wherein the reference linear prediction efficiency of each of the N61 historical audio frames is at least one of the following linear prediction efficiencies: a reference long-time linear prediction efficiency, a reference short-time linear prediction efficiency, and a reference comprehensive linear prediction efficiency, wherein the reference comprehensive linear prediction efficiency of each of the historical audio frames is obtained based on the reference long-time linear prediction efficiency and the reference short-time linear prediction efficiency of each of the historical audio frames, and the N61 historical audio frames are a subset of 
the N6 historical audio frames (N61 is less than or equal to N6). The N6 historical audio frames may be any N6 historical audio frames of the current audio frame, or may be the N6 historical audio frames adjacent to the current audio frame in the time domain. Among the N6 historical audio frames, the linear prediction efficiencies of the remaining frames other than the N61 historical audio frames may be of other types, different from those of the N61 historical audio frames; these are not enumerated here. The fifth statistical value of the reference linear prediction efficiencies of the N6 historical audio frames and the short-time linear prediction efficiency of the current audio frame may be their sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average.
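The statistic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reference_short_time_efficiency`, the `method` strings, and the `weights` argument are hypothetical and do not come from the patent, and only a subset of the listed statistics is shown.

```python
import math


def reference_short_time_efficiency(current_eff, history_ref_effs,
                                    method="arithmetic_mean", weights=None):
    """Combine the current frame's short-time LP efficiency with the
    reference LP efficiencies of N6 historical frames into one value.

    All names here are illustrative; the patent only lists the kinds of
    statistic that may be used.
    """
    # Historical values first, current frame last.
    values = list(history_ref_effs) + [current_eff]
    if method == "sum":
        return sum(values)
    if method == "arithmetic_mean":
        return sum(values) / len(values)
    if method == "geometric_mean":
        # Assumes non-negative efficiencies.
        return math.prod(values) ** (1.0 / len(values))
    if method in ("weighted_sum", "weighted_mean"):
        if weights is None or len(weights) != len(values):
            raise ValueError("one weight per value is required")
        total = sum(w * v for w, v in zip(weights, values))
        return total if method == "weighted_sum" else total / sum(weights)
    raise ValueError(f"unknown method: {method}")
```

With N6 = 2, for instance, a call such as `reference_short_time_efficiency(0.8, [0.6, 0.7])` averages three values. Giving the current frame a larger weight in the weighted variants makes the reference value follow signal changes more quickly, at the cost of less smoothing.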

或者，or,

上述当前音频帧的参考短时线性预测效率可通过如下方式估计得到：估计得到当前音频帧的短时线性预测效率；获取上述当前音频帧的N8个历史音频帧的参考线性预测效率；获取上述当前音频帧的N7个历史音频帧的线性预测效率；计算上述N7个历史音频帧的线性预测效率、上述N8个历史音频帧的参考线性预测效率和上述当前音频帧的短时线性预测效率的第六统计值，上述N7和上述N8为正整数(例如上述N7和上述N8可等于1、2、3或其它值)，上述第六统计值为上述当前音频帧的参考短时线性预测效率，N71个历史音频帧中的每个历史音频帧的线性预测效率为如下线性预测效率中的至少一种：长时线性预测效率、短时间线性预测效率和综合线性预测效率，N81个历史音频帧中的每个历史音频帧的参考线性预测效率为如下线性预测效率中的至少一种：参考长时线性预测效率、参考短时间线性预测效率和参考综合线性预测效率，上述每个历史音频帧的综合线性预测效率基于上述每个历史音频帧的长时线性预测效率和短时线性预测效率得到，其中，上述每个历史音频帧的参考综合线性预测效率基于上述每个历史音频帧的参考长时线性预测效率和参考短时线性预测效率得到，上述N71个历史音频帧为上述N7个历史音频帧的子集(上述N71小于或等于上述N7)。其中，上述N7个历史音频帧可以是上述当前音频帧的任意N7个历史音频帧，或可以是时间域上与上述当前音频帧相邻的N7个历史音频帧。上述N7个历史音频帧中除上述N71个历史音频帧中之外的剩余历史音频帧的线性预测效率可为不同于上述N71个历史音频帧的线性预测效率的其它类型线性预测效率，此处不再详举。上述N81个历史音频帧为上述N8个历史音频帧的子集(上述N81小于或等于上述N8)。其中，上述N8个历史音频帧可以是上述当前音频帧的任意N8个历史音频帧，或可以是时间域上与上述当前音频帧相邻的N8个历史音频帧。上述N8个历史音频帧中除上述N81个历史音频帧中之外的剩余历史音频帧的线性预测效率可为不同于上述N81个历史音频帧的线性预测效率的其它类型线性预测效率，此处不再详举。上述N7个历史音频帧和上述N8个历史音频帧的交集可为空集或不是空集。其中，计算得到的上述N7个历史音频帧的线性预测效率、上述N8个历史音频帧的参考线性预测效率和上述当前音频帧的短时线性预测效率的第六统计值可为，上述N7个历史音频帧的线性预测效率、上述N8个历史音频帧的参考线性预测效率和上述当前音频帧的短时线性预测效率的和值、加权和值、几何平均值、算术平均值、滑动平均值或加权平均值。The reference short-time linear prediction efficiency of the current audio frame can be estimated as follows: estimate the short-time linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiency of N8 historical audio frames of the current audio frame; obtain the linear prediction efficiency of N7 historical audio frames of the current audio frame; calculate the linear prediction efficiency of the N7 historical audio frames, the reference linear prediction efficiency of the N8 historical audio frames and the sixth statistical value of the short-time linear prediction efficiency of the current audio frame, where N7 and N8 are positive integers (for example, N7 and N8 can be equal to 1, 2, 3 or other values), the sixth statistical value is the reference short-time linear prediction efficiency of the current audio frame, and the linear prediction efficiency of each historical audio frame 
in the N71 historical audio frames is at least one of the following linear prediction efficiencies: long-time linear prediction efficiency, short-time linear prediction efficiency, and comprehensive linear prediction efficiency; the reference linear prediction efficiency of each of the N81 historical audio frames is at least one of the following linear prediction efficiencies: reference long-time linear prediction efficiency, reference short-time linear prediction efficiency, and reference comprehensive linear prediction efficiency; the comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-time linear prediction efficiency and the short-time linear prediction efficiency of that frame; the reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-time linear prediction efficiency and the reference short-time linear prediction efficiency of that frame; and the N71 historical audio frames are a subset of the N7 historical audio frames (N71 is less than or equal to N7). The N7 historical audio frames may be any N7 historical audio frames of the current audio frame, or may be the N7 historical audio frames adjacent to the current audio frame in the time domain. Among the N7 historical audio frames, the linear prediction efficiencies of the remaining frames other than the N71 historical audio frames may be of other types, different from those of the N71 historical audio frames; these are not enumerated here. The N81 historical audio frames are a subset of the N8 historical audio frames (N81 is less than or equal to N8). The N8 historical audio frames may be any N8 historical audio frames of the current audio frame, or may be the N8 historical audio frames adjacent to the current audio frame in the time domain. Among the N8 historical audio frames, the linear prediction efficiencies of the remaining frames other than the N81 historical audio frames may be of other types, different from those of the N81 historical audio frames; these are not enumerated here. The intersection of the N7 historical audio frames and the N8 historical audio frames may or may not be empty. The sixth statistical value of the linear prediction efficiencies of the N7 historical audio frames, the reference linear prediction efficiencies of the N8 historical audio frames, and the short-time linear prediction efficiency of the current audio frame may be their sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average.

在本发明一些实施例中，音频帧(如当前音频帧或当前音频帧的历史音频帧)的线性预测效率(如长时线性预测效率、短时线性预测效率)可用于表示该音频帧能够被进行线性预测的程度。其中，音频帧(如当前音频帧或者当前音频帧的历史音频帧)的线性预测结果指该音频帧的线性预测值。音频帧(如当前音频帧或当前音频帧的历史音频帧)的线性预测效率(例如长时线性预测效率、短时线性预测效率)越高，则表示该音频帧能够被进行线性预测的程度越高。In some embodiments of the present invention, the linear prediction efficiency (e.g., long-term linear prediction efficiency, short-term linear prediction efficiency) of an audio frame (e.g., a current audio frame or a historical audio frame of the current audio frame) may be used to indicate the degree to which the audio frame can be linearly predicted. The linear prediction result of an audio frame (e.g., a current audio frame or a historical audio frame of the current audio frame) refers to the linear prediction value of the audio frame. A higher linear prediction efficiency (e.g., long-term linear prediction efficiency, short-term linear prediction efficiency) of an audio frame (e.g., a current audio frame or a historical audio frame of the current audio frame) indicates a higher degree of linear prediction for the audio frame.

其中，在本发明的一些实施例中，上述估计得到当前音频帧的短时线性预测效率可以包括：基于当前音频帧的线性预测残差得到当前音频帧的短时线性预测效率。In some embodiments of the present invention, the estimating and obtaining the short-time linear prediction efficiency of the current audio frame may include: obtaining the short-time linear prediction efficiency of the current audio frame based on a linear prediction residual of the current audio frame.

在本发明的一些实施例中，上述基于当前音频帧的线性预测残差得到当前音频帧的短时线性预测效率，例如包括：计算当前音频帧进行短时线性预测前后的能量变化率，其中，计算出的上述能量变化率为当前音频帧的短时线性预测效率，或者，当前音频帧的短时线性预测效率基于计算出的上述能量变化率变换得到，其中，上述当前音频帧进行短时线性预测后的能量为上述当前音频帧的线性预测残差的能量。例如，能量变化率与当前音频帧的短时线性预测效率之间可具有映射关系，可基于能量变化率与当前音频帧的短时线性预测效率之间的映射关系，得到与计算出的上述能量变化率具有映射关系的当前音频帧的短时线性预测效率。一般来说，当前音频帧进行短时线性预测前后的能量变化率越大，表示当前音频帧的短时线性预测效率越高。In some embodiments of the present invention, obtaining the short-time linear prediction efficiency of the current audio frame based on the linear prediction residual of the current audio frame may, for example, include calculating an energy change rate before and after performing short-time linear prediction on the current audio frame, wherein the calculated energy change rate is the short-time linear prediction efficiency of the current audio frame. Alternatively, the short-time linear prediction efficiency of the current audio frame is obtained by transforming the calculated energy change rate, wherein the energy of the current audio frame after performing short-time linear prediction is the energy of the linear prediction residual of the current audio frame. For example, a mapping relationship may exist between the energy change rate and the short-time linear prediction efficiency of the current audio frame. Based on the mapping relationship between the energy change rate and the short-time linear prediction efficiency of the current audio frame, the short-time linear prediction efficiency of the current audio frame having a mapping relationship with the calculated energy change rate may be obtained. Generally speaking, a greater energy change rate before and after performing short-time linear prediction on the current audio frame indicates a higher short-time linear prediction efficiency of the current audio frame.

例如，上述当前音频帧进行短时线性预测前后的能量变化率，可为上述当前音频帧进行短时线性预测前的能量与上述当前音频帧的线性预测残差的能量的比值或比值的倒数。一般来说，上述当前音频帧进行短时线性预测前的能量除以上述当前音频帧的线性预测残差的能量得到的比值越大，表示当前音频帧的短时线性预测效率越高。For example, the energy change rate of the current audio frame before and after short-time linear prediction may be a ratio of the energy of the current audio frame before short-time linear prediction to the energy of the linear prediction residual of the current audio frame, or the inverse of the ratio. Generally speaking, a larger ratio of the energy of the current audio frame before short-time linear prediction divided by the energy of the linear prediction residual of the current audio frame indicates a higher short-time linear prediction efficiency for the current audio frame.
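As a minimal sketch of the energy-ratio idea above, the following Python computes the energy of the frame before prediction and the energy of its linear prediction residual. The function names and the `eps` guard are hypothetical. The mapping of the ratio onto a 0 to 1 value (`1 - E_residual / E_frame`) is only one possible choice, assumed here for illustration; the patent states only that some mapping may exist.

```python
def frame_energy(samples):
    """Sum of squared samples."""
    return sum(x * x for x in samples)


def short_time_lp_efficiency(frame, residual, eps=1e-12):
    """Return (energy_ratio, efficiency) for one frame.

    energy_ratio: energy before short-time LP divided by residual energy.
    efficiency:   an assumed mapping of that ratio onto 0..1; this
                  particular mapping is not taken from the patent.
    """
    e_before = frame_energy(frame)
    e_after = frame_energy(residual)
    # eps avoids division by zero for silent frames or perfect prediction.
    energy_ratio = e_before / (e_after + eps)
    efficiency = max(0.0, 1.0 - e_after / (e_before + eps))
    return energy_ratio, efficiency
```

A frame whose residual carries one quarter of the original energy gives a ratio of about 4 and, under the assumed mapping, an efficiency of about 0.75.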

在本发明的一些实施例中，上述估计得到当前音频帧的长时线性预测效率可包括：根据当前音频帧的线性预测残差与第一历史线性预测信号，得到上述当前音频帧的线性预测残差与上述第一历史线性预测信号之间的相关性，上述相关性为当前音频帧的长时线性预测效率，或者当前音频帧的长时线性预测效率基于上述变换得到。其中，上述第一历史线性预测信号为第一历史线性预测激励或第一历史线性预测残差；上述第一历史线性预测残差为上述当前音频帧的历史音频帧的线性预测残差(例如，上述第一历史线性预测残差可以为时长与上述当前音频帧相同或相近，且为当前音频帧的某一帧历史音频帧的线性预测残差，或者，上述第一历史线性预测残差可以为时长与上述当前音频帧相同或相近，并且为上述当前音频帧的某相邻两帧历史音频帧的部分连续音频信号的线性预测残差)，上述第一历史线性预测激励为上述当前音频帧的历史音频帧的线性预测激励(例如，上述第一历史线性预测激励可以为时长与上述当前音频帧相同或相近，并且为上述当前音频帧的某一帧历史音频帧的线性预测激励，或者上述第一历史线性预测激励可以为时长与上述当前音频帧相同或相近，且为当前音频帧的某相邻两帧历史音频帧的部分连续音频信号的线性预测激励)。举例来说，例如相关性与音频帧的长时线性预测效率之间具有映射关系，可基于相关性与音频帧的长时线性预测效率之间的映射关系，得到与计算出的上述相关性具有映射关系的上述当前音频帧的长时线性预测效率。In some embodiments of the present invention, estimating the long-time linear prediction efficiency of the current audio frame may include: obtaining, based on the linear prediction residual of the current audio frame and a first historical linear prediction signal, the correlation between the two, where the correlation is the long-time linear prediction efficiency of the current audio frame, or the long-time linear prediction efficiency of the current audio frame is obtained by transforming the correlation. The first historical linear prediction signal is a first historical linear prediction excitation or a first historical linear prediction residual. The first historical linear prediction residual is the linear prediction residual of a historical audio frame of the current audio frame (for example, it may be the linear prediction residual of a single historical audio frame of the current audio frame, with a duration equal or close to that of the current audio frame; or it may be the linear prediction residual of a continuous portion of the audio signal spanning two adjacent historical audio frames of the current audio frame, again with a duration equal or close to that of the current audio frame). The first historical linear prediction excitation is the linear prediction excitation of a historical audio frame of the current audio frame (for example, it may be the linear prediction excitation of a single historical audio frame of the current audio frame, with a duration equal or close to that of the current audio frame; or it may be the linear prediction excitation of a continuous portion of the audio signal spanning two adjacent historical audio frames of the current audio frame, again with a duration equal or close to that of the current audio frame). For example, a mapping relationship may exist between the correlation and the long-term linear prediction efficiency of an audio frame; based on this mapping relationship, the long-term linear prediction efficiency of the current audio frame that corresponds to the calculated correlation can be obtained.

其中，根据当前音频帧的线性预测残差与第一历史线性预测信号，得到上述当前音频帧的线性预测残差与上述第一历史线性预测信号之间的相关性的方式可以是多种多样的。There are various ways to obtain the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal based on the linear prediction residual of the current audio frame and the first historical linear prediction signal.

例如，上述根据当前音频帧的线性预测残差与第一历史线性预测信号，得到上述当前音频帧的线性预测残差与上述第一历史线性预测信号之间的相关性可以包括：计算当前音频帧的线性预测残差与第一历史线性预测信号之间的相关性。For example, obtaining the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal based on the linear prediction residual of the current audio frame and the first historical linear prediction signal may include: calculating the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal.

或者，上述根据当前音频帧的线性预测残差与第一历史线性预测信号，得到上述当前音频帧的线性预测残差与上述第一历史线性预测信号之间的相关性可包括：将当前音频帧的线性预测残差乘以增益因子以得到上述当前音频帧的增益线性预测残差，计算得到上述当前音频帧的增益线性预测残差与第一历史线性预测信号之间的相关性，其中，计算得到的上述当前音频帧的增益线性预测残差与上述第一历史线性预测信号之间的相关性，为上述当前音频帧的线性预测残差与上述第一历史线性预测信号之间的相关性。Alternatively, obtaining the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal based on the linear prediction residual of the current audio frame may include: multiplying the linear prediction residual of the current audio frame by a gain factor to obtain the gain linear prediction residual of the current audio frame, and calculating the correlation between the gain linear prediction residual of the current audio frame and the first historical linear prediction signal, wherein the calculated correlation between the gain linear prediction residual of the current audio frame and the first historical linear prediction signal is the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal.

或者，上述根据当前音频帧的线性预测残差与第一历史线性预测信号，得到上述当前音频帧的线性预测残差与上述第一历史线性预测信号之间的相关性，可以包括：将第一历史线性预测信号乘以增益因子以得到增益后的第一历史线性预测信号，计算得到上述当前音频帧的线性预测残差与上述增益后的第一历史线性预测信号之间的相关性，其中，计算得到的上述当前音频帧的线性预测残差与上述增益后的第一历史线性预测信号之间的相关性，为上述当前音频帧的线性预测残差与上述第一历史线性预测信号之间的相关性。Alternatively, obtaining the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal based on the linear prediction residual of the current audio frame and the first historical linear prediction signal may include: multiplying the first historical linear prediction signal by a gain factor to obtain the gained first historical linear prediction signal, and calculating the correlation between the linear prediction residual of the current audio frame and the gained first historical linear prediction signal, wherein the calculated correlation between the linear prediction residual of the current audio frame and the gained first historical linear prediction signal is the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal.

其中，上述第一历史线性预测激励或上述第一历史线性预测残差可基于上述当前音频帧的基音确定。例如，上述第一历史线性预测激励与上述当前音频帧的线性预测残差在时域上的相关性，大于或等于其它历史线性预测激励与上述当前音频帧的线性预测残差在时域上的相关性。或者，上述第一历史线性预测激励与上述当前音频帧的线性预测残差在时域上的相关性，大于或等于其它至少1个历史线性预测激励与上述当前音频帧的线性预测残差在时域上的相关性。例如，上述第一历史线性预测残差与上述当前音频帧的线性预测残差在时域上的相关性，大于或等于其它历史线性预测残差与上述当前音频帧的线性预测残差在时域上的相关性。或者，上述第一历史线性预测残差与上述当前音频帧的线性预测残差在时域上的相关性，大于或等于其它至少1个历史线性预测残差与上述当前音频帧的线性预测残差在时域上的相关性。The first historical linear prediction excitation or the first historical linear prediction residual may be determined based on the pitch of the current audio frame. For example, the time-domain correlation between the first historical linear prediction excitation and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between other historical linear prediction excitations and the linear prediction residual of the current audio frame. Alternatively, the time-domain correlation between the first historical linear prediction excitation and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between at least one other historical linear prediction excitation and the linear prediction residual of the current audio frame. For example, the time-domain correlation between the first historical linear prediction residual and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between other historical linear prediction residuals and the linear prediction residual of the current audio frame. Alternatively, the time-domain correlation between the first historical linear prediction residual and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between at least one other historical linear prediction residual and the linear prediction residual of the current audio frame.

一般来说，上述当前音频帧的线性预测残差与上述第一历史线性预测信号之间的相关性越大，表示上述当前音频帧的长时线性预测效率越高。Generally speaking, a greater correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal indicates a higher long-term linear prediction efficiency of the current audio frame.
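The correlation step and the pitch-related choice of the first historical linear prediction signal can be sketched as follows. This is a hedged illustration rather than the patent's procedure: normalized cross-correlation is just one of the correlation measures the text allows, the names `normalized_correlation` and `best_history_segment` are hypothetical, and candidate segments are restricted to lie entirely inside the stored history, which is a simplification.

```python
import math


def normalized_correlation(x, y):
    """Time-domain normalized cross-correlation of two equal-length signals."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0.0 else 0.0


def best_history_segment(residual, history, min_lag, max_lag):
    """Search past LP excitation/residual samples (most recent last) for the
    segment that correlates best with the current frame's LP residual.

    A lag of L means the candidate segment starts L samples before the
    current frame. Lags shorter than the frame are skipped so that every
    candidate lies fully inside `history` (a simplifying assumption).
    Returns (lag, correlation), or (None, -1.0) if no lag is admissible.
    """
    n = len(residual)
    best_lag, best_corr = None, -1.0
    for lag in range(max(min_lag, n), min(max_lag, len(history)) + 1):
        start = len(history) - lag
        segment = history[start:start + n]
        corr = normalized_correlation(residual, segment)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```

Because the measure is normalized, multiplying either signal by a positive gain factor leaves the result unchanged; with an unnormalized measure the gain factor would scale the value directly.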

在本发明的一些实施例中，上述相关性例如为时域上的互相关函数值和/或频域上的互相关函数值，或者上述相关性可为时域上的失真和/或频域上的失真(其中，频域上的失真亦可称之为谱失真)。In some embodiments of the present invention, the above-mentioned correlation is, for example, a cross-correlation function value in the time domain and/or a cross-correlation function value in the frequency domain, or the above-mentioned correlation may be a distortion in the time domain and/or a distortion in the frequency domain (wherein the distortion in the frequency domain may also be referred to as spectral distortion).

其中，在本发明的一些实施例中，上述频域上的失真可在频域上的K1个频点的失真的和值或加权和值，或者上述频域上的失真可为在频域上的K2个子带上的失真的和值或加权和值，上述K1和上述K2为正整数。In some embodiments of the present invention, the above-mentioned distortion in the frequency domain may be the sum or weighted sum of the distortions of K1 frequency points in the frequency domain, or the above-mentioned distortion in the frequency domain may be the sum or weighted sum of the distortions on K2 sub-bands in the frequency domain, and the above-mentioned K1 and the above-mentioned K2 are positive integers.

一般来说，上述当前音频帧的线性预测残差与上述第一历史线性预测信号在时域上的互相关函数值越大，则可表示上述当前音频帧的长时线性预测效率越高。一般来说，上述当前音频帧的线性预测残差与上述第一历史线性预测信号在频域上的互相关函数值越大，可表示上述当前音频帧的长时线性预测效率越高。一般来说，上述当前音频帧的线性预测残差与上述第一历史线性预测信号在频域上的失真越小，表示上述当前音频帧的长时线性预测效率越高。一般来说，上述当前音频帧的线性预测残差与上述第一历史线性预测信号在时域上的失真越小，表示上述当前音频帧的长时线性预测效率越高。Generally speaking, the larger the cross-correlation function value between the linear prediction residual of the current audio frame and the first historical linear prediction signal in the time domain, the higher the long-term linear prediction efficiency of the current audio frame. Generally speaking, the larger the cross-correlation function value between the linear prediction residual of the current audio frame and the first historical linear prediction signal in the frequency domain, the higher the long-term linear prediction efficiency of the current audio frame. Generally speaking, the smaller the distortion between the linear prediction residual of the current audio frame and the first historical linear prediction signal in the frequency domain, the higher the long-term linear prediction efficiency of the current audio frame. Generally speaking, the smaller the distortion between the linear prediction residual of the current audio frame and the first historical linear prediction signal in the time domain, the higher the long-term linear prediction efficiency of the current audio frame.

在本发明的一些实施例中，上述失真的加权和值所对应的加权系数为反映心理声学模型的感知加权系数。当然，上述失真的加权和值所对应的加权系数亦可为基于实际需要设定的其它加权系数。其中，测试发现，使用感知加权系数有利于使得计算出的失真更加符合主观的质量，从而有利于提升性能。In some embodiments of the present invention, the weighting coefficient corresponding to the weighted sum of the distortion values is a perceptual weighting coefficient reflecting a psychoacoustic model. Of course, the weighting coefficient corresponding to the weighted sum of the distortion values can also be other weighting coefficients set based on actual needs. Testing has found that using a perceptual weighting coefficient helps make the calculated distortion more consistent with subjective quality, thereby improving performance.
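A weighted frequency-domain distortion of the kind described above might be computed as in the sketch below. Treating the per-bin distortion as a squared difference of spectral values is an assumption made for illustration, as are the function and parameter names; the patent does not fix the per-bin distortion measure or the way the perceptual weights are derived.

```python
def weighted_spectral_distortion(spec_a, spec_b, weights=None):
    """Weighted sum of per-bin (or per-sub-band) distortions.

    spec_a, spec_b: spectral values of the two signals at the same K1
                    frequency bins (or K2 sub-bands).
    weights:        optional perceptual weights, one per bin; equal
                    weights are used when omitted (plain sum).
    """
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must have the same length")
    if weights is None:
        weights = [1.0] * len(spec_a)
    if len(weights) != len(spec_a):
        raise ValueError("one weight per bin is required")
    # Squared difference per bin is an assumed distortion measure.
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, spec_a, spec_b))
```

A smaller returned value corresponds to a higher long-term linear prediction efficiency, so a caller would typically map it through a decreasing function before comparing it with a threshold.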

在本发明的一些实施例中，上述第一历史线性预测激励可为利用基于线性预测的编码方式对上述当前音频帧的历史音频帧进行音频编码而产生的线性预测激励。In some embodiments of the present invention, the first historical linear prediction excitation may be a linear prediction excitation generated by audio encoding a historical audio frame of the current audio frame using a linear prediction-based encoding method.

在本发明的一些实施例中，上述第一历史线性预测残差，可基于上述当前音频帧的第一历史音频帧的时域信号和上述第一历史音频帧的线性预测系数得到，其中，上述第一历史音频帧的线性预测编码系数为量化后的线性预测系数或未经量化的线性预测系数。其中，由于实际编解码过程中对最终质量起作用的通常都是量化后的线性预测系数，因此使用量化后的线性预测系数计算线性预测残差有利于使计算出的相关性更准确。In some embodiments of the present invention, the first historical linear prediction residual may be obtained based on the time-domain signal of a first historical audio frame of the current audio frame and the linear prediction coefficients of the first historical audio frame, where the linear prediction coefficients of the first historical audio frame are quantized or unquantized linear prediction coefficients. Because it is usually the quantized linear prediction coefficients that determine the final quality in an actual encoding and decoding process, computing the linear prediction residual from the quantized coefficients helps make the calculated correlation more accurate.

在本发明的一些实施例中，上述当前音频帧的线性预测残差可基于上述当前音频帧的时域信号和上述当前音频帧的线性预测系数得到，其中，上述当前音频帧的线性预测系数可为量化后的线性预测系数或者未经量化的线性预测系数。其中，由于实际编解码过程中对最终质量起作用的通常都是量化后的线性预测系数，因此使用量化后的线性预测系数计算线性预测残差有利于使计算出的相关性更准确。In some embodiments of the present invention, the linear prediction residual of the current audio frame may be obtained based on the time-domain signal of the current audio frame and the linear prediction coefficients of the current audio frame, where the linear prediction coefficients of the current audio frame may be quantized or unquantized linear prediction coefficients. Because it is usually the quantized linear prediction coefficients that determine the final quality in an actual encoding and decoding process, computing the linear prediction residual from the quantized coefficients helps make the calculated correlation more accurate.
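Obtaining a linear prediction residual from a time-domain signal and a set of linear prediction coefficients can be sketched as below. The sign convention e[n] = s[n] - Σ a_k·s[n-k] is the common textbook form and is assumed here; the patent does not spell it out. The function name and the `history` parameter are hypothetical.

```python
def lp_residual(signal, coeffs, history=None):
    """Linear prediction residual e[n] = s[n] - sum_k a_k * s[n - k].

    signal:  time-domain samples of the frame.
    coeffs:  [a_1, ..., a_p], quantized or unquantized LP coefficients.
    history: the samples preceding the frame (at least p of them);
             zeros are assumed when omitted.
    """
    order = len(coeffs)
    if history is None:
        history = [0.0] * order
    if len(history) < order:
        raise ValueError("history must hold at least `order` samples")
    # Prepend the last `order` past samples so every s[n - k] is defined.
    buf = list(history)[len(history) - order:] + list(signal)
    residual = []
    for n in range(len(signal)):
        predicted = sum(coeffs[k] * buf[order + n - 1 - k] for k in range(order))
        residual.append(buf[order + n] - predicted)
    return residual
```

Passing the quantized coefficients reproduces the residual that the decoder side effectively works with, which is the reason the text gives for preferring them.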

在本发明的一些实施例中，上述第一历史线性预测激励可为自适应码本激励与固定码本激励的叠加激励，或者上述第一历史线性预测激励可为自适应码本激励。或上述第一历史线性预测激励可为其它类型的码本激励。In some embodiments of the present invention, the first historical linear prediction excitation may be a superposition of an adaptive codebook excitation and a fixed codebook excitation, or the first historical linear prediction excitation may be an adaptive codebook excitation. Alternatively, the first historical linear prediction excitation may be another type of codebook excitation.

可以理解的是，在本发明各实施例中，音频帧(例如当前音频帧或时域上位于当前音频帧之前或之后的音频帧)的历史音频帧是指，在同一个音频流中时域上位于该音频帧之前的音频帧。可见历史音频帧是相对的概念，例如假设同一个音频流之中包含的4个音频帧在时域上的先后顺序为音频帧y1—>音频帧y2—>音频帧y3—>音频帧y4，那么音频帧y1、音频帧y2、音频帧y3都是音频帧y4的历史音频帧，音频帧y1和音频帧y2都是音频帧y3的历史音频帧，而音频帧y1是音频帧y2的历史音频帧。可以理解的是，音频帧y4不是音频帧y3的历史音频帧、音频帧y4也不是音频帧y2和音频帧y1历史音频帧，其它场景可以此类推。It is understandable that in each embodiment of the present invention, the historical audio frame of an audio frame (such as the current audio frame or an audio frame that is located before or after the current audio frame in the time domain) refers to the audio frame that is located before the audio frame in the time domain in the same audio stream. It can be seen that the historical audio frame is a relative concept. For example, assuming that the order of the four audio frames contained in the same audio stream in the time domain is audio frame y1->audio frame y2->audio frame y3->audio frame y4, then audio frame y1, audio frame y2, and audio frame y3 are all historical audio frames of audio frame y4, audio frame y1 and audio frame y2 are both historical audio frames of audio frame y3, and audio frame y1 is the historical audio frame of audio frame y2. It is understandable that audio frame y4 is not the historical audio frame of audio frame y3, nor is audio frame y4 the historical audio frame of audio frame y2 and audio frame y1, and other scenarios can be deduced by analogy.

为便于更好的理解本发明实施例的上述技术方面，下面通过一些具体的应用场景进行举例介绍。To facilitate a better understanding of the above technical aspects of the embodiments of the present invention, some specific application scenarios are given as examples below.

首先请参见图2，图2为本发明实施例提供的一种音频编码方法的流程示意图。其中，如图2所示，本发明实施例提供的一种音频编码方法可包括以下内容：First, please refer to Figure 2, which is a flowchart of an audio encoding method provided by an embodiment of the present invention. As shown in Figure 2, an audio encoding method provided by an embodiment of the present invention may include the following contents:

201、判断当前音频帧是否为语音音频帧。201. Determine whether the current audio frame is a speech audio frame.

若是，则执行步骤202。If so, execute step 202.

若否、则执行步骤203。If not, execute step 203.

202、基于语音编码方式对上述当前音频帧进行音频编码。202. Perform audio encoding on the current audio frame based on a speech encoding method.

在本发明一些实施例中，若当前音频帧为语音音频帧，可基于代数码激励线性预测(ACELP，Algebraic Code Excited Linear Prediction)编码对上述当前音频帧进行音频编码。例如，若当前音频帧为语音音频帧，则可将当前音频帧输入到ACELP子编码器中进行音频编码。其中，ACELP子编码器为采用ACELP编码的子编码器。In some embodiments of the present invention, if the current audio frame is a speech audio frame, the current audio frame may be audio-encoded based on Algebraic Code Excited Linear Prediction (ACELP) coding. For example, if the current audio frame is a speech audio frame, the current audio frame may be input into an ACELP sub-encoder for audio encoding. The ACELP sub-encoder is a sub-encoder that uses ACELP coding.

203、估计当前音频帧的参考线性预测效率。203. Estimate a reference linear prediction efficiency of the current audio frame.

其中，可以采用多种算法来估计当前音频帧的参考线性预测效率。Various algorithms can be used to estimate the reference linear prediction efficiency of the current audio frame.

可以理解，参考线性预测效率的取值范围可为0～1(即0％～100％)，或者取值范围也可以是0～x1(x1为正数)。其中，参考长时线性预测效率的取值范围可为0～1(即0％～100％)，或者取值范围也可以是0～x2(x2为正数)。参考短时线性预测效率的取值范围可为0～1(即0％～100％)，或者取值范围也可以是0～x3(x3为正数)。其中，参考综合线性预测效率的取值范围可为0～1(即0％～100％)，或者取值范围也可以是0～x4(x4为正数)。其中，长时线性预测效率的取值范围可为0～1(即0％～100％)，或者取值范围也可以是0～x5(x5为正数)。短时线性预测效率的取值范围可为0～1(即0％～100％)，或者取值范围也可以是0～x6(x6为正数)。其中，x1、x2、x3、x4、x5或x6例如可为0.5、0.8或1.5、2、5、10、50、100或其它正数。It can be understood that the reference linear prediction efficiency may range from 0 to 1 (i.e., 0% to 100%), or from 0 to x1 (x1 being a positive number). The reference long-term linear prediction efficiency may range from 0 to 1 (i.e., 0% to 100%), or from 0 to x2 (x2 being a positive number). The reference short-term linear prediction efficiency may range from 0 to 1 (i.e., 0% to 100%), or from 0 to x3 (x3 being a positive number). The reference comprehensive linear prediction efficiency may range from 0 to 1 (i.e., 0% to 100%), or from 0 to x4 (x4 being a positive number). The long-term linear prediction efficiency may range from 0 to 1 (i.e., 0% to 100%), or from 0 to x5 (x5 being a positive number). The short-term linear prediction efficiency may range from 0 to 1 (i.e., 0% to 100%), or from 0 to x6 (x6 being a positive number). Here, x1, x2, x3, x4, x5, or x6 may be, for example, 0.5, 0.8, 1.5, 2, 5, 10, 50, 100, or another positive number.

204、确定与估计出的上述当前音频帧的参考线性预测效率匹配的音频编码方式。204. Determine an audio coding mode that matches the estimated reference linear prediction efficiency of the current audio frame.

在本发明的一些实施例中，音频编码方式与音频帧的参考线性预测效率之间可以具有映射关系，例如，不同的音频编码方式可对应不同的参考线性预测效率。例如可在至少两个音频编码方式中，确定与估计出的上述当前音频帧的参考线性预测效率匹配的音频编码方式。In some embodiments of the present invention, a mapping relationship may be established between audio coding modes and reference linear prediction efficiencies of audio frames. For example, different audio coding modes may correspond to different reference linear prediction efficiencies. For example, an audio coding mode that matches the estimated reference linear prediction efficiency of the current audio frame may be determined from at least two audio coding modes.

其中，与估计出的上述当前音频帧的参考线性预测效率匹配的音频编码方式可能是变换激励编码(TCX，Transform Coded Excitation)、也可能是一般音频编码(GAC，Generic Audio Coding)。其中，GAC例如可以是修正离散余弦变换(Modified Discrete Cosine Transform)编码。The audio coding method that matches the estimated reference linear prediction efficiency of the current audio frame may be Transform Coded Excitation (TCX) or Generic Audio Coding (GAC). GAC may be, for example, Modified Discrete Cosine Transform (MDCT) coding.

205、按照确定出的上述音频编码方式对上述当前音频帧进行音频编码。205. Perform audio encoding on the current audio frame according to the determined audio encoding method.

可以看出，本实施例的技术方案中，首先判断出当前音频帧是否为语音音频帧，若当前音频帧为语音音频帧，则基于语音编码方式对上述当前音频帧进行音频编码。若当前音频帧为非语音音频帧，则先估计当前音频帧的参考线性预测效率；通过估计出的上述当前音频帧的参考线性预测效率来确定与之匹配的音频编码方式，并按照确定出的与之匹配音频编码方式对上述当前音频帧进行音频编码，由于上述方案在确定音频编码方式的过程中，无需执行现有闭环选择模式所需要执行的利用每种音频编码方式分别将当前音频帧进行完整编码的操作，而是通过当前音频帧的参考线性预测效率来确定需选择的音频编码方式，而估计当前音频帧的参考线性预测效率的计算复杂度，通常是远远小于利用每种音频编码方式分别将当前音频帧进行完整编码的计算复杂度的，因此相对于现有机制而言，本发明实施例的上述方案有利于降低音频编码运算复杂度，进而降低音频编码的开销。As can be seen, in the technical solution of this embodiment, it is first determined whether the current audio frame is a speech audio frame. If the current audio frame is a speech audio frame, the current audio frame is audio-encoded based on the speech coding mode. If the current audio frame is a non-speech audio frame, the reference linear prediction efficiency of the current audio frame is first estimated; the audio coding mode that matches the current audio frame is determined based on the estimated reference linear prediction efficiency of the current audio frame, and the current audio frame is audio-encoded according to the determined matching audio coding mode. Since the above scheme does not need to perform the operation of fully encoding the current audio frame using each audio coding mode required by the existing closed-loop selection mode during the process of determining the audio coding mode, but instead determines the audio coding mode to be selected based on the reference linear prediction efficiency of the current audio frame, and the computational complexity of estimating the reference linear prediction efficiency of the current audio frame is generally much less than the computational complexity of fully encoding the current audio frame using each audio coding mode, compared to the existing mechanism, the above scheme of the embodiment of the present invention is conducive to reducing the computational complexity of audio coding, thereby reducing the audio coding overhead.
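Steps 201 to 205 can be summarized as the control flow sketched below. Every callable passed in (`is_speech`, `estimate_efficiency`, `select_mode`, and the entries of `encoders`) is a hypothetical placeholder for a component the patent leaves open; only the order of the decisions follows the text.

```python
def encode_frame(frame, is_speech, estimate_efficiency, select_mode, encoders):
    """Open-loop mode selection following steps 201-205.

    is_speech:           callable(frame) -> bool              (step 201)
    estimate_efficiency: callable(frame) -> efficiency value  (step 203)
    select_mode:         callable(efficiency) -> mode name    (step 204)
    encoders:            dict mapping mode name -> callable(frame) -> payload
    Returns (mode_name, encoded_payload).
    """
    if is_speech(frame):
        # Step 202: speech frames go straight to the speech encoder.
        return "ACELP", encoders["ACELP"](frame)
    # Steps 203-205: estimate once, pick one mode, encode once.
    efficiency = estimate_efficiency(frame)
    mode = select_mode(efficiency)
    return mode, encoders[mode](frame)
```

The point of the structure is that exactly one encoder runs per frame, whereas a closed-loop selection would run every candidate encoder in full and compare the results.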

In some embodiments of the present invention, the reference comprehensive linear prediction efficiency of the current audio frame may be, for example, the sum, a weighted sum, or the average of the reference long-time linear prediction efficiency of the current audio frame and the reference short-time linear prediction efficiency of the current audio frame. The weights of the weighted sum may be set according to actual needs; one weight may be, for example, 0.5, 1, 2, 3, 5, 10, or another value.
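The three combinations named above can be sketched as below. The function name, the `method` strings, and the default weights of 0.5 are assumptions chosen for illustration; the source only states that the weights are set as needed.

```python
def comprehensive_efficiency(
    ref_long: float,
    ref_short: float,
    method: str = "sum",
    w_long: float = 0.5,
    w_short: float = 0.5,
) -> float:
    """Combine reference long-time and short-time efficiencies into one value."""
    if method == "sum":
        return ref_long + ref_short
    if method == "weighted_sum":
        # Weights are application-specific; 0.5 is only a placeholder default.
        return w_long * ref_long + w_short * ref_short
    if method == "average":
        return (ref_long + ref_short) / 2.0
    raise ValueError(f"unknown method: {method}")
```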

又举例来说，在本发明的又一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率和上述当前音频帧的参考短时线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式可以包括：若上述当前音频帧的参考长时线性预测效率小于第一阈值，和/或上述当前音频帧的参考短时线性预测效率小于第二阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为非基于线性预测的音频编码方式；若上述当前音频帧的参考长时线性预测效率大于或等于第一阈值，和/或上述当前音频帧的参考短时线性预测效率大于或等于第二阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为基于线性预测的音频编码方式。For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame and the reference short-time linear prediction efficiency of the current audio frame, then the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame may include: if the reference long-time linear prediction efficiency of the current audio frame is less than a first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is less than a second threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method not based on linear prediction; if the reference long-time linear prediction efficiency of the current audio frame is greater than or equal to the first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is greater than or equal to the second threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method based on linear prediction.
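A minimal sketch of the threshold comparison follows. Because the text joins the two conditions with "and/or", more than one reading is possible; this sketch assumes one of them, namely that the frame is assigned a coding mode not based on linear prediction as soon as either efficiency falls below its threshold. The function name and the mode labels are hypothetical.

```python
def match_by_thresholds(
    ref_long: float,
    ref_short: float,
    first_threshold: float,
    second_threshold: float,
) -> str:
    """Pick a coding family from two reference efficiencies and two thresholds."""
    # Assumed reading of "and/or": either value below its threshold is enough
    # to select a coding mode that is not based on linear prediction.
    if ref_long < first_threshold or ref_short < second_threshold:
        return "non_lp"
    # Otherwise both values are greater than or equal to their thresholds.
    return "lp"
```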

又举例来说，在本发明一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式，包括：确定上述当前音频帧的参考长时线性预测效率所落入的第一线性预测效率区间，根据线性预测效率区间和基于线性预测的音频编码方式之间的映射关系，确定出与上述第一线性预测效率区间具有映射关系的第一音频编码方式，其中，上述第一音频编码方式为与上述当前音频帧的参考线性预测效率匹配的音频编码方式，上述第一音频编码方式为基于线性预测的音频编码方式或为非基于线性预测的音频编码方式。其中，不同的线性预测效率区间对应于不同的音频编码方式。例如，假设存着3个线性预测效率区间，分别可为0～30％、30％～70％和70％～100％，若上述当前音频帧的参考长时线性预测效率落入线性预测效率区间0～30％(即第一线性预测效率区间为线性预测效率区间0～30％)，可确定线性预测效率区间0～30％对应的音频编码方式，为与上述当前音频帧的参考线性预测效率匹配的音频编码方式。若上述当前音频帧的参考长时线性预测效率落入线性预测效率区间30％～70％(即第一线性预测效率区间为线性预测效率区间30％～70％)，可以确定线性预测效率区间30％～70％对应的音频编码方式，为与上述当前音频帧的参考线性预测效率匹配的音频编码方式，其它场景以此类推。可以根据不同应用场景的需要，来设定线性预测效率区间和基于线性预测的音频编码方式之间的映射关系。For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes a reference long-term linear prediction efficiency of the current audio frame, then determining an audio coding mode that matches the reference linear prediction efficiency of the current audio frame includes: determining a first linear prediction efficiency interval within which the reference long-term linear prediction efficiency of the current audio frame falls, and determining a first audio coding mode that has a mapping relationship with the first linear prediction efficiency interval based on a mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction, wherein the first audio coding mode is an audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and the first audio coding mode is an audio coding mode based on linear prediction or an audio coding mode not based on linear prediction. Different linear prediction efficiency intervals correspond to different audio coding modes. 
For example, assuming there are three linear prediction efficiency intervals, which can be 0-30%, 30%-70%, and 70%-100%, respectively, if the reference long-term linear prediction efficiency of the current audio frame falls within the linear prediction efficiency interval of 0-30% (i.e., the first linear prediction efficiency interval is the linear prediction efficiency interval of 0-30%), the audio coding mode corresponding to the linear prediction efficiency interval of 0-30% can be determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference long-term linear prediction efficiency of the current audio frame falls within the linear prediction efficiency interval of 30%-70% (i.e., the first linear prediction efficiency interval is the linear prediction efficiency interval of 30%-70%), the audio coding mode corresponding to the linear prediction efficiency interval of 30%-70% can be determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. The same applies to other scenarios. The mapping relationship between linear prediction efficiency intervals and linear prediction-based audio coding modes can be set according to the needs of different application scenarios.
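The interval lookup in this example can be sketched with a sorted list of boundaries. The boundaries 30% and 70% come from the example above; the mode labels are placeholders, since the text does not say which coding mode each interval maps to, and assigning a boundary value to the upper interval is an assumption.

```python
from bisect import bisect_right

# Interval boundaries for 0-30%, 30%-70% and 70%-100%, expressed as fractions.
BOUNDARIES = [0.30, 0.70]
# Hypothetical labels: one coding mode per interval.
MODES = ["mode_a", "mode_b", "mode_c"]


def mode_for_efficiency(efficiency, boundaries=BOUNDARIES, modes=MODES):
    """Return the coding mode mapped to the interval containing `efficiency`."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    # bisect_right counts the boundaries that are <= efficiency, which is the
    # index of the interval; a value equal to a boundary lands in the upper one.
    return modes[bisect_right(boundaries, efficiency)]
```

Changing the mapping for a different application scenario then only requires different `boundaries` and `modes` lists.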

又举例来说，在本发明的一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考短时线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式，包括：确定上述当前音频帧的参考短时线性预测效率所落入的第二线性预测效率区间，根据线性预测效率区间和基于线性预测的音频编码方式之间的映射关系，确定出与上述第二线性预测效率区间具有映射关系的第二音频编码方式，其中，上述第二音频编码方式为与上述当前音频帧的参考线性预测效率匹配的音频编码方式，上述第二音频编码方式为基于线性预测的音频编码方式或为非基于线性预测的音频编码方式。例如，假设存着3个线性预测效率区间，分别可为0～40％、40％～60％和60％～100％，若上述当前音频帧的参考长时线性预测效率落入线性预测效率区间0～40％(即第二线性预测效率区间为线性预测效率区间0～40％)，则可确定线性预测效率区间0～40％对应的音频编码方式，为与上述当前音频帧的参考线性预测效率匹配的音频编码方式。若上述当前音频帧的参考长时线性预测效率落入线性预测效率区间40％～60％(即第二线性预测效率区间为线性预测效率区间40％～60％)，确定线性预测效率区间40％～60％对应的音频编码方式，为与上述当前音频帧的参考线性预测效率匹配的音频编码方式，其它场景以此类推。可根据不同应用场景的需要，来设定线性预测效率区间和基于线性预测的音频编码方式之间的映射关系。For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-time linear prediction efficiency of the current audio frame, then the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame includes: determining the second linear prediction efficiency interval in which the reference short-time linear prediction efficiency of the current audio frame falls, and determining a second audio encoding method having a mapping relationship with the second linear prediction efficiency interval based on a mapping relationship between the linear prediction efficiency interval and the audio encoding method based on linear prediction, wherein the second audio encoding method is an audio encoding method that matches the reference linear prediction efficiency of the current audio frame, and the second audio encoding method is an audio encoding method based on linear prediction or an audio encoding method not based on linear prediction. For example, assuming there are three linear prediction efficiency intervals, which can be 0-40%, 40%-60%, and 60%-100%, respectively. 
If the reference short-time linear prediction efficiency of the current audio frame falls within the linear prediction efficiency interval of 0-40% (i.e., the second linear prediction efficiency interval is the linear prediction efficiency interval of 0-40%), the audio coding mode corresponding to the linear prediction efficiency interval of 0-40% can be determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference short-time linear prediction efficiency of the current audio frame falls within the linear prediction efficiency interval of 40%-60% (i.e., the second linear prediction efficiency interval is the linear prediction efficiency interval of 40%-60%), the audio coding mode corresponding to the linear prediction efficiency interval of 40%-60% can be determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. The same applies to other scenarios. The mapping relationship between linear prediction efficiency intervals and linear prediction-based audio coding modes can be set according to the needs of different application scenarios.

又举例来说，在本发明的一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考综合线性预测效率，则上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式，包括：确定上述当前音频帧的参考综合线性预测效率所落入的第三线性预测效率区间，根据线性预测效率区间和基于线性预测的音频编码方式之间的映射关系，确定出与上述第三线性预测效率区间具有映射关系的第三音频编码方式，其中，上述第三音频编码方式为与上述当前音频帧的参考线性预测效率匹配的音频编码方式，上述第三音频编码方式为基于线性预测的音频编码方式或为非基于线性预测的音频编码方式。例如，假设存着3个线性预测效率区间，分别可为0～50％、50％～80％和80％～100％，若上述当前音频帧的参考长时线性预测效率落入线性预测效率区间0～50％(即第三线性预测效率区间为线性预测效率区间0～50％)，则可确定线性预测效率区间0～50％对应的音频编码方式，为与上述当前音频帧的参考线性预测效率匹配的音频编码方式。若上述当前音频帧的参考长时线性预测效率落入线性预测效率区间50～80％(即第三线性预测效率区间为线性预测效率区间50％～80％)，确定线性预测效率区间50％～80％对应的音频编码方式，为与上述当前音频帧的参考线性预测效率匹配的音频编码方式，其它场景以此类推。可以根据不同应用场景的需要，来设定线性预测效率区间和基于线性预测的音频编码方式之间的映射关系。For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, then the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame includes: determining the third linear prediction efficiency interval in which the reference comprehensive linear prediction efficiency of the current audio frame falls, and determining a third audio encoding method that has a mapping relationship with the third linear prediction efficiency interval based on a mapping relationship between the linear prediction efficiency interval and the audio encoding method based on linear prediction, wherein the third audio encoding method is an audio encoding method that matches the reference linear prediction efficiency of the current audio frame, and the third audio encoding method is an audio encoding method based on linear prediction or an audio encoding method not based on linear prediction. For example, assuming there are three linear prediction efficiency intervals, which can be 0-50%, 50%-80%, and 80%-100%, respectively. 
If the reference comprehensive linear prediction efficiency of the current audio frame falls within the linear prediction efficiency interval of 0-50% (i.e., the third linear prediction efficiency interval is the linear prediction efficiency interval of 0-50%), the audio coding mode corresponding to the linear prediction efficiency interval of 0-50% can be determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference comprehensive linear prediction efficiency of the current audio frame falls within the linear prediction efficiency interval of 50%-80% (i.e., the third linear prediction efficiency interval is the linear prediction efficiency interval of 50%-80%), the audio coding mode corresponding to the linear prediction efficiency interval of 50%-80% can be determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. The same applies to other scenarios. The mapping relationship between linear prediction efficiency intervals and linear prediction-based audio coding modes can be set according to the needs of different application scenarios.

Alternatively,

上述当前音频帧的参考长时线性预测效率通过如下方式估计得到：估计得到当前音频帧的长时线性预测效率；获取上述当前音频帧的N1个历史音频帧的线性预测效率；计算上述N1个历史音频帧的线性预测效率和上述当前音频帧的长时线性预测效率的第一统计值，其中，上述N1为正整数，上述第一统计值为上述当前音频帧的参考长时线性预测效率，其中，N11个历史音频帧中的每个历史音频帧的线性预测效率为上述每个历史音频帧的如下线性预测效率中的至少一种：长时线性预测效率、短时间线性预测效率和综合线性预测效率；上述每个历史音频帧的综合线性预测效率基于上述每个历史音频帧的长时线性预测效率和短时线性预测效率得到，上述N11个历史音频帧为上述N1个历史音频帧的子集。其中，计算得到的上述N1个历史音频帧的线性预测效率和上述当前音频帧的长时线性预测效率的第一统计值例如可以是，上述N1个历史音频帧的线性预测效率和上述当前音频帧的长时线性预测效率的和值、加权和值、几何平均值、算术平均值、滑动平均值或加权平均值。The reference long-time linear prediction efficiency of the current audio frame is estimated in the following manner: estimating the long-time linear prediction efficiency of the current audio frame; obtaining the linear prediction efficiencies of N1 historical audio frames of the current audio frame; calculating the linear prediction efficiencies of the N1 historical audio frames and the first statistical value of the long-time linear prediction efficiency of the current audio frame, wherein the N1 is a positive integer, and the first statistical value is the reference long-time linear prediction efficiency of the current audio frame, wherein the linear prediction efficiency of each of the N11 historical audio frames is at least one of the following linear prediction efficiencies of each of the historical audio frames: long-time linear prediction efficiency, short-time linear prediction efficiency and comprehensive linear prediction efficiency; the comprehensive linear prediction efficiency of each of the historical audio frames is obtained based on the long-time linear prediction efficiency and short-time linear prediction efficiency of each of the historical audio frames, and the N11 historical audio frames are a subset of the N1 historical audio frames. 
The first statistical value calculated from the linear prediction efficiencies of the N1 historical audio frames and the long-time linear prediction efficiency of the current audio frame may be, for example, their sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average.
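One way to picture this estimate is a small tracker that keeps the efficiencies of the last N1 frames and returns a statistic over them together with the current value. The sketch below picks the arithmetic mean, which is only one of the listed options; the class name and the choice to store every frame's value as history are assumptions.

```python
from collections import deque


class ReferenceEfficiencyTracker:
    """Arithmetic mean of the current efficiency and up to N1 historical ones."""

    def __init__(self, n_history: int):
        if n_history < 1:
            raise ValueError("n_history must be a positive integer")
        # Holds at most the N1 most recent per-frame efficiencies.
        self._history = deque(maxlen=n_history)

    def update(self, current_efficiency: float) -> float:
        # Statistic over the historical values plus the current frame's value.
        values = list(self._history) + [current_efficiency]
        reference = sum(values) / len(values)
        # The current value becomes history for the next frame.
        self._history.append(current_efficiency)
        return reference
```

Averaging over history smooths frame-to-frame fluctuations, so an isolated outlier frame is less likely to flip the selected coding mode.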

或者，上述当前音频帧的参考长时线性预测效率例如可通过如下方式估计得到：估计得到当前音频帧的长时线性预测效率；获取上述当前音频帧的N2个历史音频帧的参考线性预测效率；计算上述N2个历史音频帧的参考线性预测效率和上述当前音频帧的长时线性预测效率的第二统计值，其中，上述N2为正整数，上述第二统计值为上述当前音频帧的参考长时线性预测效率，其中，N21个历史音频帧中的每个历史音频帧的参考线性预测效率为上述每个历史音频帧的如下线性预测效率中的至少一种：参考长时线性预测效率、参考短时间线性预测效率和参考综合线性预测效率，其中，上述每个历史音频帧的参考综合线性预测效率基于上述每个历史音频帧的参考长时线性预测效率和参考短时线性预测效率得到，上述N21个历史音频帧为上述N2个历史音频帧的子集。计算得到的上述N2个历史音频帧的参考线性预测效率和上述当前音频帧的长时线性预测效率的第二统计值例如为，上述N2个历史音频帧的参考线性预测效率和上述当前音频帧的长时线性预测效率的和值、加权和值、几何平均值、算术平均值、滑动平均值或加权平均值。Alternatively, the reference long-time linear prediction efficiency of the current audio frame can be estimated, for example, in the following manner: estimating the long-time linear prediction efficiency of the current audio frame; obtaining the reference linear prediction efficiencies of N2 historical audio frames of the current audio frame; calculating the reference linear prediction efficiencies of the N2 historical audio frames and the second statistical value of the long-time linear prediction efficiency of the current audio frame, wherein the N2 is a positive integer, and the second statistical value is the reference long-time linear prediction efficiency of the current audio frame, wherein the reference linear prediction efficiency of each of the N21 historical audio frames is at least one of the following linear prediction efficiencies of each of the historical audio frames: reference long-time linear prediction efficiency, reference short-time linear prediction efficiency and reference comprehensive linear prediction efficiency, wherein the reference comprehensive linear prediction efficiency of each of the historical audio frames is obtained based on the reference long-time linear prediction efficiency and the reference short-time linear prediction efficiency of each of the historical audio frames, and the N21 historical audio frames are a subset of the N2 historical audio frames. 
The second statistical value calculated from the reference linear prediction efficiencies of the N2 historical audio frames and the long-time linear prediction efficiency of the current audio frame may be, for example, their sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average.

Alternatively, the reference long-time linear prediction efficiency of the current audio frame can be estimated, for example, in the following manner: estimate the long-time linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N4 historical audio frames of the current audio frame and the linear prediction efficiencies of N3 historical audio frames of the current audio frame; and calculate a third statistical value of the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-time linear prediction efficiency of the current audio frame. N3 and N4 are positive integers, and the third statistical value is the reference long-time linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N31 historical audio frames is at least one of the following linear prediction efficiencies of that historical audio frame: long-time linear prediction efficiency, short-time linear prediction efficiency, and comprehensive linear prediction efficiency. The reference linear prediction efficiency of each of N41 historical audio frames is at least one of the following linear prediction efficiencies of that historical audio frame: reference long-time linear prediction efficiency, reference short-time linear prediction efficiency, and reference comprehensive linear prediction efficiency. The N31 historical audio frames are a subset of the N3 historical audio frames, and the N41 historical audio frames are a subset of the N4 historical audio frames. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-time linear prediction efficiency and short-time linear prediction efficiency of that frame, and the reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-time linear prediction efficiency and reference short-time linear prediction efficiency of that frame. The intersection of the N3 historical audio frames and the N4 historical audio frames may or may not be empty. The third statistical value calculated from the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-time linear prediction efficiency of the current audio frame may be, for example, their sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average.

Alternatively,

上述当前音频帧的参考短时线性预测效率可通过如下方式估计得到：估计得到当前音频帧的短时线性预测效率；获取上述当前音频帧的N5个历史音频帧的线性预测效率；计算上述N5个历史音频帧的线性预测效率和上述当前音频帧的短时线性预测效率的第四统计值，其中，上述N5为正整数，上述第四统计值为上述当前音频帧的参考短时线性预测效率，其中，N51个历史音频帧中的每个历史音频帧的线性预测效率为上述每个历史音频帧的如下线性预测效率中的至少一种：长时线性预测效率、短时间线性预测效率和综合线性预测效率，上述每个历史音频帧的综合线性预测效率基于上述每个历史音频帧的长时线性预测效率和短时线性预测效率得到，上述N51个历史音频帧为上述N5个历史音频帧的子集。其中，计算得到的上述N5个历史音频帧的线性预测效率和上述当前音频帧的短时线性预测效率的第四统计值可为，上述N5个历史音频帧的线性预测效率和上述当前音频帧的短时线性预测效率的和值、加权和值、几何平均值、算术平均值、滑动平均值或加权平均值。The reference short-time linear prediction efficiency of the current audio frame can be estimated in the following manner: estimating the short-time linear prediction efficiency of the current audio frame; obtaining the linear prediction efficiencies of N5 historical audio frames of the current audio frame; calculating the fourth statistical value of the linear prediction efficiencies of the N5 historical audio frames and the short-time linear prediction efficiency of the current audio frame, wherein the N5 is a positive integer, and the fourth statistical value is the reference short-time linear prediction efficiency of the current audio frame, wherein the linear prediction efficiency of each of the N51 historical audio frames is at least one of the following linear prediction efficiencies of each of the historical audio frames: long-time linear prediction efficiency, short-time linear prediction efficiency and comprehensive linear prediction efficiency, and the comprehensive linear prediction efficiency of each of the historical audio frames is obtained based on the long-time linear prediction efficiency and short-time linear prediction efficiency of each of the historical audio frames, and the N51 historical audio frames are a subset of the N5 historical audio frames. 
The fourth statistical value calculated from the linear prediction efficiencies of the N5 historical audio frames and the short-time linear prediction efficiency of the current audio frame may be their sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average.

Alternatively,

上述当前音频帧的参考短时线性预测效率可通过如下方式估计得到：估计得到当前音频帧的短时线性预测效率；获取上述当前音频帧的N6个历史音频帧的参考线性预测效率；计算上述N6个历史音频帧的参考线性预测效率和上述当前音频帧的短时线性预测效率的第五统计值，上述N6为正整数，上述第五统计值为上述当前音频帧的参考短时线性预测效率，其中，N61个历史音频帧中的每个历史音频帧的参考线性预测效率为上述每个历史音频帧的如下线性预测效率中的至少一种：参考长时线性预测效率、参考短时间线性预测效率和参考综合线性预测效率，其中，上述每个历史音频帧的参考综合线性预测效率基于上述每个历史音频帧的参考长时线性预测效率和参考短时线性预测效率得到，上述N61个历史音频帧为上述N6个历史音频帧的子集。其中，计算得到的上述N6个历史音频帧的参考线性预测效率和上述当前音频帧的短时线性预测效率的第五统计值可为，上述N6个历史音频帧的参考线性预测效率和上述当前音频帧的短时线性预测效率的和值、加权和值、几何平均值、算术平均值、滑动平均值或加权平均值。The reference short-time linear prediction efficiency of the current audio frame can be estimated in the following manner: estimate the short-time linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiency of N6 historical audio frames of the current audio frame; calculate the fifth statistical value of the reference linear prediction efficiencies of the N6 historical audio frames and the short-time linear prediction efficiency of the current audio frame, where N6 is a positive integer and the fifth statistical value is the reference short-time linear prediction efficiency of the current audio frame, wherein the reference linear prediction efficiency of each of the N61 historical audio frames is at least one of the following linear prediction efficiencies of each of the historical audio frames: reference long-time linear prediction efficiency, reference short-time linear prediction efficiency and reference comprehensive linear prediction efficiency, wherein the reference comprehensive linear prediction efficiency of each of the historical audio frames is obtained based on the reference long-time linear prediction efficiency and the reference short-time linear prediction efficiency of each of the historical audio frames, and the N61 historical audio frames are a subset of the N6 historical audio frames. 
The fifth statistical value calculated from the reference linear prediction efficiencies of the N6 historical audio frames and the short-time linear prediction efficiency of the current audio frame may be their sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average.

Alternatively,

The reference short-time linear prediction efficiency of the current audio frame can be estimated in the following manner: estimate the short-time linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N8 historical audio frames of the current audio frame; obtain the linear prediction efficiencies of N7 historical audio frames of the current audio frame; and calculate a sixth statistical value of the linear prediction efficiencies of the N7 historical audio frames, the reference linear prediction efficiencies of the N8 historical audio frames, and the short-time linear prediction efficiency of the current audio frame. N7 and N8 are positive integers, and the sixth statistical value is the reference short-time linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N71 historical audio frames is at least one of the following linear prediction efficiencies of that historical audio frame: long-time linear prediction efficiency, short-time linear prediction efficiency, and comprehensive linear prediction efficiency. The reference linear prediction efficiency of each of N81 historical audio frames is at least one of the following linear prediction efficiencies of that historical audio frame: reference long-time linear prediction efficiency, reference short-time linear prediction efficiency, and reference comprehensive linear prediction efficiency. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-time linear prediction efficiency and short-time linear prediction efficiency of that frame, and the reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-time linear prediction efficiency and reference short-time linear prediction efficiency of that frame. The N71 historical audio frames are a subset of the N7 historical audio frames, and the N81 historical audio frames are a subset of the N8 historical audio frames. The intersection of the N7 historical audio frames and the N8 historical audio frames may or may not be empty. The sixth statistical value calculated from the linear prediction efficiencies of the N7 historical audio frames, the reference linear prediction efficiencies of the N8 historical audio frames, and the short-time linear prediction efficiency of the current audio frame may be their sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average.

In some embodiments of the present invention, estimating the long-term linear prediction efficiency of the current audio frame may include: obtaining, from the linear prediction residual of the current audio frame and a first historical linear prediction signal, the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal, where the correlation is the long-term linear prediction efficiency of the current audio frame, or the long-term linear prediction efficiency of the current audio frame is obtained by transforming the correlation. The first historical linear prediction signal is a first historical linear prediction excitation or a first historical linear prediction residual; the first historical linear prediction residual is the linear prediction residual of a historical audio frame of the current audio frame, and the first historical linear prediction excitation is the linear prediction excitation of a historical audio frame of the current audio frame. For example, a mapping relationship may exist between the correlation and the long-term linear prediction efficiency of an audio frame, and the long-term linear prediction efficiency of the current audio frame that corresponds to the calculated correlation can be obtained from that mapping relationship.

在本发明的一些实施例中，可利用分析滤波器A(Z)对当前音频帧的时域信号进行滤波，得到当前音频帧的线性预测残差R，其中，滤波器A(Z)的滤波器系数为当前音频帧的线性预测系数。In some embodiments of the present invention, an analysis filter A(Z) may be used to filter the time domain signal of the current audio frame to obtain a linear prediction residual R of the current audio frame, where the filter coefficients of the filter A(Z) are the linear prediction coefficients of the current audio frame.

Specifically, this may be as shown in Formula 1 below:

其中，公式1中的S(i)表示当前音频帧的第i个时域样点的信号，a(k)表示当前音频帧的第k阶线性预测系数，M为滤波器总阶数，上述N为当前音频帧的时域长度，R(i)表示当前音频帧的第i个时域样点的线性预测残差。Wherein, S(i) in Formula 1 represents the signal of the i-th time domain sample point of the current audio frame, a(k) represents the k-th order linear prediction coefficient of the current audio frame, M is the total order of the filter, the above N is the time domain length of the current audio frame, and R(i) represents the linear prediction residual of the i-th time domain sample point of the current audio frame.
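The filtering step can be sketched as follows, using the symbols defined above. The sketch assumes the common analysis-filter convention A(z) = 1 + a(1)z^-1 + ... + a(M)z^-M, which gives R(i) = S(i) + sum over k = 1..M of a(k)·S(i-k) for i = 0..N-1. The sign convention, the summation limits, the function name `lp_residual`, and the `history` argument (the last samples of the previous frame, zeros if absent) are assumptions for illustration.

```python
def lp_residual(signal, lpc, history=None):
    """Linear prediction residual R(i) for one frame.

    signal  -- time domain samples S(0..N-1) of the current frame
    lpc     -- coefficients a(1..M); lpc[k - 1] holds a(k)
    history -- at least M samples preceding the frame (assumed zeros if None)
    """
    m = len(lpc)
    past = list(history) if history is not None else [0.0] * m
    if len(past) < m:
        raise ValueError("history must contain at least M samples")
    # Prefix the frame with its M preceding samples so S(i - k) is always defined.
    padded = past[len(past) - m:] + list(signal)
    residual = []
    for i in range(len(signal)):
        acc = padded[m + i]                          # S(i)
        for k in range(1, m + 1):
            acc += lpc[k - 1] * padded[m + i - k]    # + a(k) * S(i - k)
        residual.append(acc)
    return residual
```

For example, with the single coefficient a(1) = -1 the residual reduces to the difference between consecutive samples.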

可以理解，任何1个音频帧(如当前音频帧或当前音频帧的历史音频帧)的线性预测残差均可通过上述举例方式得到。It can be understood that the linear prediction residual of any audio frame (such as the current audio frame or the historical audio frame of the current audio frame) can be obtained through the above example method.

其中，例如可以缓存每个音频帧或者部分音频帧的线性预测激励或线性预测残差，以便作为在可能的下一音频帧将可能用到的历史线性预测激励或历史线性预测残差，以计算其与下一音频帧的线性预测残差的相关性。For example, the linear prediction excitation or linear prediction residual of each audio frame or part of the audio frame can be cached so as to serve as the historical linear prediction excitation or historical linear prediction residual that may be used in the possible next audio frame to calculate its correlation with the linear prediction residual of the next audio frame.
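The caching described above can be sketched with a bounded queue. The class name `ResidualCache` and the default of keeping a single frame are hypothetical; the source only says that the excitation or residual of some or all frames may be cached.

```python
from collections import deque


class ResidualCache:
    """Keeps the residual (or excitation) of the most recent frames."""

    def __init__(self, max_frames: int = 1):
        # Older frames are dropped automatically once max_frames is exceeded.
        self._frames = deque(maxlen=max_frames)

    def push(self, residual):
        # Store a copy so later changes to the caller's buffer do not leak in.
        self._frames.append(list(residual))

    def latest(self):
        """Most recently cached frame, or None if nothing is cached yet."""
        return self._frames[-1] if self._frames else None
```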

Alternatively, obtaining the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal based on the linear prediction residual of the current audio frame and the first historical linear prediction signal may include: multiplying the linear prediction residual of the current audio frame by a gain factor to obtain a gain linear prediction residual of the current audio frame, and calculating the correlation between the gain linear prediction residual of the current audio frame and the first historical linear prediction signal, where the calculated correlation is taken as the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal.

In some embodiments of the present invention, the correlation is, for example, a cross-correlation function value in the time domain and/or a cross-correlation function value in the frequency domain.

In an optional implementation of the present invention, when the cross-correlation function value in the frequency domain is calculated, a time-frequency transform (for example, a discrete Fourier transform (DFT) or a discrete cosine transform (DCT)) may be performed on the linear prediction residual of the current audio frame to obtain a frequency-domain signal of the linear prediction residual of the current audio frame, and a time-frequency transform (for example, a DFT or a DCT) may be performed on the first historical linear prediction signal to obtain a frequency-domain signal of the first historical linear prediction signal. An example correlation calculation formula is given as Formula 2:

In Formula 2, C denotes the time-domain cross-correlation function value between the linear prediction residual of the current audio frame and the first historical linear prediction signal, R(i) denotes the linear prediction residual at the i-th time-domain sample of the current audio frame, E(i) denotes the signal at the i-th time-domain sample of the first historical linear prediction signal, and N denotes the total number of time-domain samples in one audio frame. Alternatively, in Formula 2, C denotes the frequency-domain cross-correlation function value between the linear prediction residual of the current audio frame and the first historical linear prediction signal, R(i) denotes the i-th spectral envelope of the linear prediction residual of the current audio frame, E(i) denotes the i-th spectral envelope of the first historical linear prediction signal, and N denotes the total number of spectral envelopes in one audio frame. Other methods of calculating the correlation are, of course, not excluded by the present invention.
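Because the image of Formula 2 is not reproduced in this text, the following sketch shows only one common way to compute a cross-correlation value C from R(i) and E(i) as defined above. The energy normalisation is an assumption, not something the surrounding text confirms, and the function name is hypothetical. The same function applies whether the inputs are time-domain samples or spectral envelopes.

```python
import math


def normalized_cross_correlation(r, e):
    """Normalised cross-correlation between R(i) and E(i), i = 0..N-1.

    Assumption: C = sum(R*E) / sqrt(sum(R^2) * sum(E^2)); the exact form
    of Formula 2 is not visible in the source text.
    """
    if len(r) != len(e):
        raise ValueError("R and E must have the same length N")
    numerator = sum(x * y for x, y in zip(r, e))
    denominator = math.sqrt(sum(x * x for x in r) * sum(y * y for y in e))
    # An all-zero input carries no information; report zero correlation.
    return numerator / denominator if denominator > 0 else 0.0
```

With this normalisation, a history signal that is a scaled copy of the residual yields a value of 1, and orthogonal signals yield 0.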

In another embodiment of the present invention in which the correlation is calculated in the frequency domain, in order to better cope with pitch jitter, one of the signals R(i) and E(i) may be shifted before the cross-correlation is calculated, for example as shown in Formula 3:

On the basis of Formula 2, Formula 3 additionally shifts E(i). Here j denotes the shift amount and may be an integer; R(i) can be shifted in a similar manner.
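The shifting idea can be sketched as a small search over candidate shifts j. This is an illustrative sketch under assumptions: the direction of the shift (E(i + j) here), the normalisation, the handling of indices that fall outside the frame, and the search range `max_shift` are all hypothetical choices, since Formula 3 is not reproduced in this text.

```python
import math


def shifted_correlation(r, e, j):
    """Correlate R(i) with E(i + j), using only indices present in both."""
    pairs = [(r[i], e[i + j]) for i in range(len(r)) if 0 <= i + j < len(e)]
    numerator = sum(x * y for x, y in pairs)
    denominator = math.sqrt(
        sum(x * x for x, _ in pairs) * sum(y * y for _, y in pairs)
    )
    return numerator / denominator if denominator > 0 else 0.0


def best_shift(r, e, max_shift):
    """Return the integer shift in [-max_shift, max_shift] with the highest correlation."""
    return max(
        range(-max_shift, max_shift + 1),
        key=lambda j: shifted_correlation(r, e, j),
    )
```

Searching a few shifts around zero lets a peak that has drifted by one or two positions still line up with its counterpart in the history signal.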

In other embodiments of the present invention, the correlation may be, for example, a distortion in the time domain and/or a distortion in the frequency domain.

In an optional implementation of the present invention, when the frequency-domain distortion is calculated, a time-frequency transform (for example, a DFT or a DCT) may be performed on the linear prediction residual of the current audio frame to obtain a frequency-domain signal of the linear prediction residual of the current audio frame, and a time-frequency transform (for example, a DFT or a DCT) may be performed on the first historical linear prediction signal to obtain a frequency-domain signal of the first historical linear prediction signal. A distortion D between the frequency-domain signal of the linear prediction residual of the current audio frame and the frequency-domain signal of the first historical linear prediction signal is then calculated.
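The transform-then-compare step can be sketched as below. This is a rough illustration only: it uses a naive DFT and treats the magnitude of each bin as a stand-in for a "spectral envelope" value, and it uses a sum of absolute differences as the distortion. Both choices are assumptions, and the function names are hypothetical.

```python
import cmath


def dft_magnitudes(x):
    """Magnitude of each bin of a naive O(N^2) DFT of the sequence x."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def frequency_domain_distortion(r, e):
    """Sum of absolute differences between the magnitude spectra of r and e."""
    return sum(abs(a - b) for a, b in zip(dft_magnitudes(r), dft_magnitudes(e)))
```

A property worth noting is that magnitude spectra are unchanged by a circular time shift, so this kind of frequency-domain comparison is less sensitive to small misalignments than a sample-by-sample comparison in the time domain.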

The smaller the distortion D, the stronger the correlation and the higher the long-term linear prediction efficiency. An example formula for calculating the distortion D is given as Formula 4:

In Formula 4, N may denote the total number of time-domain samples in one audio frame, R(k) denotes the linear prediction residual at the k-th time-domain sample of the current audio frame, and E(k) denotes the signal at the k-th time-domain sample of the first historical linear prediction signal. Alternatively, N in Formula 4 may denote the total number of spectral envelopes in one audio frame, R(k) denotes the k-th spectral envelope of the linear prediction residual of the current audio frame, and E(k) denotes the k-th spectral envelope of the first historical linear prediction signal.
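Since the image of Formula 4 is not reproduced here, the sketch below assumes that D accumulates the per-index difference between R(k) and E(k), either as an absolute value or as a square. Which of the two the formula actually uses is not confirmed by the text; the function name and the `squared` switch are hypothetical.

```python
def distortion(r, e, squared=False):
    """Accumulated difference between R(k) and E(k), k = 0..N-1.

    Assumption: absolute differences by default, squared differences
    when squared=True; the source formula is not visible.
    """
    if len(r) != len(e):
        raise ValueError("R and E must have the same length N")
    if squared:
        return sum((x - y) ** 2 for x, y in zip(r, e))
    return sum(abs(x - y) for x, y in zip(r, e))
```

Either variant is zero when the history signal matches the residual exactly and grows as the two diverge, which is the behaviour the text relies on when it maps small D to high prediction efficiency.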

Two further example formulas for calculating the distortion D are given as Formula 5 and Formula 6:

In Formula 5 and Formula 6, N may denote the total number of time-domain samples in one audio frame, R(k) denotes the linear prediction residual at the k-th time-domain sample of the current audio frame, and E(k) denotes the signal at the k-th time-domain sample of the first historical linear prediction signal. Alternatively, N in Formula 5 and Formula 6 may denote the total number of spectral envelopes in one audio frame, R(k) denotes the k-th spectral envelope of the linear prediction residual of the current audio frame, and E(k) denotes the k-th spectral envelope of the first historical linear prediction signal.

In Formula 5 and Formula 6, G denotes a gain factor; by selecting a suitable value of G, the resulting distortion D can be minimized. In Formula 5 the gain factor G is applied to E(k), and in Formula 6 the gain factor G is applied to R(k).
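One way to pick G so that the distortion is minimised can be sketched as follows. The sketch assumes a squared-error distortion with the gain applied to E(k), for which the minimising gain has the closed form G = Σ R(k)·E(k) / Σ E(k)². The source does not say how G is selected, so this is an illustrative choice, and the function names are hypothetical.

```python
def optimal_gain(r, e):
    """Gain G that minimises sum_k (R(k) - G * E(k))^2 (least squares)."""
    energy = sum(y * y for y in e)
    if energy == 0:
        return 0.0  # E is all zeros, so no gain can reduce the distortion
    return sum(x * y for x, y in zip(r, e)) / energy


def gained_distortion(r, e, g):
    """Squared-error distortion with the gain applied to E(k)."""
    return sum((x - g * y) ** 2 for x, y in zip(r, e))
```

The closed form follows from setting the derivative of the squared error with respect to G to zero. For an absolute-difference distortion no such closed form exists, and a search over candidate gains would be needed instead.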

Three further example formulas for calculating the distortion D are given as Formula 7, Formula 8, and Formula 9:

In Formula 7 to Formula 9, P(k) is a set of weighting coefficients; P(k) may be a set of perceptual weighting coefficients that reflect a psychoacoustic model, or another set of weighting coefficients.

In Formula 7 to Formula 9, N, R(k), E(k), and G have the same meanings as in Formula 5.
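A weighted distortion of the kind described above can be sketched as below. The exact forms of Formula 7 to Formula 9 are not reproduced in this text, so the sketch assumes that each per-index absolute difference is multiplied by P(k), with an optional gain G applied to E(k). The weights in the usage example are made up for illustration and do not come from any psychoacoustic model.

```python
def weighted_distortion(r, e, p, g=1.0):
    """Sum over k of P(k) * |R(k) - G * E(k)|.

    Assumption: absolute differences with the gain on E(k); the source
    formulas are not visible.
    """
    if not (len(r) == len(e) == len(p)):
        raise ValueError("R, E and P must have the same length N")
    return sum(w * abs(x - g * y) for x, y, w in zip(r, e, p))
```

For example, `weighted_distortion([1, 2], [0, 0], [0.5, 2.0])` weights the second index four times as heavily as the first, so a mismatch at a perceptually important position contributes more to D.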

In some embodiments of the present invention, the first historical linear prediction excitation may be a linear prediction excitation generated when a historical audio frame of the current audio frame is audio-encoded using a linear-prediction-based coding mode.

In some embodiments of the present invention, the first historical linear prediction residual may be obtained based on the time-domain signal of a first historical audio frame of the current audio frame and the linear prediction coefficients of the first historical audio frame, where the linear prediction coefficients of the first historical audio frame are quantized linear prediction coefficients or unquantized linear prediction coefficients.

In some embodiments of the present invention, the linear prediction residual of the current audio frame may be obtained based on the time-domain signal of the current audio frame and the linear prediction coefficients of the current audio frame, where the linear prediction coefficients of the current audio frame may be quantized linear prediction coefficients or unquantized linear prediction coefficients.

In some embodiments of the present invention, the first historical linear prediction excitation may be a superposition of an adaptive codebook excitation and a fixed codebook excitation, or the first historical linear prediction excitation may be an adaptive codebook excitation.

Related apparatuses for implementing the foregoing solutions are further provided below.

Refer to FIG. 3-a, which is a schematic structural diagram of an audio encoder 300 provided in another embodiment of the present invention.

A time-domain audio signal may be input, frame by frame, into the audio encoder 300 provided in this embodiment of the present invention. After encoding by the audio encoder 300, an input audio frame may be compressed into a relatively small bitstream. The bitstream may be stored or transmitted, and the original time-domain audio frame may be restored from it by an audio decoder.

The audio encoder 300 in this embodiment may include multiple sub-encoders. Specifically, at least one sub-encoder is a linear-prediction-based sub-encoder (for convenience, referred to below as a Class A sub-encoder), and at least one sub-encoder is a sub-encoder not based on linear prediction (for convenience, referred to below as a Class B sub-encoder).

As shown in FIG. 3-a, the audio encoder 300 includes a selector 301, a Class A sub-encoder 302, a Class B sub-encoder 303, and a controlled router 304.

The selector 301 is configured to: estimate a reference linear prediction efficiency of the current audio frame; determine the audio encoder that matches the reference linear prediction efficiency of the current audio frame; and send a routing control signal to the controlled router 304 so that the controlled router 304 outputs the current audio frame input to it to the audio encoder that matches the reference linear prediction efficiency of the current audio frame (for example, the Class A sub-encoder 302 or the Class B sub-encoder 303). The Class A sub-encoder 302 or the Class B sub-encoder 303 is configured to audio-encode the input current audio frame and output an encoded audio signal. For example, the Class A sub-encoder 302 may be a TCX encoder, and the Class B sub-encoder 303 may be a GAC encoder; for example, the Class B sub-encoder 303 may be an MDCT encoder.

In some embodiments of the present invention, as shown in FIG. 3-b, a classifier 305 and a sub-encoder 306 may further be added to the audio encoder 300 of the architecture shown in FIG. 3-a.

The classifier 305 is configured to determine whether the current audio frame is a speech audio frame and, if it is, to send a routing control signal to the controlled router 304 so that the controlled router 304 outputs the current audio frame input to it to the sub-encoder 306. The sub-encoder 306 is a sub-encoder suitable for encoding speech audio frames; for example, the sub-encoder 306 is an ACELP encoder. The sub-encoder 306 is configured to audio-encode the input current audio frame and output an encoded audio signal.
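The routing behaviour of the architecture in FIG. 3-b can be sketched as below. This is an illustrative sketch only: the string labels, the single scalar efficiency, and the threshold value 0.5 are hypothetical placeholders rather than values given in the text.

```python
def route_frame(is_speech, reference_efficiency, threshold=0.5):
    """Pick a destination sub-encoder for one frame.

    Speech frames go to the speech sub-encoder 306 (e.g. ACELP).
    Other frames go to the Class A sub-encoder 302 (linear-prediction
    based, e.g. TCX) when the reference linear prediction efficiency
    reaches the threshold, and to the Class B sub-encoder 303
    (e.g. MDCT) otherwise.
    """
    if is_speech:
        return "sub_encoder_306"
    if reference_efficiency >= threshold:
        return "class_a_sub_encoder_302"
    return "class_b_sub_encoder_303"
```

In this sketch the classifier decision takes priority, so the efficiency estimate only matters for frames that are not classified as speech.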

In some embodiments of the present invention, as shown in FIG. 3-c, the selector 301 may include a decision unit 3013, a first estimation unit 3011, and a second estimation unit 3012. The reference linear prediction efficiency of an audio frame includes a reference long-term linear prediction efficiency and a reference short-term linear prediction efficiency of the audio frame.

The first estimation unit 3011 is configured to estimate the reference long-term linear prediction efficiency of the current audio frame.

The second estimation unit 3012 is configured to estimate the reference short-term linear prediction efficiency of the current audio frame.

The decision unit 3013 is configured to: if the reference long-term linear prediction efficiency of the current audio frame estimated by the first estimation unit 3011 is less than a first threshold, and/or the reference short-term linear prediction efficiency of the current audio frame estimated by the second estimation unit 3012 is less than a second threshold, determine that the audio coding mode matching the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction, and send a routing control signal to the controlled router 304 so that the controlled router 304 outputs the current audio frame input to it to the Class B sub-encoder 303; and, if the reference long-term linear prediction efficiency of the current audio frame estimated by the first estimation unit 3011 is greater than or equal to the first threshold, and/or the reference short-term linear prediction efficiency of the current audio frame estimated by the second estimation unit 3012 is greater than or equal to the second threshold, determine that the audio coding mode matching the reference linear prediction efficiency of the current audio frame is a linear-prediction-based audio coding mode, and send a routing control signal to the controlled router 304 so that the controlled router 304 outputs the current audio frame input to it to the Class A sub-encoder 302.
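The two-threshold decision can be sketched as follows. The text allows "and/or" combinations in both branches; the sketch picks one consistent reading (route to the Class B sub-encoder if either efficiency falls below its threshold, otherwise to the Class A sub-encoder), and that choice is an assumption. The threshold values and return labels are hypothetical.

```python
def decide_coding_mode(long_term_eff, short_term_eff,
                       first_threshold=0.5, second_threshold=0.5):
    """Return which sub-encoder the controlled router should feed.

    Assumed reading of the "and/or" wording: either efficiency below
    its threshold selects the coding mode not based on linear prediction.
    """
    if long_term_eff < first_threshold or short_term_eff < second_threshold:
        return "class_b_sub_encoder_303"  # not based on linear prediction
    return "class_a_sub_encoder_302"      # based on linear prediction
```

Under the other reading permitted by the text, the `or` would become an `and`, which makes the selection of the linear-prediction-based mode more permissive.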

In some embodiments of the present invention, as shown in FIG. 3-d and FIG. 3-e, the selector 301 may alternatively omit the second estimation unit 3012 (FIG. 3-d) or omit the first estimation unit 3011 (FIG. 3-e).

In the architecture shown in FIG. 3-d, the decision unit 3013 may be configured to: if the reference long-term linear prediction efficiency of the current audio frame estimated by the first estimation unit 3011 is less than the first threshold, determine that the audio coding mode matching the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction, and send a routing control signal to the controlled router 304 so that the controlled router 304 outputs the current audio frame input to it to the Class B sub-encoder 303; and, if the reference long-term linear prediction efficiency of the current audio frame estimated by the first estimation unit 3011 is greater than or equal to the first threshold, determine that the audio coding mode matching the reference linear prediction efficiency of the current audio frame is a linear-prediction-based audio coding mode, and send a routing control signal to the controlled router 304 so that the controlled router 304 outputs the current audio frame input to it to the Class A sub-encoder 302.

In the architecture shown in FIG. 3-e, the decision unit 3013 may be configured to: if the reference short-term linear prediction efficiency of the current audio frame estimated by the second estimation unit 3012 is less than the second threshold, determine that the audio coding mode matching the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction, and send a routing control signal to the controlled router 304 so that the controlled router 304 outputs the current audio frame input to it to the Class B sub-encoder 303; and, if the reference short-term linear prediction efficiency of the current audio frame estimated by the second estimation unit 3012 is greater than or equal to the second threshold, determine that the audio coding mode matching the reference linear prediction efficiency of the current audio frame is a linear-prediction-based audio coding mode, and send a routing control signal to the controlled router 304 so that the controlled router 304 outputs the current audio frame input to it to the Class A sub-encoder 302.

In some embodiments of the present invention, as shown in FIG. 3-f, on the basis of the audio encoder 300 of the architecture shown in FIG. 3-c, the audio encoder 300 may further include a pre-processor 3014 configured to obtain the linear prediction residual of the current audio frame. The pre-processor 3014 may be specifically configured to filter the time-domain signal of the current audio frame using an analysis filter A(Z) to obtain the linear prediction residual R of the current audio frame, where the filter coefficients of the filter A(Z) are the linear prediction coefficients of the current audio frame.

The first estimation unit 3011 is specifically configured to: obtain, according to the linear prediction residual of the current audio frame and a first historical linear prediction signal, the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal; and obtain, based on a mapping relationship between the correlation and the long-term linear prediction efficiency of the current audio frame, the long-term linear prediction efficiency of the current audio frame that has a mapping relationship with the calculated correlation. The first historical linear prediction signal is a first historical linear prediction excitation or a first historical linear prediction residual. The first historical linear prediction residual is a linear prediction residual of a historical audio frame of the current audio frame (for example, the first historical linear prediction residual may be the linear prediction residual of one historical audio frame of the current audio frame whose duration is the same as or close to that of the current audio frame, or it may be the linear prediction residual of a continuous portion of the audio signal spanning two adjacent historical audio frames of the current audio frame, with a duration the same as or close to that of the current audio frame). The first historical linear prediction excitation is a linear prediction excitation of a historical audio frame of the current audio frame (for example, the first historical linear prediction excitation may be the linear prediction excitation of one historical audio frame of the current audio frame whose duration is the same as or close to that of the current audio frame, or it may be the linear prediction excitation of a continuous portion of the audio signal spanning two adjacent historical audio frames of the current audio frame, with a duration the same as or close to that of the current audio frame).

In some embodiments of the present invention, as shown in FIG. 3-g, on the basis of the audio encoder 300 of the architecture shown in FIG. 3-f, the audio encoder 300 may further include a buffer 308. The buffer 308 may cache the linear prediction excitation or linear prediction residual of every audio frame, or of some audio frames, so that it can serve as a historical linear prediction excitation or historical linear prediction residual when a subsequent audio frame is processed, and its correlation with the linear prediction residual of that subsequent audio frame can then be calculated. The first estimation unit 3011 may obtain the first historical linear prediction signal from the buffer 308.
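A buffer of this kind can be sketched with a bounded queue. The class name `ExcitationBuffer`, its methods, and the default capacity of four frames are hypothetical; the text does not specify how many frames the buffer 308 holds.

```python
from collections import deque


class ExcitationBuffer:
    """Keeps the most recent per-frame excitation or residual signals."""

    def __init__(self, max_frames=4):
        # A deque with maxlen silently drops the oldest frame when full.
        self._frames = deque(maxlen=max_frames)

    def push(self, frame_signal):
        """Store a copy of one frame's excitation or residual."""
        self._frames.append(list(frame_signal))

    def history(self):
        """Return the cached frames, oldest first."""
        return list(self._frames)
```

After each frame is encoded, its excitation (or residual) would be pushed, and the estimator for the next frame would read candidates from `history()`.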

In some embodiments of the present invention, as shown in FIG. 3-h, the historical linear prediction excitation or historical linear prediction residual cached in the buffer 308 may come from a local audio decoder 311. The local audio decoder 311 may decode the encoded audio frames output by the Class A sub-encoder 302 and the Class B sub-encoder 303 and output the decoded frames, and a linear predictor 312 may perform linear prediction on the time-domain audio frames output by the local audio decoder 311 to obtain the linear prediction residual or linear prediction excitation of the audio frames.

In some embodiments of the present invention, as shown in FIG. 3-i, the historical linear prediction excitation cached in the buffer 308 may alternatively come from the Class A sub-encoder 302. The Class A sub-encoder 302 obtains the linear prediction excitation of an audio frame in the course of encoding that frame, and may output the obtained linear prediction excitation to the buffer 308 for caching.

In some embodiments of the present invention, the first historical linear prediction excitation or the first historical linear prediction residual used by the first estimation unit 3011 to estimate the long-term linear prediction efficiency of the current audio frame may be determined based on the pitch of the current audio frame. For example, the time-domain correlation between the first historical linear prediction excitation and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between any other historical linear prediction excitation cached in the buffer 308 and the linear prediction residual of the current audio frame. Alternatively, the time-domain correlation between the first historical linear prediction excitation and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between at least one other historical linear prediction excitation cached in the buffer 308 and the linear prediction residual of the current audio frame. Likewise, for example, the time-domain correlation between the first historical linear prediction residual and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between any other historical linear prediction residual cached in the buffer 308 and the linear prediction residual of the current audio frame. Alternatively, the time-domain correlation between the first historical linear prediction residual and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between at least one other historical linear prediction residual cached in the buffer 308 and the linear prediction residual of the current audio frame.
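The selection criterion described above (the chosen history signal correlates at least as strongly with the current residual as any other cached candidate) can be sketched as an exhaustive search. This is a simplification: the text says the choice may be based on the pitch of the current frame, whereas the sketch simply compares every candidate. The normalised correlation and the function names are assumptions.

```python
import math


def _correlation(r, e):
    """Normalised time-domain cross-correlation (an assumed form)."""
    numerator = sum(x * y for x, y in zip(r, e))
    denominator = math.sqrt(sum(x * x for x in r) * sum(y * y for y in e))
    return numerator / denominator if denominator > 0 else 0.0


def pick_first_historical_signal(residual, candidates):
    """Return the cached candidate most correlated with the current residual."""
    if not candidates:
        raise ValueError("no cached historical signals to choose from")
    return max(candidates, key=lambda c: _correlation(residual, c))
```

A pitch-guided implementation would narrow the candidate set to segments located about one pitch period (or a multiple of it) before the current frame, rather than scoring every cached signal.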

The audio encoder 300 may be any apparatus that needs to collect, store, or transmit audio signals, for example, a mobile phone, a tablet computer, a personal computer, or a laptop computer.

Refer to FIG. 4, which is a schematic structural diagram of an audio encoder 400 provided in another embodiment of the present invention. The audio encoder 400 may include an estimation unit 410, a determination unit 420, and an encoding unit 430.

The estimation unit 410 is configured to estimate a reference linear prediction efficiency of the current audio frame.

The determination unit 420 is configured to determine the audio coding mode that matches the reference linear prediction efficiency of the current audio frame estimated by the estimation unit 410.

The encoding unit 430 is configured to audio-encode the current audio frame according to the audio coding mode that the determination unit 420 has determined to match the reference linear prediction efficiency of the current audio frame.
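The division of labour among the three units can be sketched as a small pipeline. All names here are hypothetical, and the estimator, the decision rule, and the encoders are passed in as plain callables because the text leaves their internals to other embodiments.

```python
def encode_frame(frame, estimate, determine, encoders):
    """Run one frame through estimate -> determine -> encode.

    estimate:  callable(frame) -> reference linear prediction efficiency
    determine: callable(efficiency) -> coding-mode key
    encoders:  dict mapping coding-mode key -> callable(frame) -> payload
    """
    efficiency = estimate(frame)      # role of estimation unit 410
    mode = determine(efficiency)      # role of determination unit 420
    return mode, encoders[mode](frame)  # role of encoding unit 430
```

A caller might, for instance, supply `determine=lambda eff: "lp" if eff >= 0.5 else "non_lp"` together with a dictionary holding one encoder per mode; the threshold 0.5 is purely illustrative.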

For example, the reference long-term linear prediction efficiency of the current audio frame may be obtained based on the long-term linear prediction efficiency of the current audio frame. The reference short-term linear prediction efficiency of the current audio frame may be obtained based on the short-term linear prediction efficiency of the current audio frame. The reference comprehensive linear prediction efficiency of the current audio frame may be obtained, for example, based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of the current audio frame.

It can be understood that the value range of the reference linear prediction efficiency may be 0 to 1 (that is, 0% to 100%), or may be 0 to x1 (where x1 is a positive number). The value range of the reference long-term linear prediction efficiency may be 0 to 1 (that is, 0% to 100%), or may be 0 to x2 (where x2 is a positive number). The value range of the reference short-term linear prediction efficiency may be 0 to 1 (that is, 0% to 100%), or may be 0 to x3 (where x3 is a positive number). The value range of the reference comprehensive linear prediction efficiency may be 0 to 1 (that is, 0% to 100%), or may be 0 to x4 (where x4 is a positive number). The value range of the long-term linear prediction efficiency may be 0 to 1 (that is, 0% to 100%), or may be 0 to x5 (where x5 is a positive number). The value range of the short-term linear prediction efficiency may be 0 to 1 (that is, 0% to 100%), or may be 0 to x6 (where x6 is a positive number). Here x1, x2, x3, x4, x5, or x6 may be, for example, 0.5, 0.8, 1.5, 2, 5, 10, 50, 100, or another positive number.

In some embodiments of the present invention, the estimation unit 410 may be specifically configured to estimate the reference linear prediction efficiency of the current audio frame when the current audio frame is a non-speech audio frame.

In some embodiments of the present invention, the reference comprehensive linear prediction efficiency of an audio frame (for example, the current audio frame or another audio frame) is obtained based on the reference long-term linear prediction efficiency of the audio frame and the reference short-term linear prediction efficiency of the audio frame. The reference comprehensive linear prediction efficiency of the current audio frame may be, for example, the sum, a weighted sum (where the weights may be set as required; a weight may be, for example, 0.5, 1, 2, 3, 5, 10, or another value), or the average of the reference long-term linear prediction efficiency of the current audio frame and the reference short-term linear prediction efficiency of the current audio frame. The reference comprehensive linear prediction efficiency of the current audio frame may, of course, also be obtained from these two values using another algorithm.
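The three combinations named above (sum, weighted sum, average) can be sketched in a few lines. The function name, the `mode` keyword, and the default weights of 0.5 are hypothetical; the text leaves the weights to be set as required.

```python
def comprehensive_efficiency(long_term_eff, short_term_eff,
                             mode="sum", w_long=0.5, w_short=0.5):
    """Combine the reference long-term and short-term efficiencies."""
    if mode == "sum":
        return long_term_eff + short_term_eff
    if mode == "weighted":
        return w_long * long_term_eff + w_short * short_term_eff
    if mode == "average":
        return (long_term_eff + short_term_eff) / 2.0
    raise ValueError("mode must be 'sum', 'weighted' or 'average'")
```

Note that the plain sum of two values in the range 0 to 1 falls in the range 0 to 2, which is one reason the value range of the comprehensive efficiency may differ from 0 to 1.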

It can be understood that, depending on which types of linear prediction efficiency are included in the reference linear prediction efficiency of the current audio frame, the specific manner in which the determination unit 420 determines the audio coding mode that matches the reference linear prediction efficiency of the current audio frame may differ.

Some possible implementations are described below by way of example.

In some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of the current audio frame, the determination unit 420 may be specifically configured to: if the reference long-term linear prediction efficiency of the current audio frame is less than a first threshold, and/or the reference short-term linear prediction efficiency of the current audio frame is less than a second threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

In still other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of the current audio frame, the determination unit 420 may be specifically configured to: if the reference long-term linear prediction efficiency of the current audio frame is greater than or equal to a first threshold, and/or the reference short-term linear prediction efficiency of the current audio frame is greater than or equal to a second threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction.

In some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of the current audio frame, the determination unit 420 may be specifically configured to: if the reference long-term linear prediction efficiency of the current audio frame is less than a first threshold, and/or the reference short-term linear prediction efficiency of the current audio frame is less than a second threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction; and if the reference long-term linear prediction efficiency of the current audio frame is greater than or equal to the first threshold, and/or the reference short-term linear prediction efficiency of the current audio frame is greater than or equal to the second threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction.
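The two-threshold decision can be sketched as below. Because the text joins the two comparisons with "and/or", the two branches can overlap; this sketch assumes one possible reading, in which a frame is coded without linear prediction as soon as either efficiency falls below its threshold. The names `select_mode_two_thresholds`, `LP_BASED`, and `NON_LP_BASED`, and the default threshold values, are hypothetical.

```python
LP_BASED = "lp_based"          # e.g. ACELP or TCX
NON_LP_BASED = "non_lp_based"  # e.g. GAC


def select_mode_two_thresholds(long_eff, short_eff, t1=0.5, t2=0.5):
    """Pick a coding mode from long-term and short-term reference efficiencies.

    Assumed reading of "and/or": either efficiency below its threshold
    selects the non-LP mode; otherwise the LP-based mode is selected.
    t1 and t2 stand for the first and second thresholds (values illustrative).
    """
    if long_eff < t1 or short_eff < t2:
        return NON_LP_BASED
    return LP_BASED


if __name__ == "__main__":
    print(select_mode_two_thresholds(0.8, 0.7))  # both at or above threshold
    print(select_mode_two_thresholds(0.8, 0.2))  # short-term below threshold
```

Under the other reading (both efficiencies must be below their thresholds to select the non-LP mode), the `or` in the condition would become `and`.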

In some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-term linear prediction efficiency of the current audio frame, the determination unit 420 may be specifically configured to: if the reference long-term linear prediction efficiency of the current audio frame is greater than or equal to a third threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction.

In still other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-term linear prediction efficiency of the current audio frame, the determination unit 420 may be specifically configured to: if the reference long-term linear prediction efficiency of the current audio frame is less than a fourth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

In some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-term linear prediction efficiency of the current audio frame, the determination unit 420 may be specifically configured to: if the reference long-term linear prediction efficiency of the current audio frame is greater than or equal to a third threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction; and if the reference long-term linear prediction efficiency of the current audio frame is less than a fourth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.
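A sketch of the decision with separate third and fourth thresholds follows. The text does not say what happens when the efficiency lies between the two thresholds, so this example assumes that the fourth threshold is not larger than the third and returns `None` for the undecided band. The function name and the default values are hypothetical.

```python
from typing import Optional


def select_mode_long_term(long_eff: float,
                          t3: float = 0.6,
                          t4: float = 0.4) -> Optional[str]:
    """Decide from the reference long-term efficiency alone.

    t3, t4 stand for the third and fourth thresholds (values illustrative).
    Assumption: t4 <= t3. Values in [t4, t3) are left undecided (None),
    since the text does not specify that case.
    """
    if t4 > t3:
        raise ValueError("this sketch assumes t4 <= t3")
    if long_eff >= t3:
        return "lp_based"
    if long_eff < t4:
        return "non_lp_based"
    return None


if __name__ == "__main__":
    for value in (0.7, 0.5, 0.2):
        print(value, select_mode_long_term(value))
```

Setting both thresholds to the same value removes the undecided band and reduces the rule to a single comparison.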

In some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-term linear prediction efficiency of the current audio frame, the determination unit 420 is specifically configured to: determine a first linear prediction efficiency interval into which the reference long-term linear prediction efficiency of the current audio frame falls, and determine, based on a mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction, a first audio coding mode that has a mapping relationship with the first linear prediction efficiency interval. The first audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and is either an audio coding mode based on linear prediction or an audio coding mode not based on linear prediction.
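The interval lookup can be sketched with a sorted list of interval boundaries. The boundaries 0.3 and 0.7 and the assignment of GAC, TCX, and ACELP to the three intervals are assumptions made for this example only; the text leaves the actual intervals and mapping open.

```python
from bisect import bisect_right

# Hypothetical mapping: [0, 0.3) -> GAC, [0.3, 0.7) -> TCX, [0.7, 1] -> ACELP.
INTERVAL_EDGES = [0.3, 0.7]
INTERVAL_MODES = ["GAC", "TCX", "ACELP"]


def mode_for_efficiency(efficiency, edges=INTERVAL_EDGES, modes=INTERVAL_MODES):
    """Return the coding mode mapped to the interval containing `efficiency`.

    `edges` must be sorted ascending and len(modes) must be len(edges) + 1.
    Each interval is closed on the left and open on the right.
    """
    if len(modes) != len(edges) + 1:
        raise ValueError("need exactly one mode per interval")
    return modes[bisect_right(edges, efficiency)]


if __name__ == "__main__":
    for value in (0.1, 0.3, 0.9):
        print(value, mode_for_efficiency(value))
```

The same lookup applies unchanged to the short-term and comprehensive efficiencies, each with its own interval table.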

In some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-term linear prediction efficiency of the current audio frame, the determination unit 420 is specifically configured to: if the reference short-term linear prediction efficiency of the current audio frame is greater than or equal to a fifth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction.

In some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-term linear prediction efficiency of the current audio frame, the determination unit 420 is specifically configured to: if the reference short-term linear prediction efficiency of the current audio frame is less than a fifth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

In some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-term linear prediction efficiency of the current audio frame, the determination unit 420 is specifically configured to: if the reference short-term linear prediction efficiency of the current audio frame is greater than or equal to a fifth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction; and if the reference short-term linear prediction efficiency of the current audio frame is less than the fifth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

In some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-term linear prediction efficiency of the current audio frame, the determination unit 420 is specifically configured to: determine a second linear prediction efficiency interval into which the reference short-term linear prediction efficiency of the current audio frame falls, and determine, based on a mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction, a second audio coding mode that has a mapping relationship with the second linear prediction efficiency interval. The second audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and is either an audio coding mode based on linear prediction or an audio coding mode not based on linear prediction.

In some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, the determination unit 420 is specifically configured to: if the reference comprehensive linear prediction efficiency of the current audio frame is greater than or equal to a sixth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction.

In some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, the determination unit 420 is specifically configured to: if the reference comprehensive linear prediction efficiency of the current audio frame is less than a sixth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

In some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, the determination unit 420 is specifically configured to: if the reference comprehensive linear prediction efficiency of the current audio frame is greater than or equal to a sixth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction; and if the reference comprehensive linear prediction efficiency of the current audio frame is less than the sixth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

In some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, the determination unit 420 is specifically configured to: determine a third linear prediction efficiency interval into which the reference comprehensive linear prediction efficiency of the current audio frame falls, and determine, based on a mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction, a third audio coding mode that has a mapping relationship with the third linear prediction efficiency interval. The third audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and is either an audio coding mode based on linear prediction or an audio coding mode not based on linear prediction.

In some embodiments of the present invention, audio coding modes based on linear prediction may include ACELP coding, TCX, and the like. Audio coding modes not based on linear prediction may include GAC, and GAC may include, for example, MDCT coding or DCT coding.

It can be understood that the specific values of the thresholds mentioned in the above examples (for example, the first, second, third, fourth, fifth, and sixth thresholds) may be set as needed or according to the application environment and scenario. For example, if the reference long-term linear prediction efficiency of the current audio frame ranges from 0 to 1, the first threshold may be 0.2, 0.5, 0.6, 0.8, or the like; if the reference short-term linear prediction efficiency of the current audio frame ranges from 0 to 1, the second threshold may be 0.3, 0.6, 0.8, or the like. Other scenarios can be deduced by analogy. Further, the values of the thresholds may be adjusted dynamically and adaptively as needed.

It can be understood that the estimation unit 410 may estimate the different types of linear prediction efficiency included in the reference linear prediction efficiency of the current audio frame in different specific manners. Some possible implementations are described below by way of example.

In some embodiments of the present invention, in terms of estimating the reference long-term linear prediction efficiency of the current audio frame, the estimation unit 410 is specifically configured to: estimate the long-term linear prediction efficiency of the current audio frame, where the long-term linear prediction efficiency of the current audio frame is the reference long-term linear prediction efficiency of the current audio frame.

In some other embodiments of the present invention, in terms of estimating the reference long-term linear prediction efficiency of the current audio frame, the estimation unit 410 is specifically configured to: estimate the long-term linear prediction efficiency of the current audio frame; obtain the linear prediction efficiencies of N1 historical audio frames of the current audio frame; and calculate a first statistical value of the linear prediction efficiencies of the N1 historical audio frames and the long-term linear prediction efficiency of the current audio frame. N1 is a positive integer, and the first statistical value is the reference long-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N11 historical audio frames is at least one of the following: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and a comprehensive linear prediction efficiency. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that frame, and the N11 historical audio frames are a subset of the N1 historical audio frames. The first statistical value may be, for example, the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the linear prediction efficiencies of the N1 historical audio frames and the long-term linear prediction efficiency of the current audio frame.
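The statistical value described above can be sketched as a function over the history values plus the current value. The function name, the `stat` strings, and the convention that the last weight belongs to the current frame are assumptions made for this example; the geometric mean additionally assumes strictly positive efficiencies.

```python
from statistics import fmean, geometric_mean


def reference_efficiency(current, history, stat="arithmetic_mean", weights=None):
    """Statistic over the history efficiencies and the current efficiency.

    `history` holds the efficiencies of the N historical frames, oldest first.
    For the weighted statistics, `weights` needs len(history) + 1 entries and
    the last weight applies to the current frame (an assumed convention).
    """
    values = list(history) + [current]
    if stat == "sum":
        return sum(values)
    if stat == "arithmetic_mean":
        return fmean(values)
    if stat == "geometric_mean":
        return geometric_mean(values)  # assumes all values are > 0
    if stat in ("weighted_sum", "weighted_mean"):
        if weights is None or len(weights) != len(values):
            raise ValueError("need one weight per value")
        total = sum(w * v for w, v in zip(weights, values))
        return total if stat == "weighted_sum" else total / sum(weights)
    raise ValueError(f"unknown statistic: {stat}")


if __name__ == "__main__":
    past = [0.25, 0.75]   # efficiencies of two historical frames (made up)
    print(reference_efficiency(0.5, past))
    print(reference_efficiency(0.5, past, "weighted_mean", weights=[1, 1, 2]))
```

Averaging over history smooths frame-to-frame fluctuations, which would tend to reduce rapid switching between coding modes; giving the current frame a larger weight makes the decision react faster.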

In some other embodiments of the present invention, in terms of estimating the reference long-term linear prediction efficiency of the current audio frame, the estimation unit 410 is specifically configured to: estimate the long-term linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N2 historical audio frames of the current audio frame; and calculate a second statistical value of the reference linear prediction efficiencies of the N2 historical audio frames and the long-term linear prediction efficiency of the current audio frame. N2 is a positive integer, and the second statistical value is the reference long-term linear prediction efficiency of the current audio frame. The reference linear prediction efficiency of each of N21 historical audio frames is at least one of the following: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference comprehensive linear prediction efficiency. The reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of that frame, and the N21 historical audio frames are a subset of the N2 historical audio frames. The second statistical value is, for example, the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the reference linear prediction efficiencies of the N2 historical audio frames and the long-term linear prediction efficiency of the current audio frame.

In some other embodiments of the present invention, in terms of estimating the reference long-term linear prediction efficiency of the current audio frame, the estimation unit 410 is specifically configured to: estimate the long-term linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N4 historical audio frames of the current audio frame; obtain the linear prediction efficiencies of N3 historical audio frames of the current audio frame; and calculate a third statistical value of the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-term linear prediction efficiency of the current audio frame. N3 and N4 are positive integers, and the third statistical value is the reference long-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N31 historical audio frames is at least one of the following: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and a comprehensive linear prediction efficiency. The reference linear prediction efficiency of each of N41 historical audio frames is at least one of the following: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference comprehensive linear prediction efficiency. The N31 historical audio frames are a subset of the N3 historical audio frames, and the N41 historical audio frames are a subset of the N4 historical audio frames. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that frame, and the reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of that frame. The third statistical value is, for example, the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-term linear prediction efficiency of the current audio frame.
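A per-frame tracker illustrates how estimated and reference efficiencies of earlier frames can both feed the third statistical value. This is a sketch under assumptions: the class name is hypothetical, the arithmetic mean is chosen from the statistics listed above, and the histories are simply shorter than N3 and N4 during the first few frames.

```python
from collections import deque


class ReferenceEfficiencyTracker:
    """Keeps the last n3 estimated and last n4 reference efficiencies."""

    def __init__(self, n3=2, n4=2):
        self.estimated = deque(maxlen=n3)  # efficiencies of N3 history frames
        self.reference = deque(maxlen=n4)  # reference efficiencies of N4 frames

    def update(self, current_estimate):
        """Return the reference efficiency of the current frame, then store it.

        The statistic is the arithmetic mean of the stored estimated values,
        the stored reference values, and the current estimate.
        """
        values = list(self.estimated) + list(self.reference) + [current_estimate]
        reference = sum(values) / len(values)
        self.estimated.append(current_estimate)
        self.reference.append(reference)
        return reference


if __name__ == "__main__":
    tracker = ReferenceEfficiencyTracker()
    for estimate in (0.5, 0.5, 0.0):   # made-up per-frame estimates
        print(tracker.update(estimate))
```

Because each stored reference value already summarizes older frames, including it gives the statistic a longer memory than the N3 estimated values alone.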

In some embodiments of the present invention, in terms of estimating the reference short-term linear prediction efficiency of the current audio frame, the estimation unit 410 may be specifically configured to: estimate the short-term linear prediction efficiency of the current audio frame, where the short-term linear prediction efficiency of the current audio frame is the reference short-term linear prediction efficiency of the current audio frame.

In some other embodiments of the present invention, in terms of estimating the reference short-term linear prediction efficiency of the current audio frame, the estimation unit 410 may be specifically configured to: estimate the short-term linear prediction efficiency of the current audio frame; obtain the linear prediction efficiencies of N5 historical audio frames of the current audio frame; and calculate a fourth statistical value of the linear prediction efficiencies of the N5 historical audio frames and the short-term linear prediction efficiency of the current audio frame. N5 is a positive integer, and the fourth statistical value is the reference short-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N51 historical audio frames is at least one of the following: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and a comprehensive linear prediction efficiency. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that frame, and the N51 historical audio frames are a subset of the N5 historical audio frames. The fourth statistical value may be the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the linear prediction efficiencies of the N5 historical audio frames and the short-term linear prediction efficiency of the current audio frame.

In some other embodiments of the present invention, in terms of estimating the reference short-term linear prediction efficiency of the current audio frame, the estimation unit 410 may be specifically configured to: estimate the short-term linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N6 historical audio frames of the current audio frame; and calculate a fifth statistical value of the reference linear prediction efficiencies of the N6 historical audio frames and the short-term linear prediction efficiency of the current audio frame. N6 is a positive integer, and the fifth statistical value is the reference short-term linear prediction efficiency of the current audio frame. The reference linear prediction efficiency of each of N61 historical audio frames is at least one of the following: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference comprehensive linear prediction efficiency. The reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of that frame, and the N61 historical audio frames are a subset of the N6 historical audio frames. The fifth statistical value calculated by the estimation unit 410 may be the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the reference linear prediction efficiencies of the N6 historical audio frames and the short-term linear prediction efficiency of the current audio frame.

在本发明的另一些实施例中，在估计上述当前音频帧的参考短时线性预测效率的方面，上述估计单元410可具体用于：估计得到当前音频帧的短时线性预测效率；获取上述当前音频帧的N8个历史音频帧的参考线性预测效率；获取上述当前音频帧的N7个历史音频帧的线性预测效率；计算上述N7个历史音频帧的线性预测效率、上述N8个历史音频帧的参考线性预测效率和上述当前音频帧的短时线性预测效率的第六统计值，其中，上述N7和上述N8为正整数，上述第六统计值为上述当前音频帧的参考短时线性预测效率，N71个历史音频帧中的每个历史音频帧的线性预测效率为如下线性预测效率中的至少一种：长时线性预测效率、短时间线性预测效率和综合线性预测效率，N81个历史音频帧中的每个历史音频帧的参考线性预测效率为如下线性预测效率中的至少一种：参考长时线性预测效率、参考短时间线性预测效率和参考综合线性预测效率，其中，上述每个历史音频帧的综合线性预测效率基于上述每个历史音频帧的长时线性预测效率和短时线性预测效率得到，其中，上述每个历史音频帧的参考综合线性预测效率基于上述每个历史音频帧的参考长时线性预测效率和参考短时线性预测效率得到，其中，上述N71个历史音频帧为上述N7个历史音频帧的子集，上述N81个历史音频帧为上述N8个历史音频帧的子集。其中，计算得到的上述N7个历史音频帧的线性预测效率、上述N8个历史音频帧的参考线性预测效率和上述当前音频帧的短时线性预测效率的第六统计值可为，上述N7个历史音频帧的线性预测效率、上述N8个历史音频帧的参考线性预测效率和上述当前音频帧的短时线性预测效率的和值、加权和值、几何平均值、算术平均值、滑动平均值或加权平均值。In some other embodiments of the present invention, in terms of estimating the reference short-time linear prediction efficiency of the current audio frame, the estimation unit 410 may be specifically configured to: estimate the short-time linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N8 historical audio frames of the current audio frame; obtain the linear prediction efficiencies of N7 historical audio frames of the current audio frame; and calculate a sixth statistical value of the linear prediction efficiencies of the N7 historical audio frames, the reference linear prediction efficiencies of the N8 historical audio frames, and the short-time linear prediction efficiency of the current audio frame, wherein N7 and N8 are positive integers, the sixth statistical value is the reference short-time linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of N71 historical audio frames is at least one of the following linear prediction efficiencies: long-time linear prediction efficiency, short-time linear prediction efficiency, and comprehensive linear prediction efficiency, and the reference linear prediction efficiency of each of N81 historical audio frames is at least one of the following linear prediction efficiencies: reference long-time linear prediction efficiency, reference short-time linear prediction efficiency, and reference comprehensive linear prediction efficiency, wherein the comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-time linear prediction efficiency and the short-time linear prediction efficiency of that historical audio frame, the reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-time linear prediction efficiency and the reference short-time linear prediction efficiency of that historical audio frame, the N71 historical audio frames are a subset of the N7 historical audio frames, and the N81 historical audio frames are a subset of the N8 historical audio frames. The sixth statistical value may be the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the linear prediction efficiencies of the N7 historical audio frames, the reference linear prediction efficiencies of the N8 historical audio frames, and the short-time linear prediction efficiency of the current audio frame.
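
As an illustration only, the statistic described above can be sketched as follows. The function name `sixth_statistic`, the `mode` strings, and the example values are hypothetical and do not come from this description; the moving-average option is omitted because the text does not specify a window or smoothing factor.

```python
from math import prod


def sixth_statistic(hist_eff, hist_ref_eff, current_short_eff,
                    mode="arithmetic_mean", weights=None):
    """Combine N7 historical efficiencies, N8 historical reference
    efficiencies and the current short-time efficiency into one value.

    All names here are illustrative assumptions, not terms from the source.
    """
    values = list(hist_eff) + list(hist_ref_eff) + [current_short_eff]
    if mode == "sum":
        return sum(values)
    if mode == "arithmetic_mean":
        return sum(values) / len(values)
    if mode == "geometric_mean":
        # Assumes non-negative efficiencies.
        return prod(values) ** (1.0 / len(values))
    if mode in ("weighted_sum", "weighted_mean"):
        if weights is None or len(weights) != len(values):
            raise ValueError("one weight per input value is required")
        total = sum(w * v for w, v in zip(weights, values))
        return total if mode == "weighted_sum" else total / sum(weights)
    raise ValueError(f"unknown mode: {mode}")


# Example: N7 = 2 historical efficiencies, N8 = 1 historical reference efficiency.
ref_short = sixth_statistic([0.6, 0.8], [0.7], 0.9)
```

Which statistic is appropriate is left open by the text; the arithmetic mean is used as the default here purely for convenience.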

在本发明一些实施例中，在上述估计得到当前音频帧的短时线性预测效率的方面，估计单元410具体用于：基于当前音频帧的线性预测残差得到当前音频帧的短时线性预测效率。In some embodiments of the present invention, in the aspect of estimating the short-time linear prediction efficiency of the current audio frame, the estimating unit 410 is specifically configured to obtain the short-time linear prediction efficiency of the current audio frame based on the linear prediction residual of the current audio frame.

在本发明一些实施例中，在上述基于当前音频帧的线性预测残差得到当前音频帧的短时线性预测效率的方面，估计单元410可具体用于：计算当前音频帧进行短时线性预测前后的能量变化率，其中，上述能量变化率为上述当前音频帧的短时线性预测效率，或者上述当前音频帧的短时线性预测效率基于上述能量变化率变换得到，其中，上述当前音频帧进行短时线性预测后的能量为上述当前音频帧的线性预测残差的能量。例如，能量变化率与当前音频帧的短时线性预测效率之间可具有映射关系，可基于能量变化率与当前音频帧的短时线性预测效率之间的映射关系，得到与计算出的上述能量变化率具有映射关系的当前音频帧的短时线性预测效率。一般来说，当前音频帧进行短时线性预测前后的能量变化率越大，表示当前音频帧的短时线性预测效率越高。In some embodiments of the present invention, in terms of obtaining the short-time linear prediction efficiency of the current audio frame based on the linear prediction residual of the current audio frame, the estimation unit 410 may be specifically configured to calculate an energy change rate of the current audio frame before and after short-time linear prediction, wherein the energy change rate is the short-time linear prediction efficiency of the current audio frame, or the short-time linear prediction efficiency of the current audio frame is obtained by transforming the energy change rate, and wherein the energy of the current audio frame after short-time linear prediction is the energy of the linear prediction residual of the current audio frame. For example, there may be a mapping relationship between the energy change rate and the short-time linear prediction efficiency of the current audio frame, and the short-time linear prediction efficiency that maps to the calculated energy change rate may be obtained based on that mapping relationship. Generally speaking, a greater energy change rate of the current audio frame before and after short-time linear prediction indicates a higher short-time linear prediction efficiency of the current audio frame.

在本发明一些实施例中，上述当前音频帧进行短时线性预测前后的能量变化率，为上述当前音频帧进行短时线性预测前的能量与上述当前音频帧的线性预测残差的能量的比值。一般来说，上述当前音频帧进行短时线性预测前的能量除以上述当前音频帧的线性预测残差的能量得到的比值越大，表示当前音频帧的短时线性预测效率越高。In some embodiments of the present invention, the energy change rate of the current audio frame before and after short-time linear prediction is calculated as a ratio of the energy of the current audio frame before short-time linear prediction to the energy of the linear prediction residual of the current audio frame. Generally speaking, a larger ratio obtained by dividing the energy of the current audio frame before short-time linear prediction by the energy of the linear prediction residual of the current audio frame indicates a higher short-time linear prediction efficiency for the current audio frame.
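
The ratio described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the residual is computed as e[n] = s[n] - sum_k a_k * s[n-k], samples before the start of the frame are taken as zero, and the names `lp_residual`, `energy`, `short_time_efficiency` and the small constant `eps` are hypothetical.

```python
def lp_residual(frame, lpc):
    """Linear prediction residual e[n] = s[n] - sum_k lpc[k-1] * s[n-k].

    Assumption: samples preceding the frame are treated as zero.
    """
    residual = []
    for n, sample in enumerate(frame):
        prediction = sum(a * frame[n - k]
                         for k, a in enumerate(lpc, start=1) if n - k >= 0)
        residual.append(sample - prediction)
    return residual


def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)


def short_time_efficiency(frame, lpc, eps=1e-12):
    """Energy before prediction divided by residual energy.

    eps only guards against division by zero; it is not part of the source.
    """
    return energy(frame) / (energy(lp_residual(frame, lpc)) + eps)


# A decaying frame that a first-order predictor with a_1 = 0.5 fits well.
frame = [1.0, 0.5, 0.25, 0.125]
ratio = short_time_efficiency(frame, [0.5])
```

A larger ratio means the predictor removed more of the frame's energy, which matches the statement that a larger ratio indicates a higher short-time linear prediction efficiency.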

在本发明的一些实施例中，在上述估计得到当前音频帧的长时线性预测效率的方面，上述估计单元410可以具体用于：根据计算当前音频帧的线性预测残差和第一历史线性预测信号，得到当前音频帧的线性预测残差与第一历史线性预测信号之间的相关性，其中，上述相关性为上述当前音频帧的长时线性预测效率，或者，上述当前音频帧的长时线性预测效率基于上述相关性得到，其中，上述第一历史线性预测信号为第一历史线性预测激励或第一历史线性预测残差，上述第一历史线性预测残差为上述当前音频帧的历史音频帧的线性预测残差(例如，上述第一历史线性预测残差可以为时长与上述当前音频帧相同或相近，且为当前音频帧的某一帧历史音频帧的线性预测残差，或者，上述第一历史线性预测残差可以为时长与上述当前音频帧相同或相近，并且为上述当前音频帧的某相邻两帧历史音频帧的部分连续音频信号的线性预测残差)，上述第一历史线性预测激励为上述当前音频帧的历史音频帧的线性预测激励(例如，上述第一历史线性预测激励可以为时长与上述当前音频帧相同或相近，并且为上述当前音频帧的某一帧历史音频帧的线性预测激励，或者上述第一历史线性预测激励可以为时长与上述当前音频帧相同或相近，且为当前音频帧的某相邻两帧历史音频帧的部分连续音频信号的线性预测激励)。In some embodiments of the present invention, in terms of estimating the long-term linear prediction efficiency of the current audio frame, the estimation unit 410 may be specifically configured to: obtain a correlation between the linear prediction residual of the current audio frame and a first historical linear prediction signal based on the linear prediction residual of the current audio frame and the first historical linear prediction signal, wherein the correlation is the long-term linear prediction efficiency of the current audio frame, or the long-term linear prediction efficiency of the current audio frame is obtained based on the correlation, wherein the first historical linear prediction signal is a first historical linear prediction excitation or a first historical linear prediction residual, the first historical linear prediction residual is a linear prediction residual of a historical audio frame of the current audio frame (for example, the first historical linear prediction residual may be the linear prediction residual of one historical audio frame of the current audio frame, with a duration the same as or similar to that of the current audio frame, or the first historical linear prediction residual may be the linear prediction residual of a partial continuous audio signal spanning two adjacent historical audio frames of the current audio frame, with a duration the same as or similar to that of the current audio frame), and the first historical linear prediction excitation is a linear prediction excitation of a historical audio frame of the current audio frame (for example, the first historical linear prediction excitation may be the linear prediction excitation of one historical audio frame of the current audio frame, with a duration the same as or similar to that of the current audio frame, or the first historical linear prediction excitation may be the linear prediction excitation of a partial continuous audio signal spanning two adjacent historical audio frames of the current audio frame, with a duration the same as or similar to that of the current audio frame).

其中，估计单元410根据当前音频帧的线性预测残差与第一历史线性预测信号，得到上述当前音频帧的线性预测残差与上述第一历史线性预测信号之间的相关性的方式可以是多种多样的。There are various ways in which the estimation unit 410 obtains the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal based on the linear prediction residual of the current audio frame and the first historical linear prediction signal.

在本发明的一些实施例中，在上述根据计算当前音频帧的线性预测残差和第一历史线性预测信号，得到当前音频帧的线性预测残差与第一历史线性预测信号之间的相关性的方面，上述估计单元410可具体用于：计算当前音频帧的线性预测残差与第一历史线性预测信号之间的相关性；In some embodiments of the present invention, in the aspect of obtaining the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal based on calculating the linear prediction residual of the current audio frame and the first historical linear prediction signal, the estimation unit 410 may be specifically configured to: calculate the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal;

或者，将当前音频帧的线性预测残差乘以增益因子以得到上述当前音频帧的增益线性预测残差，计算得到上述当前音频帧的增益线性预测残差与第一历史线性预测信号之间的相关性，其中，计算得到的上述当前音频帧的增益线性预测残差与上述第一历史线性预测信号之间的相关性，为上述当前音频帧的线性预测残差与上述第一历史线性预测信号之间的相关性；Alternatively, multiplying the linear prediction residual of the current audio frame by a gain factor to obtain a gain linear prediction residual of the current audio frame, and calculating a correlation between the gain linear prediction residual of the current audio frame and the first historical linear prediction signal, wherein the calculated correlation between the gain linear prediction residual of the current audio frame and the first historical linear prediction signal is the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal;

或者，or,

将第一历史线性预测信号乘以增益因子以得到增益后的第一历史线性预测信号，计算得到上述当前音频帧的线性预测残差与上述增益后的第一历史线性预测信号之间的相关性，其中，计算得到的上述当前音频帧的线性预测残差与上述增益后的第一历史线性预测信号之间的相关性，为上述当前音频帧的线性预测残差与上述第一历史线性预测信号之间的相关性。The first historical linear prediction signal is multiplied by a gain factor to obtain a first historical linear prediction signal after gain, and a correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal after gain is calculated, wherein the calculated correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal after gain is the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal.
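
The three alternatives above (no gain, gain applied to the residual, gain applied to the historical signal) can be sketched as one helper. This is only an illustrative sketch: the text does not fix a correlation formula, so a normalized time-domain cross-correlation is assumed here, and the names `normalized_xcorr`, `correlation_variant` and `apply_gain_to` are hypothetical.

```python
import math


def normalized_xcorr(x, y):
    """Normalized cross-correlation at zero lag (assumed measure)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def correlation_variant(residual, history, gain=1.0, apply_gain_to="none"):
    """Correlation between the current residual and a historical signal.

    apply_gain_to selects which of the three described variants is used:
    "none", "residual" (gain linear prediction residual) or "history"
    (gained first historical linear prediction signal).
    """
    if apply_gain_to == "residual":
        residual = [gain * v for v in residual]
    elif apply_gain_to == "history":
        history = [gain * v for v in history]
    elif apply_gain_to != "none":
        raise ValueError(f"unknown variant: {apply_gain_to}")
    return normalized_xcorr(residual, history)


corr = correlation_variant([1.0, 2.0, 3.0], [2.0, 4.0, 6.0],
                           gain=0.5, apply_gain_to="history")
```

Note that a normalized cross-correlation is unaffected by a positive gain, so the gain matters mainly when an unnormalized correlation or a distortion measure is used instead.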

在本发明一些实施例中，其中，上述第一历史线性预测激励或上述第一历史线性预测残差可基于上述当前音频帧的基音确定。例如，上述第一历史线性预测激励与上述当前音频帧的线性预测残差在时域上的相关性，大于或等于其它历史线性预测激励与上述当前音频帧的线性预测残差在时域上的相关性。或者，上述第一历史线性预测激励与上述当前音频帧的线性预测残差在时域上的相关性，大于或等于其它至少1个历史线性预测激励与上述当前音频帧的线性预测残差在时域上的相关性。例如，上述第一历史线性预测残差与上述当前音频帧的线性预测残差在时域上的相关性，大于或等于其它历史线性预测残差与上述当前音频帧的线性预测残差在时域上的相关性。或者，上述第一历史线性预测残差与上述当前音频帧的线性预测残差在时域上的相关性，大于或者等于其它至少1个历史线性预测残差与上述当前音频帧的线性预测残差在时域上的相关性。In some embodiments of the present invention, the first historical linear prediction excitation or the first historical linear prediction residual may be determined based on the pitch of the current audio frame. For example, the time-domain correlation between the first historical linear prediction excitation and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between other historical linear prediction excitations and the linear prediction residual of the current audio frame. Alternatively, the time-domain correlation between the first historical linear prediction excitation and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between at least one other historical linear prediction excitation and the linear prediction residual of the current audio frame. For example, the time-domain correlation between the first historical linear prediction residual and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between other historical linear prediction residuals and the linear prediction residual of the current audio frame. Alternatively, the time-domain correlation between the first historical linear prediction residual and the linear prediction residual of the current audio frame is greater than or equal to the time-domain correlation between at least one other historical linear prediction residual and the linear prediction residual of the current audio frame.
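
One way to picture the selection described above is a lag search over a buffer of past excitation or residual samples that keeps the segment most correlated with the current residual. The sketch below is an assumption-laden illustration, not the method of this description: the names `best_history_segment`, `history_buffer`, `min_lag` and `max_lag` are hypothetical, and lags shorter than the frame length are skipped so that the candidate segment lies entirely in the past.

```python
import math


def normalized_xcorr(x, y):
    """Normalized cross-correlation at zero lag (assumed measure)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def best_history_segment(residual, history_buffer, min_lag, max_lag):
    """Return (lag, correlation) of the best-matching past segment.

    history_buffer holds past samples with the most recent sample last;
    a lag of L selects the segment starting L samples before the frame.
    """
    n = len(residual)
    best_lag, best_corr = None, -math.inf
    for lag in range(max(min_lag, n), max_lag + 1):
        start = len(history_buffer) - lag
        if start < 0:
            break  # not enough history for this lag
        segment = history_buffer[start:start + n]
        corr = normalized_xcorr(residual, segment)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


history = [0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
lag, corr = best_history_segment([1.0, 2.0, 3.0], history, min_lag=3, max_lag=8)
```

In a pitch-based variant, the search range would typically be narrowed to lags around the estimated pitch period rather than scanning the whole buffer.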

在本发明的一些实施例中，当前音频帧的线性预测残差与第一历史线性预测信号之间的相关性例如为，时域上的互相关函数值和/或频域上的互相关函数值，或者当前音频帧的线性预测残差与第一历史线性预测信号之间的相关性可为时域上的失真和/或频域上的失真。其中，在本发明的一些实施例中，上述频域上的失真可在频域上的K1个频点的失真的和值或加权和值，或者上述频域上的失真可为在频域上的K2个子带上的失真的和值或加权和值，上述K1和上述K2为正整数。在本发明的一些实施例中，上述失真的加权和值所对应的加权系数为反映心理声学模型的感知加权系数。当然，上述失真的加权和值所对应的加权系数亦可为基于实际需要设定的其它加权系数。其中，测试发现，使用感知加权系数有利于使得计算出的失真更加符合主观的质量，从而有利于提升性能。In some embodiments of the present invention, the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal can be, for example, a cross-correlation function value in the time domain and/or a cross-correlation function value in the frequency domain. Alternatively, the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal can be distortion in the time domain and/or distortion in the frequency domain. In some embodiments of the present invention, the distortion in the frequency domain can be the sum or weighted sum of distortions at K1 frequency points in the frequency domain, or the distortion in the frequency domain can be the sum or weighted sum of distortions at K2 subbands in the frequency domain, where K1 and K2 are positive integers. In some embodiments of the present invention, the weighting coefficient corresponding to the weighted sum of the distortions is a perceptual weighting coefficient reflecting a psychoacoustic model. Of course, the weighting coefficient corresponding to the weighted sum of the distortions can also be other weighting coefficients set based on actual needs. Testing has found that using a perceptual weighting coefficient helps make the calculated distortion more consistent with subjective quality, thereby improving performance.
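
The frequency-domain distortion described above can be sketched as a weighted sum of per-bin squared spectral differences. This is a minimal sketch under assumptions: a naive DFT is used for self-containment, the per-bin distortion is taken as the squared magnitude of the spectral difference, and the weights are plain placeholders rather than coefficients derived from an actual psychoacoustic model. The names `dft` and `spectral_distortion` are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform, adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def spectral_distortion(residual, history, weights=None):
    """Weighted sum of per-bin distortion over K1 = len(residual) bins.

    weights stands in for perceptual weighting coefficients; with
    weights=None every bin counts equally (plain sum).
    """
    spec_r, spec_h = dft(residual), dft(history)
    per_bin = [abs(a - b) ** 2 for a, b in zip(spec_r, spec_h)]
    if weights is None:
        weights = [1.0] * len(per_bin)
    if len(weights) != len(per_bin):
        raise ValueError("one weight per frequency bin is required")
    return sum(w * d for w, d in zip(weights, per_bin))


distortion = spectral_distortion([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
```

When distortion is used as the correlation measure, a smaller value indicates a closer match between the two signals, so any mapping to an efficiency value would need to invert the ordering.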

在本发明的一些实施例中，上述第一历史线性预测激励为利用基于线性预测的编码方式对上述当前音频帧的历史音频帧进行音频编码而产生的线性预测激励。In some embodiments of the present invention, the first historical linear prediction excitation is a linear prediction excitation generated by audio encoding a historical audio frame of the current audio frame using a linear prediction-based encoding method.

在本发明的一些实施例中，上述第一历史线性预测残差基于上述当前音频帧的第一历史音频帧的时域信号和上述第一历史音频帧的线性预测系数得到，其中，上述第一历史音频帧的线性预测编码系数为量化后的线性预测系数或未经量化的线性预测系数。其中，由于实际编解码过程中对最终质量起作用的通常都是量化后的线性预测系数，因此使用量化后的线性预测系数计算线性预测残差有利于使计算出的相关性更准确。In some embodiments of the present invention, the first historical linear prediction residual is obtained based on a time-domain signal of a first historical audio frame of the current audio frame and the linear prediction coefficients of the first historical audio frame, wherein the linear prediction coding coefficients of the first historical audio frame are quantized linear prediction coefficients or unquantized linear prediction coefficients. Because it is usually the quantized linear prediction coefficients that determine the final quality in an actual encoding and decoding process, calculating the linear prediction residual with the quantized linear prediction coefficients helps make the calculated correlation more accurate.

在本发明的一些实施例中，上述当前音频帧的线性预测残差基于上述当前音频帧的时域信号和上述当前音频帧的线性预测系数得到，其中，上述当前音频帧的线性预测系数为量化后的线性预测系数或未经量化的线性预测系数。其中，由于实际编解码过程中对最终质量起作用的通常都是量化后的线性预测系数，因此使用量化后的线性预测系数计算线性预测残差有利于使计算出的相关性更准确。In some embodiments of the present invention, the linear prediction residual of the current audio frame is obtained based on the time-domain signal of the current audio frame and the linear prediction coefficients of the current audio frame, wherein the linear prediction coefficients of the current audio frame are quantized linear prediction coefficients or unquantized linear prediction coefficients. Because it is usually the quantized linear prediction coefficients that determine the final quality in an actual encoding and decoding process, calculating the linear prediction residual with the quantized linear prediction coefficients helps make the calculated correlation more accurate.

在本发明的一些实施例中，上述第一历史线性预测激励为自适应码本激励与固定码本激励的叠加激励，或者，上述第一历史线性预测激励为自适应码本激励。In some embodiments of the present invention, the first historical linear prediction excitation is a superposition of an adaptive codebook excitation and a fixed codebook excitation, or the first historical linear prediction excitation is an adaptive codebook excitation.

可以理解的是，本实施例的音频编码器400的各功能模块的功能可根据上述方法实施例中的方法具体实现，其具体实现过程可以参照上述方法实施例的相关描述，此处不再赘述。其中，音频编码器400可为任何需要采集、存储或者可向外传输音频信号的装置，例如可为手机、平板电脑、个人电脑、笔记本电脑等等。It is understood that the functions of the various functional modules of the audio encoder 400 of this embodiment can be specifically implemented according to the methods in the above-mentioned method embodiments. The specific implementation process can be referred to the relevant description of the above-mentioned method embodiments and will not be repeated here. The audio encoder 400 can be any device that needs to collect, store, or transmit audio signals, such as a mobile phone, tablet computer, personal computer, laptop computer, etc.

其中，本装置实施例涉及的各阈值(如第一阈值、第二阈值等)、各其它参数(如N1、N11、N21、N2等)的取值举例，可参考上述方法实施例中的相关取值举例，此处不再赘述。For example values of the thresholds (such as the first threshold and the second threshold) and of the other parameters (such as N1, N11, N21, and N2) involved in this apparatus embodiment, refer to the corresponding example values in the foregoing method embodiments; details are not repeated here.

可以看出，本实施例的技术方案中，音频编码器400先估计当前音频帧的参考线性预测效率；通过估计出的上述当前音频帧的参考线性预测效率来确定与之匹配的音频编码方式，并按照确定出的与之匹配音频编码方式对上述当前音频帧进行音频编码，由于上述方案在确定音频编码方式的过程中，无需执行现有闭环选择模式所需要执行的利用每种音频编码方式分别将当前音频帧进行完整编码的操作，而是通过当前音频帧的参考线性预测效率来确定需选择的音频编码方式，而估计当前音频帧的参考线性预测效率的计算复杂度，通常是远远小于利用每种音频编码方式分别将当前音频帧进行完整编码的计算复杂度的，因此相对于现有机制而言，本发明实施例的上述方案有利于降低音频编码运算复杂度，进而降低音频编码的开销。It can be seen that in the technical solution of this embodiment, the audio encoder 400 first estimates the reference linear prediction efficiency of the current audio frame; determines the audio coding method that matches it through the estimated reference linear prediction efficiency of the above-mentioned current audio frame, and performs audio encoding on the above-mentioned current audio frame according to the determined audio coding method that matches it. Since the above-mentioned scheme does not need to perform the operation of fully encoding the current audio frame using each audio coding method required by the existing closed-loop selection mode in the process of determining the audio coding method, but instead determines the audio coding method to be selected through the reference linear prediction efficiency of the current audio frame, and the computational complexity of estimating the reference linear prediction efficiency of the current audio frame is usually much smaller than the computational complexity of fully encoding the current audio frame using each audio coding method, compared with the existing mechanism, the above-mentioned scheme of the embodiment of the present invention is conducive to reducing the computational complexity of audio coding, thereby reducing the overhead of audio coding.

参见图5，图5描述了本发明另一个实施例提供的用于解码语音频码流的编码器的结构，该编码器包括：至少一个总线501、与总线501相连的至少一个处理器502以及与总线501相连的至少一个存储器503。Referring to FIG. 5 , FIG. 5 describes the structure of an encoder for decoding speech and audio code streams provided by another embodiment of the present invention. The encoder includes: at least one bus 501, at least one processor 502 connected to the bus 501, and at least one memory 503 connected to the bus 501.

其中，处理器502通过总线501，调用存储器503中存储的代码以用于估计当前音频帧的参考线性预测效率；确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式；按照与上述当前音频帧的参考线性预测效率匹配的音频编码方式，对上述当前音频帧进行音频编码。The processor 502 invokes, via the bus 501, the code stored in the memory 503 to: estimate the reference linear prediction efficiency of the current audio frame; determine the audio coding mode that matches the reference linear prediction efficiency of the current audio frame; and perform audio coding on the current audio frame according to the audio coding mode that matches the reference linear prediction efficiency of the current audio frame.

在本发明的一些实施例中，在估计当前音频帧的参考线性预测效率之前处理器502还可用于通过总线501，调用存储器503中存储的代码，先判断当前音频帧是否为语音音频帧。例如，上述估计当前音频帧的参考线性预测效率可以包括：当当前音频帧为非语音音频帧，估计上述当前音频帧的参考线性预测效率。此外，也可在上述估计当前音频帧的参考线性预测效率之前不区分当前音频帧是否为语音音频帧。In some embodiments of the present invention, before estimating the reference linear prediction efficiency of the current audio frame, the processor 502 may further be configured to call code stored in the memory 503 via the bus 501 to first determine whether the current audio frame is a speech audio frame. For example, estimating the reference linear prediction efficiency of the current audio frame may include, when the current audio frame is a non-speech audio frame, estimating the reference linear prediction efficiency of the current audio frame. Furthermore, the estimation of the reference linear prediction efficiency of the current audio frame may be performed without distinguishing whether the current audio frame is a speech audio frame.

在本发明的一些实施例中，上述当前音频帧的参考综合线性预测效率例如可为上述当前音频帧的参考长时线性预测效率和当前音频帧的参考短时线性预测效率的和值、加权和值(其中，此处加权和值所对应的权值可以根据实际需要进行设定，其中1个权值例如可为0.5、1.、2、3、5、10或者其它值)或平均值。当然，也可能通过其它算法，基于上述当前音频帧的参考长时线性预测效率和当前音频帧的参考短时线性预测效率得到上述当前音频帧的参考综合线性预测效率。In some embodiments of the present invention, the reference comprehensive linear prediction efficiency of the current audio frame may be, for example, the sum, weighted sum (wherein the weight corresponding to the weighted sum may be set as needed, and one weight may be, for example, 0.5, 1.0, 2.0, 3.0, 5.0, 10.0, or other values), or the average value of the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of the current audio frame. Of course, the reference comprehensive linear prediction efficiency of the current audio frame may also be obtained using other algorithms based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of the current audio frame.
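
The combinations named above (sum, weighted sum, average) can be written out as a short sketch. The function name `reference_comprehensive`, the `method` strings, and the default weights are hypothetical; the text only says that weights may be set as needed.

```python
def reference_comprehensive(ref_long, ref_short, method="sum",
                            w_long=0.5, w_short=0.5):
    """Combine reference long-term and short-term efficiencies.

    The default weights are placeholders chosen for illustration.
    """
    if method == "sum":
        return ref_long + ref_short
    if method == "weighted_sum":
        return w_long * ref_long + w_short * ref_short
    if method == "average":
        return (ref_long + ref_short) / 2.0
    raise ValueError(f"unknown method: {method}")


combined = reference_comprehensive(0.6, 0.8, method="weighted_sum",
                                   w_long=0.5, w_short=2.0)
```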

可以理解的是，上述当前音频帧的参考线性预测效率所包括的线性预测效率的种类不同，处理器502确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式的具体方式也就可能不同。下面举例一些可能的实施例方式。It is understood that the reference linear prediction efficiency of the current audio frame may include different types of linear prediction efficiencies, and the specific manner in which the processor 502 determines the audio coding mode that matches the reference linear prediction efficiency of the current audio frame may also vary. Some possible embodiments are described below.

举例来说，在本发明的一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率和上述当前音频帧的参考短时线性预测效率，则在上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式的方面，处理器502通过总线501调用存储器503中存储的代码以具体用于，若上述当前音频帧的参考长时线性预测效率小于第一阈值，和/或上述当前音频帧的参考短时线性预测效率小于第二阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为非基于线性预测的音频编码方式。For example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame and the reference short-time linear prediction efficiency of the current audio frame, then in the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame, the processor 502 calls the code stored in the memory 503 through the bus 501 to be specifically used for, if the reference long-time linear prediction efficiency of the current audio frame is less than a first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is less than a second threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method not based on linear prediction.

又举例来说，在本发明的另一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率和上述当前音频帧的参考短时线性预测效率，则在上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式的方面，处理器502通过总线501调用存储器503中存储的代码以具体用于，若上述当前音频帧的参考长时线性预测效率大于或等于第一阈值，和/或上述当前音频帧的参考短时线性预测效率大于或等于第二阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为基于线性预测的音频编码方式。For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame and the reference short-time linear prediction efficiency of the current audio frame, then in the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame, the processor 502 calls the code stored in the memory 503 through the bus 501 to be specifically used for, if the reference long-time linear prediction efficiency of the current audio frame is greater than or equal to a first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is greater than or equal to a second threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method based on linear prediction.

又举例来说，在本发明又一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率和上述当前音频帧的参考短时线性预测效率，则在上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式的方面，处理器502通过总线501调用存储器503中存储的代码以具体用于，若上述当前音频帧的参考长时线性预测效率小于第一阈值，和/或上述当前音频帧的参考短时线性预测效率小于第二阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为非基于线性预测的音频编码方式；若上述当前音频帧的参考长时线性预测效率大于或等于第一阈值，和/或上述当前音频帧的参考短时线性预测效率大于或等于第二阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为基于线性预测的音频编码方式。For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame and the reference short-time linear prediction efficiency of the current audio frame, then in the aspect of determining the audio coding method that matches the reference linear prediction efficiency of the current audio frame, the processor 502 calls the code stored in the memory 503 through the bus 501 to specifically determine that, if the reference long-time linear prediction efficiency of the current audio frame is less than a first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is less than a second threshold, then the audio coding method that matches the reference linear prediction efficiency of the current audio frame is determined to be an audio coding method not based on linear prediction; if the reference long-time linear prediction efficiency of the current audio frame is greater than or equal to the first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is greater than or equal to the second threshold, then the audio coding method that matches the reference linear prediction efficiency of the current audio frame is determined to be an audio coding method based on linear prediction.
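
The two-threshold decision above can be sketched as follows. Because the text says "and/or", the sketch exposes both readings through a hypothetical `require_both` flag; the function name, the return labels `"lp"` and `"non_lp"`, and the threshold values in the example are assumptions made for illustration.

```python
def select_coding_mode(ref_long, ref_short, first_threshold, second_threshold,
                       require_both=False):
    """Choose between linear-prediction-based and other coding.

    require_both=False: either efficiency below its threshold selects the
    non-linear-prediction mode (the "or" reading).
    require_both=True: both must be below their thresholds (the "and" reading).
    """
    long_low = ref_long < first_threshold
    short_low = ref_short < second_threshold
    if require_both:
        non_lp = long_low and short_low
    else:
        non_lp = long_low or short_low
    return "non_lp" if non_lp else "lp"


# Example with placeholder thresholds of 0.5 for both efficiencies.
mode = select_coding_mode(0.2, 0.9, first_threshold=0.5, second_threshold=0.5)
```

Under either reading the two branches are complementary, so every frame receives exactly one mode.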

又举例来说，在本发明的一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率，则在上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式的方面，处理器502通过总线501调用存储器503中存储的代码以具体用于，若上述当前音频帧的参考长时线性预测效率大于或等于第三阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为基于线性预测的音频编码方式。For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame, then in the above-mentioned determination of the audio encoding method that matches the reference linear prediction efficiency of the current audio frame, the processor 502 calls the code stored in the memory 503 through the bus 501 to be specifically used for, if the reference long-time linear prediction efficiency of the current audio frame is greater than or equal to a third threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the current audio frame is an audio encoding method based on linear prediction.

又举例来说，在本发明另一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率，则在上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式的方面，处理器502通过总线501调用存储器503中存储的代码以具体用于，若上述当前音频帧的参考长时线性预测效率小于第四阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为非基于线性预测的音频编码方式。For another example, in other embodiments of the present invention, if the reference linear prediction efficiency of the above-mentioned current audio frame includes the reference long-time linear prediction efficiency of the above-mentioned current audio frame, then in the above-mentioned aspect of determining the audio encoding method that matches the reference linear prediction efficiency of the above-mentioned current audio frame, the processor 502 calls the code stored in the memory 503 through the bus 501 to be specifically used for, if the reference long-time linear prediction efficiency of the above-mentioned current audio frame is less than a fourth threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the above-mentioned current audio frame is an audio encoding method that is not based on linear prediction.

又举例来说，在本发明的另一些实施例中，若上述当前音频帧的参考线性预测效率包括上述当前音频帧的参考长时线性预测效率，则在上述确定与上述当前音频帧的参考线性预测效率匹配的音频编码方式的方面，处理器502通过总线501调用存储器503中存储的代码以具体用于，若上述当前音频帧的参考长时线性预测效率大于或等于第三阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为基于线性预测的音频编码方式；若上述当前音频帧的参考长时线性预测效率小于第四阈值，则确定出与上述当前音频帧的参考线性预测效率匹配的音频编码方式为非基于线性预测的音频编码方式。For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the above-mentioned current audio frame includes the reference long-time linear prediction efficiency of the above-mentioned current audio frame, then in the above-mentioned aspect of determining the audio encoding method that matches the reference linear prediction efficiency of the above-mentioned current audio frame, the processor 502 calls the code stored in the memory 503 through the bus 501 to specifically be used for: if the reference long-time linear prediction efficiency of the above-mentioned current audio frame is greater than or equal to a third threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the above-mentioned current audio frame is an audio encoding method based on linear prediction; if the reference long-time linear prediction efficiency of the above-mentioned current audio frame is less than a fourth threshold, then determining that the audio encoding method that matches the reference linear prediction efficiency of the above-mentioned current audio frame is an audio encoding method not based on linear prediction.
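
The decision using a third and a fourth threshold can be sketched as below. The text does not state what happens when the efficiency lies between the two thresholds, so this sketch returns `None` in that case and leaves the choice to the caller; that behaviour, the function name, and the assumption that the fourth threshold does not exceed the third are all illustrative.

```python
def select_by_long_term(ref_long, third_threshold, fourth_threshold):
    """Decide the coding mode from the reference long-term efficiency alone.

    Assumes fourth_threshold <= third_threshold. Returns None when the
    value falls in the gap between them (undefined in the source text).
    """
    if ref_long >= third_threshold:
        return "lp"
    if ref_long < fourth_threshold:
        return "non_lp"
    return None


# Placeholder thresholds: third = 0.7, fourth = 0.4.
decision = select_by_long_term(0.5, third_threshold=0.7, fourth_threshold=0.4)
```

If the two thresholds are set equal, the gap disappears and the function always returns a mode.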

For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-term linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to: determine a first linear prediction efficiency interval into which the reference long-term linear prediction efficiency of the current audio frame falls; and determine, according to a mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction, a first audio coding mode that has a mapping relationship with the first linear prediction efficiency interval. The first audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and is either an audio coding mode based on linear prediction or an audio coding mode not based on linear prediction. Different linear prediction efficiency intervals correspond to different audio coding modes. For example, assume that there are three linear prediction efficiency intervals: 0 to 30%, 30% to 70%, and 70% to 100%. If the reference long-term linear prediction efficiency of the current audio frame falls into the interval 0 to 30% (that is, the first linear prediction efficiency interval is the interval 0 to 30%), the audio coding mode corresponding to the interval 0 to 30% may be determined as the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference long-term linear prediction efficiency of the current audio frame falls into the interval 30% to 70% (that is, the first linear prediction efficiency interval is the interval 30% to 70%), the audio coding mode corresponding to the interval 30% to 70% may be determined as the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and so on for the other cases. The mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction may be set according to the requirements of different application scenarios.
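The interval lookup described above can be sketched with a sorted list of boundaries. The sketch is illustrative only: the function name `match_mode_by_interval` is hypothetical, the assignment of GAC, TCX, and ACELP to the three intervals is an assumed example mapping, and treating a value that lies exactly on a boundary as belonging to the upper interval is an assumption, since the text does not specify it.

```python
from bisect import bisect_right
from typing import Sequence

# Interior boundaries of the intervals 0-30%, 30%-70%, 70%-100%,
# with efficiencies expressed as fractions in [0, 1].
BOUNDARIES = [0.30, 0.70]

# Hypothetical mapping from interval index to coding mode; a real
# mapping would be chosen per application scenario.
MODES = ["GAC", "TCX", "ACELP"]


def match_mode_by_interval(ref_eff: float,
                           boundaries: Sequence[float] = BOUNDARIES,
                           modes: Sequence[str] = MODES) -> str:
    """Return the coding mode mapped to the interval containing ref_eff."""
    if not 0.0 <= ref_eff <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    # bisect_right counts how many boundaries are <= ref_eff, which is the
    # index of the interval when boundary values belong to the upper interval.
    return modes[bisect_right(boundaries, ref_eff)]
```

The same lookup applies unchanged to other interval sets, for example boundaries `[0.40, 0.60]` or `[0.50, 0.80]`, by passing a different `boundaries` argument.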

For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-term linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to: if the reference short-term linear prediction efficiency of the current audio frame is greater than or equal to a fifth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction.

For another example, in still other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-term linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to: if the reference short-term linear prediction efficiency of the current audio frame is less than the fifth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-term linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to: if the reference short-term linear prediction efficiency of the current audio frame is greater than or equal to the fifth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction; and if the reference short-term linear prediction efficiency of the current audio frame is less than the fifth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-term linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to: determine a second linear prediction efficiency interval into which the reference short-term linear prediction efficiency of the current audio frame falls; and determine, according to a mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction, a second audio coding mode that has a mapping relationship with the second linear prediction efficiency interval. The second audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and is either an audio coding mode based on linear prediction or an audio coding mode not based on linear prediction. For example, assume that there are three linear prediction efficiency intervals: 0 to 40%, 40% to 60%, and 60% to 100%. If the reference short-term linear prediction efficiency of the current audio frame falls into the interval 0 to 40% (that is, the second linear prediction efficiency interval is the interval 0 to 40%), the audio coding mode corresponding to the interval 0 to 40% may be determined as the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference short-term linear prediction efficiency of the current audio frame falls into the interval 40% to 60% (that is, the second linear prediction efficiency interval is the interval 40% to 60%), the audio coding mode corresponding to the interval 40% to 60% is determined as the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and so on for the other cases. The mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction may be set according to the requirements of different application scenarios.

For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to: if the reference comprehensive linear prediction efficiency of the current audio frame is greater than or equal to a sixth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction.

For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to: if the reference comprehensive linear prediction efficiency of the current audio frame is less than the sixth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

For another example, in still other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to: if the reference comprehensive linear prediction efficiency of the current audio frame is greater than or equal to the sixth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode based on linear prediction; and if the reference comprehensive linear prediction efficiency of the current audio frame is less than the sixth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to: determine a third linear prediction efficiency interval into which the reference comprehensive linear prediction efficiency of the current audio frame falls; and determine, according to a mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction, a third audio coding mode that has a mapping relationship with the third linear prediction efficiency interval. The third audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and is either an audio coding mode based on linear prediction or an audio coding mode not based on linear prediction. For example, assume that there are three linear prediction efficiency intervals: 0 to 50%, 50% to 80%, and 80% to 100%. If the reference comprehensive linear prediction efficiency of the current audio frame falls into the interval 0 to 50% (that is, the third linear prediction efficiency interval is the interval 0 to 50%), the audio coding mode corresponding to the interval 0 to 50% may be determined as the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference comprehensive linear prediction efficiency of the current audio frame falls into the interval 50% to 80% (that is, the third linear prediction efficiency interval is the interval 50% to 80%), the audio coding mode corresponding to the interval 50% to 80% is determined as the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and so on for the other cases. The mapping relationship between linear prediction efficiency intervals and audio coding modes based on linear prediction may be set according to the requirements of different application scenarios.

In some embodiments of the present invention, audio coding modes based on linear prediction may include algebraic code excited linear prediction (ACELP) coding, transform coded excitation (TCX) coding, and the like. Audio coding modes not based on linear prediction may include generic audio coding (GAC), and GAC may include, for example, modified discrete cosine transform (MDCT) coding or discrete cosine transform (DCT) coding.

For example, in some embodiments of the present invention, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to obtain the reference long-term linear prediction efficiency of the current audio frame by estimation in the following manner: estimating the long-term linear prediction efficiency of the current audio frame, where the long-term linear prediction efficiency of the current audio frame is the reference long-term linear prediction efficiency of the current audio frame.

Alternatively, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to obtain the reference long-term linear prediction efficiency of the current audio frame by estimation in the following manner: estimating the long-term linear prediction efficiency of the current audio frame; obtaining linear prediction efficiencies of N1 historical audio frames of the current audio frame; and calculating a first statistical value of the linear prediction efficiencies of the N1 historical audio frames and the long-term linear prediction efficiency of the current audio frame, where N1 is a positive integer and the first statistical value is the reference long-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N11 historical audio frames is at least one of the following linear prediction efficiencies of that historical audio frame: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and a comprehensive linear prediction efficiency. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that historical audio frame, and the N11 historical audio frames are a subset of the N1 historical audio frames. The first statistical value may be, for example, the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the linear prediction efficiencies of the N1 historical audio frames and the long-term linear prediction efficiency of the current audio frame.
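The statistical values listed above can be sketched as a single helper that combines the historical efficiencies with the efficiency of the current frame. This is a minimal sketch under stated assumptions: the function name `combine_efficiencies`, the `kind` strings, and the convention that the last weight applies to the current frame are all hypothetical, and the geometric mean is assumed to be applied only to strictly positive values.

```python
import math
from typing import Optional, Sequence


def combine_efficiencies(history: Sequence[float],
                         current: float,
                         kind: str = "arithmetic_mean",
                         weights: Optional[Sequence[float]] = None) -> float:
    """Combine historical LP efficiencies with the current frame's efficiency.

    history -- efficiencies of the N historical frames
    current -- efficiency estimated for the current frame
    weights -- one weight per value (history first, current last); only
               used by the weighted variants
    """
    values = list(history) + [current]
    if kind == "sum":
        return sum(values)
    if kind == "arithmetic_mean":
        return sum(values) / len(values)
    if kind == "geometric_mean":
        # Requires strictly positive values; math.log raises otherwise.
        return math.exp(sum(math.log(v) for v in values) / len(values))
    if kind in ("weighted_sum", "weighted_mean"):
        if weights is None or len(weights) != len(values):
            raise ValueError("need exactly one weight per value")
        total = sum(w * v for w, v in zip(weights, values))
        return total if kind == "weighted_sum" else total / sum(weights)
    raise ValueError(f"unknown statistic: {kind}")


# Reference efficiency from two historical frames plus the current frame.
reference = combine_efficiencies([0.6, 0.8], 0.7)
```

A weighted variant lets the current frame count more than older frames, which trades smoothness of the decision against how quickly it reacts to a change in the signal.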

Alternatively, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to obtain the reference long-term linear prediction efficiency of the current audio frame by estimation in the following manner: estimating the long-term linear prediction efficiency of the current audio frame; obtaining reference linear prediction efficiencies of N2 historical audio frames of the current audio frame; and calculating a second statistical value of the reference linear prediction efficiencies of the N2 historical audio frames and the long-term linear prediction efficiency of the current audio frame, where N2 is a positive integer and the second statistical value is the reference long-term linear prediction efficiency of the current audio frame. The reference linear prediction efficiency of each of N21 historical audio frames is at least one of the following linear prediction efficiencies of that historical audio frame: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference comprehensive linear prediction efficiency. The reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of that historical audio frame, and the N21 historical audio frames are a subset of the N2 historical audio frames. The second statistical value is, for example, the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the reference linear prediction efficiencies of the N2 historical audio frames and the long-term linear prediction efficiency of the current audio frame.
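Because this variant feeds the reference efficiencies of earlier frames back into the statistic, it behaves like a recursive smoother. The sketch below illustrates that with an arithmetic mean over a bounded history; the class name `ReferenceEfficiencyTracker` is hypothetical, and averaging over however many historical frames are available at start-up (fewer than N2) is an assumption not stated in the text.

```python
from collections import deque


class ReferenceEfficiencyTracker:
    """Keeps the reference efficiencies of the last n2 frames."""

    def __init__(self, n2: int) -> None:
        if n2 < 1:
            raise ValueError("n2 must be a positive integer")
        # deque with maxlen drops the oldest entry automatically.
        self._history = deque(maxlen=n2)

    def update(self, current_eff: float) -> float:
        """Return the reference efficiency of the current frame.

        The statistic here is the arithmetic mean of the stored reference
        efficiencies and the efficiency estimated for the current frame.
        The result is stored so it serves as history for later frames.
        """
        values = list(self._history) + [current_eff]
        reference = sum(values) / len(values)
        self._history.append(reference)
        return reference


tracker = ReferenceEfficiencyTracker(n2=2)
first = tracker.update(0.8)   # no history yet: equals the current value
second = tracker.update(0.4)  # mean of 0.8 and 0.4
third = tracker.update(0.2)   # mean of 0.8, 0.6 and 0.2
```

Storing the smoothed value rather than the raw estimate makes a single outlier frame influence the mode decision for longer but with lower weight.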

Alternatively, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to obtain the reference long-term linear prediction efficiency of the current audio frame by estimation in the following manner: estimating the long-term linear prediction efficiency of the current audio frame; obtaining reference linear prediction efficiencies of N4 historical audio frames of the current audio frame; obtaining linear prediction efficiencies of N3 historical audio frames of the current audio frame; and calculating a third statistical value of the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-term linear prediction efficiency of the current audio frame, where N3 and N4 are positive integers and the third statistical value is the reference long-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N31 historical audio frames is at least one of the following linear prediction efficiencies of that historical audio frame: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and a comprehensive linear prediction efficiency. The reference linear prediction efficiency of each of N41 historical audio frames is at least one of the following linear prediction efficiencies of that historical audio frame: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference comprehensive linear prediction efficiency. The N31 historical audio frames are a subset of the N3 historical audio frames, and the N41 historical audio frames are a subset of the N4 historical audio frames. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that historical audio frame, and the reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of that historical audio frame. The intersection of the N3 historical audio frames and the N4 historical audio frames may or may not be empty. The third statistical value is, for example, the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-term linear prediction efficiency of the current audio frame.

For example, in some embodiments of the present invention, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to obtain the reference short-term linear prediction efficiency of the current audio frame by estimation in the following manner: estimating the short-term linear prediction efficiency of the current audio frame, where the short-term linear prediction efficiency of the current audio frame is the reference short-term linear prediction efficiency of the current audio frame.

Alternatively, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to obtain the reference short-term linear prediction efficiency of the current audio frame by estimation in the following manner: estimating the short-term linear prediction efficiency of the current audio frame; obtaining linear prediction efficiencies of N5 historical audio frames of the current audio frame; and calculating a fourth statistical value of the linear prediction efficiencies of the N5 historical audio frames and the short-term linear prediction efficiency of the current audio frame, where N5 is a positive integer and the fourth statistical value is the reference short-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N51 historical audio frames is at least one of the following linear prediction efficiencies of that historical audio frame: a long-term linear prediction efficiency, a short-term linear prediction efficiency, and a comprehensive linear prediction efficiency. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that historical audio frame, and the N51 historical audio frames are a subset of the N5 historical audio frames. The fourth statistical value may be the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the linear prediction efficiencies of the N5 historical audio frames and the short-term linear prediction efficiency of the current audio frame.

Alternatively, the processor 502 invokes, via the bus 501, the code stored in the memory 503 specifically to obtain the reference short-term linear prediction efficiency of the current audio frame by estimation in the following manner: estimating the short-term linear prediction efficiency of the current audio frame; obtaining reference linear prediction efficiencies of N6 historical audio frames of the current audio frame; and calculating a fifth statistical value of the reference linear prediction efficiencies of the N6 historical audio frames and the short-term linear prediction efficiency of the current audio frame, where N6 is a positive integer and the fifth statistical value is the reference short-term linear prediction efficiency of the current audio frame. The reference linear prediction efficiency of each of N61 historical audio frames is at least one of the following linear prediction efficiencies of that historical audio frame: a reference long-term linear prediction efficiency, a reference short-term linear prediction efficiency, and a reference comprehensive linear prediction efficiency. The reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of that historical audio frame, and the N61 historical audio frames are a subset of the N6 historical audio frames. The fifth statistical value may be the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the reference linear prediction efficiencies of the N6 historical audio frames and the short-term linear prediction efficiency of the current audio frame.

Alternatively, the processor 502 invokes, via the bus 501, the code stored in the memory 503 to estimate the reference short-time linear prediction efficiency of the current audio frame as follows: estimate the short-time linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N8 historical audio frames of the current audio frame; obtain the linear prediction efficiencies of N7 historical audio frames of the current audio frame; and calculate a sixth statistical value of the linear prediction efficiencies of the N7 historical audio frames, the reference linear prediction efficiencies of the N8 historical audio frames, and the short-time linear prediction efficiency of the current audio frame. N7 and N8 are positive integers, and the sixth statistical value is the reference short-time linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N71 historical audio frames is at least one of the following linear prediction efficiencies of that frame: long-time linear prediction efficiency, short-time linear prediction efficiency, and comprehensive linear prediction efficiency. The reference linear prediction efficiency of each of N81 historical audio frames is at least one of the following linear prediction efficiencies of that frame: reference long-time linear prediction efficiency, reference short-time linear prediction efficiency, and reference comprehensive linear prediction efficiency. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-time and short-time linear prediction efficiencies of that frame, and the reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-time and reference short-time linear prediction efficiencies of that frame. The N71 historical audio frames are a subset of the N7 historical audio frames, and the N81 historical audio frames are a subset of the N8 historical audio frames. The intersection of the N7 historical audio frames and the N8 historical audio frames may or may not be empty. The sixth statistical value may be a sum, weighted sum, geometric mean, arithmetic mean, sliding mean, or weighted average of the linear prediction efficiencies of the N7 historical audio frames, the reference linear prediction efficiencies of the N8 historical audio frames, and the short-time linear prediction efficiency of the current audio frame.
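The statistical value described above can be illustrated with a minimal Python sketch. The function name, parameter names, and the `method` strings are hypothetical and are not taken from the source; the sliding mean is omitted because the source does not define its window or update rule. The sketch simply pools the historical efficiencies, the historical reference efficiencies, and the short-time efficiency of the current frame, then applies the chosen statistic.

```python
from statistics import fmean, geometric_mean


def reference_short_time_efficiency(current_eff, hist_effs, hist_ref_effs,
                                    method="arithmetic_mean", weights=None):
    """Combine N7 historical efficiencies, N8 historical reference
    efficiencies, and the current short-time efficiency into one value.

    All names here are illustrative assumptions, not the patent's API.
    """
    values = list(hist_effs) + list(hist_ref_effs) + [current_eff]
    if method == "sum":
        return sum(values)
    if method == "arithmetic_mean":
        return fmean(values)
    if method == "geometric_mean":
        # Requires strictly positive inputs.
        return geometric_mean(values)
    if method in ("weighted_sum", "weighted_average"):
        if weights is None or len(weights) != len(values):
            raise ValueError("one weight per pooled value is required")
        total = sum(w * v for w, v in zip(weights, values))
        if method == "weighted_sum":
            return total
        return total / sum(weights)
    raise ValueError(f"unsupported method: {method}")


# Example: N7 = 2 historical efficiencies, N8 = 1 historical reference efficiency.
smoothed = reference_short_time_efficiency(0.6, [0.4, 0.5], [0.5])
```

Weighting the current frame more heavily than the history (for example `weights=[1, 1, 1, 3]` with `method="weighted_average"`) makes the reference value track the current frame more closely; that choice is a tuning assumption, not something the source prescribes.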

In some embodiments of the present invention, in obtaining the short-time linear prediction efficiency of the current audio frame based on the linear prediction residual of the current audio frame, the processor 502 may invoke, via the bus 501, the code stored in the memory 503 to: calculate the energy change rate of the current audio frame before and after short-time linear prediction, where the calculated energy change rate is the short-time linear prediction efficiency of the current audio frame, or the short-time linear prediction efficiency of the current audio frame is obtained by transforming the calculated energy change rate, and where the energy of the current audio frame after short-time linear prediction is the energy of the linear prediction residual of the current audio frame. For example, a mapping relationship may exist between the energy change rate and the short-time linear prediction efficiency, and the short-time linear prediction efficiency of the current audio frame that corresponds to the calculated energy change rate may be obtained from that mapping relationship. Generally, the greater the energy change rate of the current audio frame before and after short-time linear prediction, the higher the short-time linear prediction efficiency of the current audio frame.
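The energy-based estimate above can be sketched in Python as follows. The source does not give a formula for the energy change rate, so this sketch assumes the rate is the fraction of frame energy removed by prediction, (E_before - E_after) / E_before, clamped to [0, 1]. The first-order predictor is only a stand-in for whatever short-time linear prediction analysis an encoder actually uses, and all function names are hypothetical.

```python
def energy(samples):
    """Sum of squared samples."""
    return sum(x * x for x in samples)


def first_order_residual(frame):
    """Residual of a first-order linear predictor (illustrative stand-in).

    The coefficient a = r1 / r0 minimizes the prediction error under the
    autocorrelation method; samples before the frame are taken as zero.
    """
    r0 = energy(frame)
    if r0 == 0.0:
        return list(frame)
    r1 = sum(frame[n] * frame[n - 1] for n in range(1, len(frame)))
    a = r1 / r0
    previous = 0.0
    residual = []
    for x in frame:
        residual.append(x - a * previous)
        previous = x
    return residual


def short_time_lp_efficiency(frame, residual):
    """Assumed definition: share of the frame energy removed by prediction."""
    e_before = energy(frame)
    e_after = energy(residual)  # energy after short-time linear prediction
    if e_before == 0.0:
        return 0.0
    rate = (e_before - e_after) / e_before
    return min(1.0, max(0.0, rate))


frame = [1.0, 1.0, 1.0, 1.0]
efficiency = short_time_lp_efficiency(frame, first_order_residual(frame))
```

Under this assumed definition, a highly predictable frame leaves little residual energy and scores close to 1, which agrees with the statement that a greater energy change rate indicates a higher short-time linear prediction efficiency.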

In some embodiments of the present invention, in estimating the long-time linear prediction efficiency of the current audio frame, the processor 502 may invoke, via the bus 501, the code stored in the memory 503 to: obtain, from the linear prediction residual of the current audio frame and a first historical linear prediction signal, the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal, where the correlation is the long-time linear prediction efficiency of the current audio frame, or the long-time linear prediction efficiency of the current audio frame is obtained by transforming the correlation. The first historical linear prediction signal is a first historical linear prediction excitation or a first historical linear prediction residual. The first historical linear prediction residual is the linear prediction residual of a historical audio frame of the current audio frame (for example, the first historical linear prediction residual may be the linear prediction residual of one historical audio frame of the current audio frame, with a duration the same as or close to that of the current audio frame; or it may be the linear prediction residual of a continuous portion of the audio signal spanning two adjacent historical audio frames of the current audio frame, with a duration the same as or close to that of the current audio frame). The first historical linear prediction excitation is the linear prediction excitation of a historical audio frame of the current audio frame (for example, the first historical linear prediction excitation may be the linear prediction excitation of one historical audio frame of the current audio frame, with a duration the same as or close to that of the current audio frame; or it may be the linear prediction excitation of a continuous portion of the audio signal spanning two adjacent historical audio frames of the current audio frame, with a duration the same as or close to that of the current audio frame). For example, a mapping relationship may exist between the correlation and the long-time linear prediction efficiency of an audio frame, and the long-time linear prediction efficiency of the current audio frame that corresponds to the calculated correlation may be obtained from that mapping relationship.

For example, in obtaining the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal, the processor 502 may invoke, via the bus 501, the code stored in the memory 503 to calculate the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal.
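One possible way to compute such a correlation is sketched below in Python. The source does not fix a particular correlation measure or mapping, so this sketch assumes a normalized time-domain cross-correlation between two equal-length signals and a simple clamp to [0, 1] as the mapping to a long-time linear prediction efficiency. Function names are hypothetical.

```python
import math


def normalized_correlation(residual, history_signal):
    """Normalized cross-correlation of two equal-length signals, in [-1, 1]."""
    if len(residual) != len(history_signal):
        raise ValueError("signals must have the same length")
    numerator = sum(a * b for a, b in zip(residual, history_signal))
    denominator = math.sqrt(
        sum(a * a for a in residual) * sum(b * b for b in history_signal)
    )
    # A silent signal carries no usable correlation.
    return numerator / denominator if denominator > 0.0 else 0.0


def long_time_lp_efficiency(residual, history_signal):
    """Assumed mapping: negative correlation is treated as zero efficiency."""
    return min(1.0, max(0.0, normalized_correlation(residual, history_signal)))


residual = [1.0, -1.0, 1.0, -1.0]
same = long_time_lp_efficiency(residual, [1.0, -1.0, 1.0, -1.0])
inverted = long_time_lp_efficiency(residual, [-1.0, 1.0, -1.0, 1.0])
unrelated = long_time_lp_efficiency(residual, [1.0, 1.0, 1.0, 1.0])
```

Here `history_signal` plays the role of the first historical linear prediction signal, which may be either a historical residual or a historical excitation of matching duration.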

It can be understood that the functions of the functional modules of the audio encoder 500 in this embodiment may be implemented according to the methods in the foregoing method embodiments. For the specific implementation process, refer to the related descriptions of the foregoing method embodiments; details are not repeated here. The audio encoder 500 may be any apparatus that needs to collect, store, or transmit audio signals, for example a mobile phone, a tablet computer, a personal computer, or a notebook computer.

It can be seen that, in the technical solution of this embodiment, the audio encoder 500 first estimates the reference linear prediction efficiency of the current audio frame, determines the audio coding mode that matches the estimated reference linear prediction efficiency, and encodes the current audio frame in that mode. In determining the audio coding mode, this solution does not need to perform the operation required by the existing closed-loop selection mode, namely fully encoding the current audio frame with each audio coding mode; instead, it selects the audio coding mode from the reference linear prediction efficiency of the current audio frame. Because the computational complexity of estimating the reference linear prediction efficiency of the current audio frame is usually far lower than that of fully encoding the current audio frame with each audio coding mode, the solution of this embodiment of the present invention helps reduce the computational complexity of audio coding, and thereby the overhead of audio coding, compared with the existing mechanism.

Referring to FIG. 6, FIG. 6 is a structural block diagram of an audio encoder 600 according to another embodiment of the present invention. The audio encoder 600 may include at least one processor 601, at least one network interface 604 or other user interface 603, a memory 605, and at least one communication bus 602. The communication bus 602 is used to implement connection and communication between these components. The audio encoder 600 optionally includes the user interface 603, which includes a display (for example, a touchscreen, an LCD, a CRT, a holographic display, or a projector), a pointing device (for example, a mouse, a trackball, a touchpad, or a touchscreen), a camera, and/or a sound pickup apparatus.

The memory 605 may include a read-only memory and a random access memory, and provides instructions and data to the processor 601. A portion of the memory 605 may further include a non-volatile random access memory (NVRAM).

In some implementations, the memory 605 stores the following elements, executable modules or data structures, or a subset or an extended set thereof:

An operating system 6051, which includes various system programs for implementing various basic services and processing hardware-based tasks.

An application program module 6052, which includes various application programs for implementing various application services.

The application program module 6052 includes, but is not limited to, an estimation unit 410, a determination unit 420, and an encoding unit 430.

In this embodiment of the present invention, by invoking the program or instructions stored in the memory 605, the processor 601 is configured to: estimate the reference linear prediction efficiency of the current audio frame; determine the audio coding mode that matches the reference linear prediction efficiency of the current audio frame; and encode the current audio frame in the audio coding mode that matches the reference linear prediction efficiency of the current audio frame.
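The three steps performed by the processor 601 can be pictured with the following minimal Python sketch. The function and parameter names are hypothetical, and the estimator, selector, and encoders are stand-in callables supplied by the caller rather than real codec components. The point of the sketch is that exactly one encoder runs per frame.

```python
def encode_current_frame(frame, estimate_reference_efficiency,
                         select_coding_mode, encoders):
    """Open-loop selection: estimate, choose one mode, encode once."""
    efficiency = estimate_reference_efficiency(frame)   # step 1: estimate
    mode = select_coding_mode(efficiency)                # step 2: match a mode
    return mode, encoders[mode](frame)                   # step 3: encode


# Stand-in components, for illustration only.
encoders = {
    "lp_based": lambda frame: ("lp", len(frame)),
    "non_lp_based": lambda frame: ("non_lp", len(frame)),
}
mode, payload = encode_current_frame(
    [0.1, 0.2, 0.3],
    estimate_reference_efficiency=lambda frame: 0.8,
    select_coding_mode=lambda eff: "lp_based" if eff >= 0.5 else "non_lp_based",
    encoders=encoders,
)
```

A closed-loop selector would instead call every entry in `encoders` and compare the results, which is the cost this scheme avoids.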

In some embodiments of the present invention, before estimating the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may further be configured to first determine whether the current audio frame is a speech audio frame. For example, estimating the reference linear prediction efficiency of the current audio frame may include: when the current audio frame is a non-speech audio frame, estimating the reference linear prediction efficiency of the current audio frame. Alternatively, the reference linear prediction efficiency of the current audio frame may be estimated without first distinguishing whether the current audio frame is a speech audio frame.

It can be understood that, depending on which types of linear prediction efficiency the reference linear prediction efficiency of the current audio frame includes, the specific manner in which the processor 601 determines the audio coding mode that matches the reference linear prediction efficiency of the current audio frame may differ. Some possible implementations are described below by way of example.

For example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency and the reference short-time linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: if the reference long-time linear prediction efficiency of the current audio frame is less than a first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is less than a second threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency and the reference short-time linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: if the reference long-time linear prediction efficiency of the current audio frame is greater than or equal to the first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is greater than or equal to the second threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is a linear-prediction-based audio coding mode.

For another example, in still other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency and the reference short-time linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: if the reference long-time linear prediction efficiency of the current audio frame is less than the first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is less than the second threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction; and if the reference long-time linear prediction efficiency of the current audio frame is greater than or equal to the first threshold, and/or the reference short-time linear prediction efficiency of the current audio frame is greater than or equal to the second threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is a linear-prediction-based audio coding mode.
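The two-threshold decision above can be sketched in Python as follows. Because the text allows "and/or" in both branches, the sketch exposes the combination rule as a hypothetical `combine` parameter: with `"and"`, the linear-prediction-based mode requires both efficiencies to reach their thresholds, so a shortfall in either one selects the other mode; with `"or"`, reaching either threshold is enough. The mode labels, function name, and threshold values are illustrative assumptions.

```python
LP_BASED = "lp_based"
NON_LP_BASED = "non_lp_based"


def select_mode_long_and_short(ref_long, ref_short,
                               first_threshold, second_threshold,
                               combine="and"):
    """Pick a coding mode from reference long-time and short-time efficiency."""
    long_ok = ref_long >= first_threshold
    short_ok = ref_short >= second_threshold
    if combine == "and":
        use_lp = long_ok and short_ok
    elif combine == "or":
        use_lp = long_ok or short_ok
    else:
        raise ValueError("combine must be 'and' or 'or'")
    return LP_BASED if use_lp else NON_LP_BASED


# Hypothetical thresholds of 0.5 for both efficiencies.
strict = select_mode_long_and_short(0.8, 0.3, 0.5, 0.5, combine="and")
lenient = select_mode_long_and_short(0.8, 0.3, 0.5, 0.5, combine="or")
```

Fixing one reading of "and/or" per deployment keeps the two branches mutually exclusive, so every frame maps to exactly one mode.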

For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: if the reference long-time linear prediction efficiency of the current audio frame is greater than or equal to a third threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is a linear-prediction-based audio coding mode.

For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: if the reference long-time linear prediction efficiency of the current audio frame is less than a fourth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: if the reference long-time linear prediction efficiency of the current audio frame is greater than or equal to the third threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is a linear-prediction-based audio coding mode; and if the reference long-time linear prediction efficiency of the current audio frame is less than the fourth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.
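A minimal Python sketch of the third-threshold and fourth-threshold rule follows. This passage does not state how the two thresholds relate, so the sketch assumes the fourth threshold may be lower than the third and returns `None` for an efficiency that falls between them, leaving that case to the caller. Names, labels, and threshold values are hypothetical.

```python
LP_BASED = "lp_based"
NON_LP_BASED = "non_lp_based"


def select_mode_long_only(ref_long, third_threshold, fourth_threshold):
    """Pick a coding mode from the reference long-time efficiency alone.

    Returns None when neither stated condition holds (assumed gap between
    the fourth and third thresholds).
    """
    if ref_long >= third_threshold:
        return LP_BASED
    if ref_long < fourth_threshold:
        return NON_LP_BASED
    return None


high = select_mode_long_only(0.7, third_threshold=0.6, fourth_threshold=0.4)
low = select_mode_long_only(0.3, third_threshold=0.6, fourth_threshold=0.4)
between = select_mode_long_only(0.5, third_threshold=0.6, fourth_threshold=0.4)
```

Setting both thresholds to the same value removes the undecided band, so every frame receives a mode.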

For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference long-time linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: determine a first linear prediction efficiency interval into which the reference long-time linear prediction efficiency of the current audio frame falls, and determine, according to a mapping relationship between linear prediction efficiency intervals and linear-prediction-based audio coding modes, a first audio coding mode that has a mapping relationship with the first linear prediction efficiency interval, where the first audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and the first audio coding mode is either a linear-prediction-based audio coding mode or an audio coding mode not based on linear prediction. Different linear prediction efficiency intervals correspond to different audio coding modes. For example, assume that there are three linear prediction efficiency intervals: 0–30%, 30%–70%, and 70%–100%. If the reference long-time linear prediction efficiency of the current audio frame falls into the interval 0–30% (that is, the first linear prediction efficiency interval is 0–30%), the audio coding mode corresponding to the interval 0–30% may be determined as the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference long-time linear prediction efficiency of the current audio frame falls into the interval 30%–70% (that is, the first linear prediction efficiency interval is 30%–70%), the audio coding mode corresponding to the interval 30%–70% may be determined as the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and so on for other cases. The mapping relationship between linear prediction efficiency intervals and linear-prediction-based audio coding modes may be set according to the requirements of different application scenarios.
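The interval lookup in this example can be sketched in Python with the standard `bisect` module. The interval edges come from the example above (30% and 70%), but the mode labels are placeholders because the source leaves the mapping to the application. The sketch also assumes half-open intervals, so an efficiency exactly on an edge belongs to the upper interval; the source does not say which interval owns a boundary value.

```python
from bisect import bisect_right

# Edges between the intervals 0-30%, 30%-70%, and 70%-100%.
INTERVAL_EDGES = [0.30, 0.70]
# Placeholder mode labels, one per interval (assumed, application-specific).
INTERVAL_MODES = ["mode_for_0_30", "mode_for_30_70", "mode_for_70_100"]


def select_mode_by_interval(ref_efficiency,
                            edges=INTERVAL_EDGES, modes=INTERVAL_MODES):
    """Return the mode mapped to the interval containing ref_efficiency."""
    if not 0.0 <= ref_efficiency <= 1.0:
        raise ValueError("efficiency is expected in the range [0, 1]")
    if len(modes) != len(edges) + 1:
        raise ValueError("need exactly one mode per interval")
    # bisect_right counts how many edges are <= the value, which is the
    # index of the interval under the half-open convention.
    return modes[bisect_right(edges, ref_efficiency)]


first = select_mode_by_interval(0.20)
second = select_mode_by_interval(0.50)
third = select_mode_by_interval(0.70)
```

Passing different `edges` and `modes` lets the same lookup serve other interval layouts, such as one tuned for a different application scenario.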

For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-time linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: if the reference short-time linear prediction efficiency of the current audio frame is greater than or equal to a fifth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is a linear-prediction-based audio coding mode.

For another example, in still other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-time linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: if the reference short-time linear prediction efficiency of the current audio frame is less than the fifth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

For another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-time linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: if the reference short-time linear prediction efficiency of the current audio frame is greater than or equal to the fifth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is a linear-prediction-based audio coding mode; and if the reference short-time linear prediction efficiency of the current audio frame is less than the fifth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

For another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference short-time linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: determine a second linear prediction efficiency interval into which the reference short-time linear prediction efficiency of the current audio frame falls, and determine, according to a mapping relationship between linear prediction efficiency intervals and linear-prediction-based audio coding modes, a second audio coding mode that has a mapping relationship with the second linear prediction efficiency interval, where the second audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and the second audio coding mode is either a linear-prediction-based audio coding mode or an audio coding mode not based on linear prediction. For example, assume that there are three linear prediction efficiency intervals: 0–40%, 40%–60%, and 60%–100%. If the reference short-time linear prediction efficiency of the current audio frame falls into the interval 0–40% (that is, the second linear prediction efficiency interval is 0–40%), the audio coding mode corresponding to the interval 0–40% may be determined as the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference short-time linear prediction efficiency of the current audio frame falls into the interval 40%–60% (that is, the second linear prediction efficiency interval is 40%–60%), the audio coding mode corresponding to the interval 40%–60% may be determined as the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and so on for other cases. The mapping relationship between linear prediction efficiency intervals and linear-prediction-based audio coding modes may be set according to the requirements of different application scenarios.

As another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: if the reference comprehensive linear prediction efficiency of the current audio frame is greater than or equal to a sixth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is a linear-prediction-based audio coding mode.

As another example, in some other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: if the reference comprehensive linear prediction efficiency of the current audio frame is less than the sixth threshold, determine that the audio coding mode that matches the reference linear prediction efficiency of the current audio frame is an audio coding mode not based on linear prediction.

As another example, in still other embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: if the reference comprehensive linear prediction efficiency of the current audio frame is greater than or equal to the sixth threshold, determine that the matching audio coding mode is a linear-prediction-based audio coding mode; and if the reference comprehensive linear prediction efficiency of the current audio frame is less than the sixth threshold, determine that the matching audio coding mode is an audio coding mode not based on linear prediction.
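The two-way threshold decision can be sketched in a few lines. The default threshold of 0.5 and the returned labels are assumptions for illustration; the source does not give a value for the sixth threshold.

```python
def select_mode_by_threshold(reference_comprehensive_efficiency: float,
                             sixth_threshold: float = 0.5) -> str:
    """Choose between the two coding families with a single comparison.

    sixth_threshold = 0.5 is a placeholder; the source leaves its value open.
    """
    if reference_comprehensive_efficiency >= sixth_threshold:
        return "lp_based"       # linear-prediction-based audio coding mode
    return "non_lp_based"       # audio coding mode not based on linear prediction
```

Note that equality with the threshold selects the linear-prediction-based mode, matching the "greater than or equal to" wording of the embodiment.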

As another example, in some embodiments of the present invention, if the reference linear prediction efficiency of the current audio frame includes the reference comprehensive linear prediction efficiency of the current audio frame, then, in determining the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to: determine the third linear prediction efficiency interval into which the reference comprehensive linear prediction efficiency of the current audio frame falls; and, according to the mapping relationship between linear prediction efficiency intervals and linear-prediction-based audio coding modes, determine either a third audio coding mode that has a mapping relationship with the third linear prediction efficiency interval, or an audio coding mode not based on linear prediction. The third audio coding mode is the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and it is a linear-prediction-based audio coding mode. For example, assume there are three linear prediction efficiency intervals: 0% to 50%, 50% to 80%, and 80% to 100%. If the reference comprehensive linear prediction efficiency of the current audio frame falls within the interval 0% to 50% (that is, the third linear prediction efficiency interval is the interval 0% to 50%), the audio coding mode corresponding to the interval 0% to 50% is determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame. If the reference comprehensive linear prediction efficiency of the current audio frame falls within the interval 50% to 80% (that is, the third linear prediction efficiency interval is the interval 50% to 80%), the audio coding mode corresponding to the interval 50% to 80% is determined to be the audio coding mode that matches the reference linear prediction efficiency of the current audio frame, and so on for the other intervals. The mapping relationship between linear prediction efficiency intervals and linear-prediction-based audio coding modes can be set according to the needs of different application scenarios.

For example, in some embodiments of the present invention, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to estimate the reference long-term linear prediction efficiency of the current audio frame as follows: estimate the long-term linear prediction efficiency of the current audio frame, where the long-term linear prediction efficiency of the current audio frame is used as the reference long-term linear prediction efficiency of the current audio frame.

Alternatively, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to estimate the reference long-term linear prediction efficiency of the current audio frame as follows: estimate the long-term linear prediction efficiency of the current audio frame; obtain the linear prediction efficiencies of N1 historical audio frames of the current audio frame; and calculate a first statistical value of the linear prediction efficiencies of the N1 historical audio frames and the long-term linear prediction efficiency of the current audio frame. Here, N1 is a positive integer, and the first statistical value is the reference long-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N11 historical audio frames is at least one of the following linear prediction efficiencies of that frame: long-term linear prediction efficiency, short-term linear prediction efficiency, and comprehensive linear prediction efficiency. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that frame, and the N11 historical audio frames are a subset of the N1 historical audio frames. The first statistical value may be, for example, the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the linear prediction efficiencies of the N1 historical audio frames and the long-term linear prediction efficiency of the current audio frame.

Alternatively, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to estimate the reference long-term linear prediction efficiency of the current audio frame as follows: estimate the long-term linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N2 historical audio frames of the current audio frame; and calculate a second statistical value of the reference linear prediction efficiencies of the N2 historical audio frames and the long-term linear prediction efficiency of the current audio frame. Here, N2 is a positive integer, and the second statistical value is the reference long-term linear prediction efficiency of the current audio frame. The reference linear prediction efficiency of each of N21 historical audio frames is at least one of the following linear prediction efficiencies of that frame: reference long-term linear prediction efficiency, reference short-term linear prediction efficiency, and reference comprehensive linear prediction efficiency. The reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of that frame, and the N21 historical audio frames are a subset of the N2 historical audio frames. The second statistical value may be, for example, the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the reference linear prediction efficiencies of the N2 historical audio frames and the long-term linear prediction efficiency of the current audio frame.

Alternatively, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to estimate the reference long-term linear prediction efficiency of the current audio frame as follows: estimate the long-term linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N4 historical audio frames of the current audio frame, and obtain the linear prediction efficiencies of N3 historical audio frames of the current audio frame; and calculate a third statistical value of the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-term linear prediction efficiency of the current audio frame. Here, N3 and N4 are positive integers, and the third statistical value is the reference long-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N31 historical audio frames is at least one of the following linear prediction efficiencies of that frame: long-term linear prediction efficiency, short-term linear prediction efficiency, and comprehensive linear prediction efficiency. The reference linear prediction efficiency of each of N41 historical audio frames is at least one of the following linear prediction efficiencies of that frame: reference long-term linear prediction efficiency, reference short-term linear prediction efficiency, and reference comprehensive linear prediction efficiency. The N31 historical audio frames are a subset of the N3 historical audio frames, and the N41 historical audio frames are a subset of the N4 historical audio frames. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that frame, and the reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of that frame. The intersection of the N3 historical audio frames and the N4 historical audio frames may or may not be empty. The third statistical value may be, for example, the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the linear prediction efficiencies of the N3 historical audio frames, the reference linear prediction efficiencies of the N4 historical audio frames, and the long-term linear prediction efficiency of the current audio frame.
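To make the history-based estimate concrete, the following minimal sketch computes a reference efficiency as the arithmetic mean of the current frame's efficiency and the efficiencies of up to N historical frames. The class name, the use of a fixed-length buffer, and the choice of the arithmetic mean (one of several statistics the text permits) are assumptions of this sketch.

```python
from collections import deque


class ReferenceEfficiencyTracker:
    """Tracks per-frame efficiencies and returns a smoothed reference value."""

    def __init__(self, history_length: int):
        if history_length < 1:
            raise ValueError("history_length must be a positive integer")
        # Holds the efficiencies of the most recent historical frames.
        self._history = deque(maxlen=history_length)

    def update(self, current_efficiency: float) -> float:
        """Return the mean of the history and the current value, then store it."""
        values = list(self._history) + [current_efficiency]
        reference = sum(values) / len(values)
        # Store the raw (non-reference) efficiency, as in the variant that
        # averages the linear prediction efficiencies of historical frames.
        self._history.append(current_efficiency)
        return reference
```

Storing the returned reference value instead of the raw value would correspond to the variant that averages the reference linear prediction efficiencies of historical frames.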

For example, in some embodiments of the present invention, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to estimate the reference short-term linear prediction efficiency of the current audio frame as follows: estimate the short-term linear prediction efficiency of the current audio frame, where the short-term linear prediction efficiency of the current audio frame is used as the reference short-term linear prediction efficiency of the current audio frame.

Alternatively, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to estimate the reference short-term linear prediction efficiency of the current audio frame as follows: estimate the short-term linear prediction efficiency of the current audio frame; obtain the linear prediction efficiencies of N5 historical audio frames of the current audio frame; and calculate a fourth statistical value of the linear prediction efficiencies of the N5 historical audio frames and the short-term linear prediction efficiency of the current audio frame. Here, N5 is a positive integer, and the fourth statistical value is the reference short-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N51 historical audio frames is at least one of the following linear prediction efficiencies of that frame: long-term linear prediction efficiency, short-term linear prediction efficiency, and comprehensive linear prediction efficiency. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that frame, and the N51 historical audio frames are a subset of the N5 historical audio frames. The fourth statistical value may be the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the linear prediction efficiencies of the N5 historical audio frames and the short-term linear prediction efficiency of the current audio frame.

Alternatively, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to estimate the reference short-term linear prediction efficiency of the current audio frame as follows: estimate the short-term linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N6 historical audio frames of the current audio frame; and calculate a fifth statistical value of the reference linear prediction efficiencies of the N6 historical audio frames and the short-term linear prediction efficiency of the current audio frame. Here, N6 is a positive integer, and the fifth statistical value is the reference short-term linear prediction efficiency of the current audio frame. The reference linear prediction efficiency of each of N61 historical audio frames is at least one of the following linear prediction efficiencies of that frame: reference long-term linear prediction efficiency, reference short-term linear prediction efficiency, and reference comprehensive linear prediction efficiency. The reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of that frame, and the N61 historical audio frames are a subset of the N6 historical audio frames. The fifth statistical value may be the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the reference linear prediction efficiencies of the N6 historical audio frames and the short-term linear prediction efficiency of the current audio frame.

Alternatively, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to estimate the reference short-term linear prediction efficiency of the current audio frame as follows: estimate the short-term linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiencies of N8 historical audio frames of the current audio frame; obtain the linear prediction efficiencies of N7 historical audio frames of the current audio frame; and calculate a sixth statistical value of the linear prediction efficiencies of the N7 historical audio frames, the reference linear prediction efficiencies of the N8 historical audio frames, and the short-term linear prediction efficiency of the current audio frame. Here, N7 and N8 are positive integers, and the sixth statistical value is the reference short-term linear prediction efficiency of the current audio frame. The linear prediction efficiency of each of N71 historical audio frames is at least one of the following linear prediction efficiencies of that frame: long-term linear prediction efficiency, short-term linear prediction efficiency, and comprehensive linear prediction efficiency. The reference linear prediction efficiency of each of N81 historical audio frames is at least one of the following linear prediction efficiencies of that frame: reference long-term linear prediction efficiency, reference short-term linear prediction efficiency, and reference comprehensive linear prediction efficiency. The comprehensive linear prediction efficiency of each historical audio frame is obtained based on the long-term linear prediction efficiency and the short-term linear prediction efficiency of that frame, and the reference comprehensive linear prediction efficiency of each historical audio frame is obtained based on the reference long-term linear prediction efficiency and the reference short-term linear prediction efficiency of that frame. The N71 historical audio frames are a subset of the N7 historical audio frames, and the N81 historical audio frames are a subset of the N8 historical audio frames. The intersection of the N7 historical audio frames and the N8 historical audio frames may or may not be empty. The sixth statistical value may be the sum, weighted sum, geometric mean, arithmetic mean, moving average, or weighted average of the linear prediction efficiencies of the N7 historical audio frames, the reference linear prediction efficiencies of the N8 historical audio frames, and the short-term linear prediction efficiency of the current audio frame.
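As a rough illustration of a statistic that combines all three inputs, the sketch below forms a weighted average of the current frame's short-term efficiency and the pooled historical values. The function name, the pooling of both histories into a single mean, and the default weight of 0.5 are assumptions; the source only lists the admissible kinds of statistic.

```python
from typing import Sequence


def combined_reference_efficiency(current_efficiency: float,
                                  historical_efficiencies: Sequence[float],
                                  historical_reference_efficiencies: Sequence[float],
                                  current_weight: float = 0.5) -> float:
    """Weighted average of the current value and the mean of both histories.

    current_weight is a hypothetical tuning parameter in [0, 1].
    """
    if not 0.0 <= current_weight <= 1.0:
        raise ValueError("current_weight must lie in [0, 1]")
    history = list(historical_efficiencies) + list(historical_reference_efficiencies)
    if not history:
        # With no history available, fall back to the current value alone.
        return current_efficiency
    history_mean = sum(history) / len(history)
    return current_weight * current_efficiency + (1.0 - current_weight) * history_mean
```

A larger `current_weight` makes the decision react faster to signal changes, while a smaller one makes mode switching less frequent.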

In some embodiments of the present invention, in obtaining the short-term linear prediction efficiency of the current audio frame based on the linear prediction residual of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to calculate the energy change rate of the current audio frame before and after short-term linear prediction. The calculated energy change rate is the short-term linear prediction efficiency of the current audio frame, or the short-term linear prediction efficiency of the current audio frame is obtained by transforming the calculated energy change rate. The energy of the current audio frame after short-term linear prediction is the energy of the linear prediction residual of the current audio frame. For example, there may be a mapping relationship between the energy change rate and the short-term linear prediction efficiency of the current audio frame, and the short-term linear prediction efficiency that has a mapping relationship with the calculated energy change rate can be obtained from that mapping. In general, the greater the energy change rate of the current audio frame before and after short-term linear prediction, the higher the short-term linear prediction efficiency of the current audio frame.
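The energy comparison can be sketched as below. The definition of the energy change rate as the relative energy reduction (E_before − E_after) / E_before, the clamping to non-negative values, and the zero-history handling at the start of the frame are assumptions of this sketch; the predictor coefficients are taken as given rather than estimated.

```python
from typing import List, Sequence


def energy(samples: Sequence[float]) -> float:
    """Sum of squared sample values."""
    return sum(x * x for x in samples)


def lp_residual(frame: Sequence[float], coefficients: Sequence[float]) -> List[float]:
    """Residual e[n] = s[n] - sum_k a_k * s[n-k].

    Samples before the start of the frame are treated as zero (a simplification).
    """
    residual = []
    for n, sample in enumerate(frame):
        prediction = sum(a * frame[n - k]
                         for k, a in enumerate(coefficients, start=1)
                         if n - k >= 0)
        residual.append(sample - prediction)
    return residual


def short_term_efficiency(frame: Sequence[float], residual: Sequence[float]) -> float:
    """Relative energy reduction achieved by short-term linear prediction."""
    energy_before = energy(frame)
    if energy_before == 0.0:
        return 0.0  # a silent frame carries nothing to predict
    rate = 1.0 - energy(residual) / energy_before
    return max(0.0, rate)
```

For a linear ramp such as `[1.0, 2.0, ..., 8.0]` with coefficients `[2.0, -1.0]`, the predictor is exact after the first sample, so nearly all the energy is removed and the efficiency is close to 1.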

In some embodiments of the present invention, in estimating the long-term linear prediction efficiency of the current audio frame, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to obtain, from the linear prediction residual of the current audio frame and a first historical linear prediction signal, the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal. The correlation is the long-term linear prediction efficiency of the current audio frame, or the long-term linear prediction efficiency of the current audio frame is obtained by transforming the correlation. The first historical linear prediction signal is a first historical linear prediction excitation or a first historical linear prediction residual. The first historical linear prediction residual is the linear prediction residual of a historical audio frame of the current audio frame (for example, it may be the linear prediction residual of one historical audio frame of the current audio frame whose duration is the same as or close to that of the current audio frame, or it may be the linear prediction residual of a partial continuous audio signal that spans two adjacent historical audio frames of the current audio frame and whose duration is the same as or close to that of the current audio frame). The first historical linear prediction excitation is the linear prediction excitation of a historical audio frame of the current audio frame (for example, it may be the linear prediction excitation of one historical audio frame of the current audio frame whose duration is the same as or close to that of the current audio frame, or it may be the linear prediction excitation of a partial continuous audio signal that spans two adjacent historical audio frames of the current audio frame and whose duration is the same as or close to that of the current audio frame). For example, there may be a mapping relationship between the correlation and the long-term linear prediction efficiency of an audio frame, and the long-term linear prediction efficiency of the current audio frame that has a mapping relationship with the calculated correlation can be obtained from that mapping.

For example, in obtaining the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal from the linear prediction residual of the current audio frame and the first historical linear prediction signal, the processor 601, by invoking the program or instructions stored in the memory 605, may be specifically configured to calculate the correlation between the linear prediction residual of the current audio frame and the first historical linear prediction signal.
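One way to compute such a correlation is a normalized cross-correlation of two equal-length signals, sketched below. The choice of this particular measure, the equal-length requirement, and the clamping of negative correlations to zero when converting to an efficiency are assumptions of this sketch; this passage does not fix the correlation formula.

```python
import math
from typing import Sequence


def normalized_correlation(current_residual: Sequence[float],
                           historical_signal: Sequence[float]) -> float:
    """Normalized cross-correlation in [-1, 1] of two equal-length signals."""
    if len(current_residual) != len(historical_signal):
        raise ValueError("signals must have the same length")
    cross = sum(x * y for x, y in zip(current_residual, historical_signal))
    norm = math.sqrt(sum(x * x for x in current_residual)
                     * sum(y * y for y in historical_signal))
    if norm == 0.0:
        return 0.0  # at least one signal is silent
    return cross / norm


def long_term_efficiency(current_residual: Sequence[float],
                         historical_signal: Sequence[float]) -> float:
    """Map the correlation to [0, 1] by discarding negative correlation."""
    return max(0.0, normalized_correlation(current_residual, historical_signal))
```

A strongly periodic residual, such as voiced speech compared against a segment one pitch period earlier, would yield a value near 1, whereas noise-like content would yield a value near 0.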

It can be understood that the functions of the functional modules of the audio encoder 600 in this embodiment may be implemented according to the methods in the foregoing method embodiments. For the specific implementation process, refer to the related descriptions of those method embodiments, which are not repeated here. The audio encoder 600 may be any device that needs to collect or store audio signals or that can transmit audio signals externally, for example a mobile phone, a tablet computer, a personal computer, or a laptop computer.

It can be seen that, in the technical solution of this embodiment, the audio encoder 600 first estimates the reference linear prediction efficiency of the current audio frame, determines the audio coding method that matches the estimated reference linear prediction efficiency, and encodes the current audio frame using that matching audio coding method. In determining the audio coding method, this solution does not need to fully encode the current audio frame with each audio coding method, as the existing closed-loop selection mode requires. Instead, it determines the audio coding method to select from the reference linear prediction efficiency of the current audio frame. Because the computational complexity of estimating the reference linear prediction efficiency of the current audio frame is usually far lower than that of fully encoding the current audio frame with each audio coding method, the solution of this embodiment of the present invention helps reduce the computational complexity of audio coding, and thereby its overhead, compared with the existing mechanism.
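The complexity argument can be illustrated with a short sketch that contrasts the two selection strategies. All names here (`select_closed_loop`, `select_open_loop`, `encoders`, `cost`, `estimate_efficiency`, `choose_method`) are hypothetical and introduced only for illustration. The point is that the closed-loop path runs every full encoder, while the open-loop path runs one cheap estimate and a single full encoder.

```python
def select_closed_loop(frame, encoders, cost):
    """Closed-loop selection: fully encode with every method, keep the cheapest."""
    results = {name: encode(frame) for name, encode in encoders.items()}
    best = min(results, key=lambda name: cost(results[name]))
    return best, results[best]


def select_open_loop(frame, encoders, estimate_efficiency, choose_method):
    """Open-loop selection: estimate once, then run a single full encode."""
    efficiency = estimate_efficiency(frame)  # cheap compared with a full encode
    name = choose_method(efficiency)
    return name, encoders[name](frame)
```

With K candidate coding methods, the closed-loop function performs K full encodes per frame, whereas the open-loop function performs one.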

An embodiment of the present invention further provides a computer storage medium. The computer storage medium may store a program, and when executed, the program performs some or all of the steps of any one of the audio coding methods described in the foregoing method embodiments.

It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.

In the foregoing embodiments, the description of each embodiment has its own focus. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or take other forms.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units. That is, they may be located in one place or distributed across multiple network units. Some or all of these units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

In summary, the foregoing embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An audio encoding method, characterized in that it comprises:

Estimating the reference linear prediction efficiency of the current audio frame when the current audio frame is a non-speech audio frame;

Determining an audio coding method that matches the reference linear prediction efficiency of the current audio frame; and

Encoding the current audio frame using the audio coding method that matches the reference linear prediction efficiency of the current audio frame;

Wherein the reference linear prediction efficiency includes at least one of the following linear prediction efficiencies: reference long-time linear prediction efficiency, reference short-time linear prediction efficiency, and reference comprehensive linear prediction efficiency.

2. The method according to claim 1, characterized in that,

If the reference linear prediction efficiency of the current audio frame includes the reference short-time linear prediction efficiency of the current audio frame, then determining the audio coding method that matches the reference linear prediction efficiency of the current audio frame includes:

If the reference short-time linear prediction efficiency of the current audio frame is greater than or equal to a fifth threshold, determining that the audio coding method that matches the reference linear prediction efficiency of the current audio frame is an audio coding method based on linear prediction;

And/or,

If the reference short-time linear prediction efficiency of the current audio frame is less than the fifth threshold, determining that the audio coding method that matches the reference linear prediction efficiency of the current audio frame is an audio coding method not based on linear prediction.
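As a minimal sketch of the threshold test recited in claim 2: the claim gives no numeric value for the fifth threshold, so the caller supplies it, and the labels `LP_BASED` and `NOT_LP_BASED` are hypothetical names for the two classes of coding method.

```python
LP_BASED = "based on linear prediction"
NOT_LP_BASED = "not based on linear prediction"


def choose_by_threshold(ref_short_time_efficiency, fifth_threshold):
    """Pick the class of coding method by comparing against the fifth threshold."""
    if ref_short_time_efficiency >= fifth_threshold:
        return LP_BASED
    return NOT_LP_BASED
```

Note that a value exactly equal to the threshold selects the coding method based on linear prediction, as the claim's "greater than or equal to" requires.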

3. The method according to claim 1, characterized in that,

If the reference linear prediction efficiency of the current audio frame includes the reference short-time linear prediction efficiency of the current audio frame, then determining the audio coding method that matches the reference linear prediction efficiency of the current audio frame includes: determining the second linear prediction efficiency interval into which the reference short-time linear prediction efficiency of the current audio frame falls; and determining a second audio coding method that has a mapping relationship with the second linear prediction efficiency interval based on the mapping relationship between the linear prediction efficiency interval and audio coding methods based on linear prediction, wherein the second audio coding method is an audio coding method that matches the reference linear prediction efficiency of the current audio frame, and the second audio coding method is an audio coding method based on linear prediction or an audio coding method not based on linear prediction.
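The interval mapping of claim 3 can be sketched as a lookup over sorted interval boundaries. The boundary values and method labels are assumptions made for illustration only; the claim fixes neither the number of intervals nor the coding methods mapped to them.

```python
from bisect import bisect_right

# Hypothetical boundaries splitting [0, 1] into [0, 0.3), [0.3, 0.7), [0.7, 1].
BOUNDARIES = [0.3, 0.7]
# Hypothetical coding method mapped to each interval, in ascending order.
METHODS = [
    "method not based on linear prediction",
    "linear-prediction-based method A",
    "linear-prediction-based method B",
]


def method_for_efficiency(efficiency, boundaries=BOUNDARIES, methods=METHODS):
    """Return the coding method mapped to the interval containing `efficiency`."""
    if len(methods) != len(boundaries) + 1:
        raise ValueError("need exactly one method per interval")
    # bisect_right places a value equal to a boundary in the upper interval.
    return methods[bisect_right(boundaries, efficiency)]
```

The two-way threshold test of claim 2 is the special case with a single boundary.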

4. The method according to any one of claims 1 to 3, characterized in that,

The reference short-time linear prediction efficiency of the current audio frame is estimated as follows: the short-time linear prediction efficiency of the current audio frame is estimated, wherein the short-time linear prediction efficiency of the current audio frame is the reference short-time linear prediction efficiency of the current audio frame;

or,

The reference short-time linear prediction efficiency of the current audio frame is estimated as follows: the short-time linear prediction efficiency of the current audio frame is estimated; the linear prediction efficiency of N5 historical audio frames of the current audio frame is obtained; a fourth statistical value of the linear prediction efficiency of the N5 historical audio frames and the short-time linear prediction efficiency of the current audio frame is calculated, wherein N5 is a positive integer, the fourth statistical value is the reference short-time linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of the N51 historical audio frames is at least one of the following linear prediction efficiencies: long-time linear prediction efficiency, short-time linear prediction efficiency, and comprehensive linear prediction efficiency, and the N51 historical audio frames are a subset of the N5 historical audio frames;

or,

The reference short-time linear prediction efficiency of the current audio frame is estimated as follows: the short-time linear prediction efficiency of the current audio frame is estimated; the reference linear prediction efficiency of N6 historical audio frames of the current audio frame is obtained; a fifth statistical value of the reference linear prediction efficiency of the N6 historical audio frames and the short-time linear prediction efficiency of the current audio frame is calculated, wherein N6 is a positive integer, the fifth statistical value is the reference short-time linear prediction efficiency of the current audio frame, and the reference linear prediction efficiency of each of the N61 historical audio frames is at least one of the following linear prediction efficiencies: reference long-time linear prediction efficiency, reference short-time linear prediction efficiency, and reference comprehensive linear prediction efficiency, wherein the N61 historical audio frames are a subset of the N6 historical audio frames;

or,

The reference short-time linear prediction efficiency of the current audio frame is estimated as follows: the short-time linear prediction efficiency of the current audio frame is estimated; the reference linear prediction efficiency of N8 historical audio frames of the current audio frame is obtained; the linear prediction efficiency of N7 historical audio frames of the current audio frame is obtained; a sixth statistical value of the linear prediction efficiency of the N7 historical audio frames, the reference linear prediction efficiency of the N8 historical audio frames, and the short-time linear prediction efficiency of the current audio frame is calculated, wherein N7 and N8 are positive integers, the sixth statistical value is the reference short-time linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of the N71 historical audio frames is at least one of the following linear prediction efficiencies: long-time linear prediction efficiency, short-time linear prediction efficiency, and comprehensive linear prediction efficiency, the reference linear prediction efficiency of each of the N81 historical audio frames is at least one of the following linear prediction efficiencies: reference long-time linear prediction efficiency, reference short-time linear prediction efficiency, and reference comprehensive linear prediction efficiency, the N71 historical audio frames are a subset of the N7 historical audio frames, and the N81 historical audio frames are a subset of the N8 historical audio frames.
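The alternatives in claim 4 all combine the current frame's short-time linear prediction efficiency with values from historical frames into one statistical value. The claim does not fix which statistic is used, so the sketch below assumes an arithmetic mean, with an optional weighted mean; the function name and the `weights` parameter are hypothetical.

```python
def reference_short_time_efficiency(current, history=(), weights=None):
    """Statistical value over historical efficiencies plus the current one.

    With an empty history this reduces to the first alternative: the current
    short-time efficiency is used directly as the reference value.
    """
    values = list(history) + [current]
    if weights is None:
        return sum(values) / len(values)  # assumed statistic: arithmetic mean
    if len(weights) != len(values):
        raise ValueError("need one weight per value (history first, current last)")
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

Giving the current frame a larger weight than the historical frames makes the reference value track recent signal changes more quickly, at the cost of less smoothing.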

5. The method according to claim 4, wherein estimating the short-time linear prediction efficiency of the current audio frame includes: obtaining the short-time linear prediction efficiency of the current audio frame based on the linear prediction residual of the current audio frame.

6. The method according to claim 5, characterized in that, obtaining the short-time linear prediction efficiency of the current audio frame based on the linear prediction residual of the current audio frame includes:

Calculating the energy change rate of the current audio frame before and after short-time linear prediction, wherein the energy change rate is the short-time linear prediction efficiency of the current audio frame, or the short-time linear prediction efficiency of the current audio frame is obtained by transforming the energy change rate, and wherein the energy of the current audio frame after short-time linear prediction is the energy of the linear prediction residual of the current audio frame.

7. The method according to claim 6, wherein the energy change rate of the current audio frame before and after short-time linear prediction is the ratio of the energy of the current audio frame before short-time linear prediction to the energy of the linear prediction residual of the current audio frame.
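The energy ratio of claims 6 and 7 can be sketched as follows, taking the frame samples and their linear prediction residual as given inputs. The decibel conversion in `to_decibels` is only one assumed example of "transforming the energy change rate"; the claims do not name a specific transform.

```python
import math


def energy(samples):
    """Sum of squared sample values."""
    return sum(x * x for x in samples)


def energy_change_rate(frame, residual):
    """Ratio of frame energy before prediction to residual energy after it."""
    residual_energy = energy(residual)
    if residual_energy == 0:
        return math.inf  # the residual carries no energy
    return energy(frame) / residual_energy


def to_decibels(rate):
    """Assumed transform: express the energy ratio as a gain in dB."""
    return 10.0 * math.log10(rate)
```

A larger ratio means the short-time predictor removed more of the frame's energy, which indicates a higher short-time linear prediction efficiency.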

8. An audio encoder, characterized in that it comprises:

An estimation unit, configured to estimate the reference linear prediction efficiency of the current audio frame when the current audio frame is a non-speech audio frame;

A determining unit, configured to determine an audio coding method that matches the reference linear prediction efficiency of the current audio frame estimated by the estimation unit; and

An encoding unit, configured to encode the current audio frame using the audio coding method, determined by the determining unit, that matches the reference linear prediction efficiency of the current audio frame.

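9. The audio encoder according to claim 8, wherein if the reference linear prediction efficiency of the current audio frame includes the reference short-time linear prediction efficiency of the current audio frame, the determining unit is specifically configured to:

If the reference short-time linear prediction efficiency of the current audio frame is greater than or equal to a fifth threshold, determine that the audio coding method that matches the reference linear prediction efficiency of the current audio frame is an audio coding method based on linear prediction;

And/or,

If the reference short-time linear prediction efficiency of the current audio frame is less than the fifth threshold, determine that the audio coding method that matches the reference linear prediction efficiency of the current audio frame is an audio coding method not based on linear prediction.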

10. The audio encoder according to claim 8, wherein if the reference linear prediction efficiency of the current audio frame includes the reference short-time linear prediction efficiency of the current audio frame, the determining unit is specifically configured to: determine the second linear prediction efficiency interval into which the reference short-time linear prediction efficiency of the current audio frame falls; and determine a second audio coding method that has a mapping relationship with the second linear prediction efficiency interval based on the mapping relationship between the linear prediction efficiency interval and the audio coding method based on linear prediction, wherein the second audio coding method is an audio coding method that matches the reference linear prediction efficiency of the current audio frame, and the second audio coding method is an audio coding method based on linear prediction or an audio coding method not based on linear prediction.

11. The audio encoder according to any one of claims 8 to 10,

In estimating the reference short-time linear prediction efficiency of the current audio frame, the estimation unit is specifically configured to: estimate the short-time linear prediction efficiency of the current audio frame, wherein the short-time linear prediction efficiency of the current audio frame is the reference short-time linear prediction efficiency of the current audio frame;

or,

In estimating the reference short-time linear prediction efficiency of the current audio frame, the estimation unit is specifically configured to: estimate the short-time linear prediction efficiency of the current audio frame; obtain the linear prediction efficiency of N5 historical audio frames of the current audio frame; calculate a fourth statistical value of the linear prediction efficiency of the N5 historical audio frames and the short-time linear prediction efficiency of the current audio frame, wherein N5 is a positive integer, the fourth statistical value is the reference short-time linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of the N51 historical audio frames is at least one of the following linear prediction efficiencies: long-time linear prediction efficiency, short-time linear prediction efficiency, and comprehensive linear prediction efficiency, and the N51 historical audio frames are a subset of the N5 historical audio frames;

or,

In estimating the reference short-time linear prediction efficiency of the current audio frame, the estimation unit is specifically configured to: estimate the short-time linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiency of N6 historical audio frames of the current audio frame; calculate a fifth statistical value of the reference linear prediction efficiency of the N6 historical audio frames and the short-time linear prediction efficiency of the current audio frame, wherein N6 is a positive integer, the fifth statistical value is the reference short-time linear prediction efficiency of the current audio frame, wherein the reference linear prediction efficiency of each of the N61 historical audio frames is at least one of the following linear prediction efficiencies: reference long-time linear prediction efficiency, reference short-time linear prediction efficiency, and reference comprehensive linear prediction efficiency, wherein the N61 historical audio frames are a subset of the N6 historical audio frames;

or,

In estimating the reference short-time linear prediction efficiency of the current audio frame, the estimation unit is specifically configured to: estimate the short-time linear prediction efficiency of the current audio frame; obtain the reference linear prediction efficiency of N8 historical audio frames of the current audio frame; obtain the linear prediction efficiency of N7 historical audio frames of the current audio frame; and calculate a sixth statistical value of the linear prediction efficiency of the N7 historical audio frames, the reference linear prediction efficiency of the N8 historical audio frames, and the short-time linear prediction efficiency of the current audio frame, wherein N7 and N8 are positive integers, the sixth statistical value is the reference short-time linear prediction efficiency of the current audio frame, the linear prediction efficiency of each of the N71 historical audio frames is at least one of the following linear prediction efficiencies: long-time linear prediction efficiency, short-time linear prediction efficiency, and comprehensive linear prediction efficiency, the reference linear prediction efficiency of each of the N81 historical audio frames is at least one of the following linear prediction efficiencies: reference long-time linear prediction efficiency, reference short-time linear prediction efficiency, and reference comprehensive linear prediction efficiency, the N71 historical audio frames are a subset of the N7 historical audio frames, and the N81 historical audio frames are a subset of the N8 historical audio frames.

12. The audio encoder according to claim 11, wherein, in estimating the short-time linear prediction efficiency of the current audio frame, the estimation unit is specifically configured to: obtain the short-time linear prediction efficiency of the current audio frame based on the linear prediction residual of the current audio frame.

13. The audio encoder according to claim 12, wherein, in obtaining the short-time linear prediction efficiency of the current audio frame based on the linear prediction residual of the current audio frame, the estimation unit is specifically configured to: calculate the energy change rate of the current audio frame before and after short-time linear prediction, wherein the energy change rate is the short-time linear prediction efficiency of the current audio frame, or the short-time linear prediction efficiency of the current audio frame is obtained by transforming the energy change rate, and wherein the energy of the current audio frame after short-time linear prediction is the energy of the linear prediction residual of the current audio frame.

14. The audio encoder according to claim 13, wherein the energy change rate of the current audio frame before and after short-time linear prediction is the ratio of the energy of the current audio frame before short-time linear prediction to the energy of the linear prediction residual of the current audio frame.

15. An audio encoder, characterized in that the audio encoder includes a processor and a memory, the processor executing the method according to any one of claims 1 to 7 by calling a program or instruction stored in the memory.

16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program, when executed by a computer device, is capable of implementing the method described in any one of claims 1 to 7.