CN111816197B - Audio encoding method, device, electronic equipment and storage medium - Google Patents
- Publication number
- CN111816197B (application CN202010543384.7A / CN202010543384A)
- Authority
- CN
- China
- Prior art keywords
- audio
- audio signal
- encoded
- audio type
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/002—Dynamic bit allocation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The present disclosure provides an audio encoding method, apparatus, electronic device, and storage medium. The method comprises the following steps: acquiring characteristic information of an audio signal to be encoded; determining audio type information of the audio signal to be encoded according to the characteristic information; determining the coding rate of the audio signal to be encoded according to the audio type information; and encoding the audio signal to be encoded at that coding rate. The coding rate selected in the present disclosure thus matches the audio type information, which solves the problem in conventional techniques that an unreasonable coding rate specified during encoding easily wastes bandwidth resources.
Description
Technical Field
The present invention relates to the field of audio information processing technologies, and in particular, to an audio encoding method, an audio encoding device, an electronic device, and a storage medium.
Background
Audio signals need to be encoded during storage and transmission to save storage and transmission resources. Audio coding techniques fall into two classes. The first is lossless audio coding, in which the decoder can perfectly reconstruct the original digital audio signal; the Free Lossless Audio Codec (FLAC) format is one example. The second is lossy coding, in which audio signals are compressed to different quality levels based on psychoacoustic principles; during transmission, audio quality is measured as a bitrate in kilobits per second (kbps), i.e., how many bits are used to encode a unit length of audio. Lossy encoders include Advanced Audio Coding (AAC), MPEG Audio Layer III (MP3), and the like.
In the prior art, lossy coding usually requires that a single coding rate be specified for the entire audio signal, and the encoder encodes at that rate. The specified rate can be unreasonable: for example, a high rate is often chosen to guarantee audio quality, so some bandwidth resources may be wasted.
Therefore, a new method is urgently needed to solve the above-mentioned problems.
Disclosure of Invention
The disclosure provides an audio coding method, an audio coding device, an electronic device, and a storage medium, to solve the problem in conventional techniques that an unreasonable coding rate specified during encoding easily wastes bandwidth resources.
In a first aspect, the present disclosure provides an audio encoding method, the method comprising:
acquiring characteristic information of an audio signal to be encoded;
determining audio type information of the audio signal to be encoded according to the characteristic information;
determining the coding rate of the audio signal to be coded according to the audio type information;
and encoding the audio signal to be encoded by using the encoding code rate.
In one embodiment, the determining the audio type information of the audio signal to be encoded according to the characteristic information includes:
inputting the characteristic information into a neural network;
processing the characteristic information by utilizing the neural network to obtain probabilities that the audio signals to be encoded respectively belong to various audio types;
and determining the audio type information of the audio signal to be encoded according to the probability that the audio signal to be encoded belongs to each audio type respectively.
In one embodiment, the determining the audio type information of the audio signal to be encoded according to the probabilities that the audio signal to be encoded belongs to each audio type respectively includes:
determining audio type information corresponding to the audio type with the highest probability as the audio type information of the audio signal to be encoded;
the determining the coding rate of the audio signal to be coded according to the audio type information comprises the following steps:
searching the coding rate corresponding to the audio type information, and taking the searched coding rate as the coding rate of the audio signal to be coded.
In one embodiment, the determining the coding rate of the audio signal to be coded according to the audio type information includes:
taking the probability that the audio signals to be encoded belong to various audio types as an adjustment factor;
obtaining coding code rates corresponding to various audio types;
and carrying out weighted summation according to the coding code rate of each audio type and the adjustment factor to obtain the coding code rate of the audio signal to be coded.
In an embodiment, before the obtaining the characteristic information of the audio signal to be encoded, the method further comprises:
training the neural network by:
acquiring an audio type training sample, wherein the audio type training sample comprises characteristic information and marked audio type information of an audio signal of the same audio type;
training the neural network according to the audio type training sample.
In one embodiment, before the training of the neural network according to the audio type training samples, the training method further comprises:
for each audio type training sample, determining the average value of the audio signals according to the total frame number of the audio signals of the same audio type and the audio signals of the same audio type in the audio type training sample.
In one embodiment, determining the audio signal to be encoded comprises:
segmenting an audio signal to be processed;
each obtained audio signal segment is determined as the audio signal to be encoded.
In one embodiment, the obtaining the characteristic information of the audio signal to be encoded includes:
converting the audio signal to be encoded into a frequency domain to obtain a frequency domain signal;
extracting features of the frequency domain signals to obtain preset feature information, wherein the preset feature information comprises a mel cepstrum and/or a mel frequency spectrum;
and determining the frequency domain signal and/or the preset characteristic information as the characteristic information of the audio signal to be encoded.
In a second aspect, the present disclosure provides an audio encoding apparatus, the apparatus comprising:
an acquisition module configured to perform acquisition of characteristic information of an audio signal to be encoded;
an audio type information determining module configured to perform determining audio type information of the audio signal to be encoded according to the characteristic information;
a coding rate determining module configured to determine a coding rate of the audio signal to be coded according to the audio type information;
and the coding module is configured to perform coding of the audio signal to be coded by using the coding rate.
In one embodiment, the audio type information determination module is further configured to perform:
inputting the characteristic information into a neural network;
processing the characteristic information by utilizing the neural network to obtain probabilities that the audio signals to be encoded respectively belong to various audio types;
and determining the audio type information of the audio signal to be encoded according to the probability that the audio signal to be encoded belongs to each audio type respectively.
In one embodiment, the audio type information determination module is further configured to perform:
determining audio type information corresponding to the audio type with the highest probability as the audio type information of the audio signal to be encoded;
the code rate determination module is further configured to perform:
searching the coding rate corresponding to the audio type information, and taking the searched coding rate as the coding rate of the audio signal to be coded.
In one embodiment, the coding rate determination module is further configured to perform:
taking the probability that the audio signals to be encoded belong to various audio types as an adjustment factor;
obtaining coding code rates corresponding to various audio types;
and carrying out weighted summation according to the coding code rate of each audio type and the adjustment factor to obtain the coding code rate of the audio signal to be coded.
In one embodiment, the apparatus further comprises:
a neural network training module configured to train the neural network, before the characteristic information of the audio signal to be encoded is acquired, by the following method:
acquiring an audio type training sample, wherein the audio type training sample comprises characteristic information and marked audio type information of an audio signal of the same audio type;
training the neural network according to the audio type training sample.
In one embodiment, the apparatus further comprises:
an audio signal average value determining module configured to, before the neural network is trained according to the audio type training samples, determine for each audio type training sample the average value of the audio signals according to the total frame number of the audio signals of the same audio type and the audio signals of the same audio type in the sample.
In one embodiment, the apparatus further comprises:
a segmentation module configured to perform segmentation of the audio signal to be processed;
an audio signal to be encoded determination module configured to perform determination of each obtained audio signal segment as the audio signal to be encoded.
In one embodiment, the acquisition module is further configured to perform:
converting the audio signal to be encoded into a frequency domain to obtain a frequency domain signal;
extracting features of the frequency domain signals to obtain preset feature information, wherein the preset feature information comprises a mel cepstrum and/or a mel frequency spectrum;
and determining the frequency domain signal and/or the preset characteristic information as the characteristic information of the audio signal to be encoded.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor; the instructions are executable by the at least one processor to enable the at least one processor to perform the method as described in the first aspect.
According to a fourth aspect provided by embodiments of the present disclosure, there is provided a computer storage medium storing a computer program for performing the method according to the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
the present disclosure provides an audio encoding method, apparatus, device, and storage medium. The method comprises the following steps: acquiring characteristic information of an audio signal to be encoded; determining audio type information of the audio signal to be encoded according to the characteristic information; determining the coding rate of the audio signal to be coded according to the audio type information; and encoding the audio signal to be encoded by using the encoding code rate. The whole process determines the coding rate of the audio signal to be coded according to the audio type information of the audio signal to be coded, and codes the audio signal to be coded by using the coding rate. Therefore, the coding rate selected by the method is a rate matched with the type of the audio signal, and the problem that bandwidth resources are wasted due to unreasonable coding rate appointed in the coding process in the traditional technology is solved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a schematic diagram of a suitable scenario in one embodiment according to the present disclosure;
FIG. 2 is one of the flow diagrams of audio encoding according to one embodiment of the present disclosure;
FIG. 3 is a network model schematic diagram of audio encoding according to one embodiment of the present disclosure;
FIG. 4 is a second flow chart of audio encoding according to one embodiment of the present disclosure;
FIG. 5 is a third flow chart of audio encoding according to one embodiment of the present disclosure;
fig. 6 is an audio encoding apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic structural view of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To further explain the technical solutions provided by the embodiments of the present disclosure, details are given below with reference to the accompanying drawings and the detailed description. Although the embodiments present the method steps shown in the following embodiments or figures, the method may include more or fewer steps based on routine or non-inventive labor. Where steps have no logically necessary causal relationship, their execution order is not limited to the order given in the embodiments; during actual processing, or when executed by a control device, the steps may be performed sequentially or in parallel as shown in the embodiments or drawings.
The term "plurality" in the embodiments of the present disclosure means two or more, and similar terms are to be read accordingly. It should be understood that the preferred embodiments described herein merely illustrate and explain the disclosure and are not intended to limit it, and that the embodiments and the features of the embodiments may be combined with each other when no conflict arises.
The inventors have found that in lossy encoding, a coding rate usually must be specified for the entire audio signal, and the encoder encodes according to the specified rate. The rate may be unreasonable: a high rate is often set to guarantee audio quality, which may waste some bandwidth resources. For example, encoding at 96 kbps guarantees good audio quality, but a speech signal can reach equivalent subjective quality at 64 kbps; encoding such speech at 96 kbps therefore "wastes" 32 kbps of bandwidth.
Thus, the present disclosure proposes an audio encoding method, apparatus, electronic device, and storage medium. The present disclosure is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, an audio encoding application scenario is shown, where the application scenario includes a plurality of terminal devices 110 and a server 130, and in fig. 1, three terminal devices 110 are taken as an example, and the number of terminal devices 110 is not limited in practice. The terminal device 110 has installed therein a client 120 for displaying network resource information (e.g., a client for making and playing audio/video). Communication between the client 120 and the server 130 may be through a communication network. Terminal device 110 is, for example, a cell phone, tablet computer, personal computer, etc. The server 130 may be implemented by a single server or by a plurality of servers. The server 130 may be implemented by a physical server or may be implemented by a virtual server.
In one possible application scenario, user a records a short video 1 using a client 120 in a terminal device 110, and the client 120 sends the short video 1 to a server 130. The server 130 extracts an audio signal to be encoded for the short video 1, acquires characteristic information of the audio signal to be encoded, and determines audio type information of the audio signal to be encoded according to the acquired characteristic information. Then, the server 130 determines the coding rate of the audio signal to be coded according to the audio type information; and encoding the audio signal to be encoded by using the obtained encoding code rate. After encoding the entire audio, the server 130 synthesizes the encoded audio and the image in the short video into a new short video 1 and transmits the new short video 1 to the clients 120 of the plurality of terminal devices 110 (e.g., clients of the user B and the user C in fig. 1).
In the embodiment of the application, the audio type information of the audio signal to be encoded can be determined by a pre-trained neural network; the coding rate is then determined from the audio type information and used for encoding. The training and use of the neural network are described in detail below.
1. Training of neural networks:
FIG. 2 is a neural network training flowchart, described in detail below, that may include the following steps:
Step 201: acquiring an audio type training sample, wherein the audio type training sample comprises characteristic information and marked audio type information of an audio signal of the same audio type;
For example, samples labeled as music carry music type information, samples labeled as background ambient sound carry background ambient sound type information, and so on.
Step 202: training the neural network according to the audio type training sample.
As shown in fig. 3, a schematic structural diagram of the neural network model is shown. The neural network model includes an input layer 301, a hidden layer 302, and an output layer 303. A training sample is input into the model through the input layer 301, feature extraction is performed by the hidden layer 302, and the audio type of the training sample is then output through the output layer 303.
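As a rough sketch of this structure, a forward pass through such a network might look as follows. The layer sizes, the tanh activation, and the softmax output are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Illustrative sizes only: a 13-dim feature vector, one hidden layer of
# 16 units, and 3 output audio types (music, background ambient sound,
# voice). Weights are random stand-ins for trained parameters.
W1, b1 = rng.standard_normal((16, 13)), np.zeros(16)  # input 301 -> hidden 302
W2, b2 = rng.standard_normal((3, 16)), np.zeros(3)    # hidden 302 -> output 303

features = rng.standard_normal(13)        # characteristic information
hidden = np.tanh(W1 @ features + b1)      # feature extraction in hidden layer
probs = softmax(W2 @ hidden + b2)         # per-type probabilities
print(probs.shape, round(float(probs.sum()), 6))  # (3,) 1.0
```

The softmax output layer is what later yields the per-type probabilities used for rate selection.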
Before step 202 is performed, to save computational resources, in one embodiment an average value of the audio signals is determined for each training sample based on the total frame number of the audio signals of the same audio type in the sample and those audio signals themselves. For example, if a training sample comprises 30 frames of audio signals of the same audio type, the average can be computed from those 30 frames and the total frame number 30, yielding an audio signal one frame in length. Here a one-frame-length audio signal is the amount of audio data per sampling interval; the sampling interval is determined by the requirements of the codec and the specific application.
Thus, if a 30-frame training sample is fed directly into the neural network and the resulting training period is denoted A, then feeding in only the one-frame averaged signal takes far less than A. Determining the average value of the audio signals and training on the one-frame-length signal therefore greatly shortens the training period and saves computing resources.
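The frame-averaging step can be sketched as follows. Element-wise averaging over frames is an assumption; the patent only states that the average is computed from the same-type frames and the total frame number.

```python
import numpy as np

def average_frames(frames):
    """Collapse a stack of same-type audio frames into one frame-length
    signal: sum over the frames and divide by the total frame number."""
    total_frames = frames.shape[0]
    return frames.sum(axis=0) / total_frames

# A toy sample: 30 frames of one audio type, each 480 samples long,
# reduced to a single 480-sample frame before training.
frames = np.ones((30, 480)) * 2.0
mean_frame = average_frames(frames)
print(mean_frame.shape)  # (480,)
```

The network then trains on `mean_frame` instead of all 30 frames, which is the source of the shortened training period described above.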
2. Neural network-based audio coding:
step 401: acquiring characteristic information of an audio signal to be encoded;
in one embodiment, the audio signal to be processed is segmented; each obtained audio signal segment is determined as the audio signal to be encoded.
Thus, when the audio signal to be processed contains several kinds of audio type information, segmenting it yields the audio signals to be encoded. Audio signals to be encoded of different audio types can then each be encoded at the corresponding coding rate, further saving bandwidth resources.
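A minimal sketch of the segmentation step follows. Fixed-length segments are an assumption; the patent does not specify the segmentation criterion.

```python
import numpy as np

def segment_audio(signal, segment_len):
    """Split the audio signal to be processed into segments; each
    segment then becomes one audio signal to be encoded."""
    return [signal[i:i + segment_len]
            for i in range(0, len(signal), segment_len)]

audio = np.arange(10)                 # toy 10-sample "signal"
segments = segment_audio(audio, 4)
print([len(s) for s in segments])     # [4, 4, 2]
```

Each returned segment would then go through steps 401 to 404 independently, so a music segment and a speech segment in the same recording can receive different rates.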
Step 402: determining audio type information of the audio signal to be encoded according to the characteristic information;
Step 403: determining the coding rate of the audio signal to be coded according to the audio type information;
step 404: and encoding the audio signal to be encoded by using the encoding code rate.
As described in the background, existing audio coding generally requires that a coding rate be specified for the entire audio signal, and the encoder encodes the signal at that rate. For example, encoding at 96 kbps guarantees good audio quality, but a speech signal can reach equivalent subjective quality at 64 kbps, so encoding speech at 96 kbps "wastes" 32 kbps of bandwidth.
In the scheme of the application, if the neural network determines that the audio type information of the audio signal to be encoded is music type information, the corresponding coding rate is determined to be 96 kbps and the signal is encoded at 96 kbps. If the determined audio type information is voice type information, the corresponding coding rate is determined to be 64 kbps according to the voice type information, and the signal is encoded at 64 kbps.
The method extracts characteristics of the audio signal to be encoded and determines its audio type information according to the characteristic information; the coding rate of the audio signal to be encoded is then determined from the audio type information, and the signal is finally encoded at that rate. The coding rate selected in the present disclosure thus matches the audio type information, which solves the problem in conventional techniques that an unreasonable coding rate specified during encoding easily wastes bandwidth resources.
In one embodiment, step 402 may be implemented as: inputting the characteristic information into a neural network; processing the characteristic information by utilizing the neural network to obtain probabilities that the audio signals to be encoded respectively belong to various audio types; and determining the audio type information of the audio signal to be encoded according to the probability that the audio signal to be encoded belongs to each audio type respectively.
Therefore, the probability that the audio signals to be encoded respectively belong to various audio types can be obtained through the neural network, and the audio type information of the audio signals to be encoded can be determined.
In one embodiment, the audio type information corresponding to the audio type with the highest probability may be determined as the audio type information of the audio signal to be encoded; step 403 may be performed as: searching the coding rate corresponding to the audio type information, and taking the searched coding rate as the coding rate of the audio signal to be coded. The corresponding relation between the audio type information and the coding rate is shown in table 1:
Table 1:

| Audio type information | Music type information | Background ambient sound type information | Voice type information | … |
| Coding rate | 96 kbps | 48 kbps | 64 kbps | … |
Thus, the coding rate of the audio signal to be coded can be determined by the audio type information corresponding to the audio type with the highest probability. The method not only can determine a reasonable coding rate, but also is simple to realize.
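The lookup described above can be sketched as follows, using hypothetical type names and the rates of Table 1:

```python
# Hypothetical mapping following Table 1 (rates in kbps).
RATE_TABLE = {"music": 96, "background": 48, "voice": 64}

def select_rate(type_probs):
    """Pick the audio type with the highest probability and look up the
    coding rate corresponding to that audio type information."""
    best_type = max(type_probs, key=type_probs.get)
    return best_type, RATE_TABLE[best_type]

print(select_rate({"music": 0.85, "voice": 0.15}))  # ('music', 96)
```

A segment classified mostly as voice would instead map to 64 kbps, matching the speech example in the description.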
In one embodiment, the coding rate of the audio signal to be coded may also be determined by: taking the probability that the audio signals to be encoded belong to various audio types as an adjustment factor; obtaining coding code rates corresponding to various audio types; and carrying out weighted summation according to the coding code rate of each audio type and the adjustment factor to obtain the coding code rate of the audio signal to be coded.
For example, suppose the audio types to which the audio signal to be encoded may belong are the music type and the voice type, and it is determined that the signal belongs to the music type with probability 85% and to the voice type with probability 15%. With a music-type coding rate of 96 kbps and a voice-type coding rate of 64 kbps, using each probability as an adjustment factor and computing the weighted sum of the rates gives a coding rate of 0.85 × 96 + 0.15 × 64 = 91.2 kbps for the audio signal to be encoded.
Specifically, the coding rate of the audio signal to be encoded can be determined according to formula (1):

R = ∑_{n=1}^{N} p_n · R_n (1);

where n (1 ≤ n ≤ N) indexes the audio type information, N is the total number of audio types, p_n is the probability corresponding to the n-th audio type, and R_n is the coding rate corresponding to the n-th audio type.
In this way, the corresponding coding rates are adjusted by the probabilities of the audio type information to obtain the final coding rate, so that the coding rate of the audio signal to be encoded can be determined according to different requirements.
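Formula (1) can be sketched numerically as follows (the function name is illustrative). Note that a plain weighted sum of the example values above gives 91.2 kbps, so the patent's 106.45 kbps example may involve an additional adjustment not spelled out here:

```python
# Sketch of formula (1): R = sum over n of p_n * R_n.
def weighted_rate(probs, rates):
    """Weighted sum of per-type coding rates, with the type
    probabilities acting as adjustment factors."""
    return sum(p * r for p, r in zip(probs, rates))

# Music with probability 0.85 at 96 kbps, voice with 0.15 at 64 kbps:
r = weighted_rate([0.85, 0.15], [96, 64])
print(round(r, 2))  # 91.2
```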
In one embodiment, step 401 may be performed as follows: the audio signal to be encoded is converted into the frequency domain to obtain a frequency-domain signal. The conversion may be a short-time Fourier transform (STFT), and the resulting frequency-domain signal can be expressed as:
S(n,k) = STFT(S(t)) = A(n,k)·e^(iθ(n,k)) (2);
where S(t) is the audio signal to be encoded, STFT(S(t)) denotes its short-time Fourier transform, and S(n,k) is the resulting frequency-domain signal. A(n,k) is the signal amplitude, θ(n,k) is the phase, n is the frame index of the audio signal to be encoded, and k is the frequency bin.
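A numeric sketch of equation (2), assuming SciPy is available; the signal and window parameters are illustrative, not from the patent:

```python
# Decompose an STFT into amplitude A(n,k) and phase θ(n,k), as in eq. (2).
import numpy as np
from scipy.signal import stft

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)        # one second of a 440 Hz tone

_, _, S = stft(x, fs=fs, nperseg=512)  # complex spectrum, bins x frames
A = np.abs(S)                          # amplitude A(n, k)
theta = np.angle(S)                    # phase θ(n, k)
# By construction, S = A · e^(iθ):
print(np.allclose(S, A * np.exp(1j * theta)))  # True
```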
Feature extraction is then performed on the frequency-domain signal to obtain preset feature information, where the preset feature information comprises a mel cepstrum and/or a mel spectrum; the frequency-domain signal and/or the preset feature information is determined as the feature information of the audio signal to be encoded. It should be noted that the preset feature information can be chosen according to actual use requirements. The mel spectrum is determined in the following manner:
1. The power spectrogram Spectrogram(n,k) is obtained by squaring the magnitude of the S(n,k) obtained in formula (2):

Spectrogram(n,k) = (abs(S(n,k)))^2 (3);

where abs(S(n,k)) is the magnitude of the frequency-domain signal.

2. The frequency bins of Spectrogram(n,k) in formula (3) are converted by the mel scale to obtain the mapped value mel, and by aggregating the frequencies k in the spectrogram according to mel, the mel spectrum MelSpectrogram can be obtained:

MelSpectrogram(n,m) = ∑_k Spectrogram(n,k)·mel(m,k) (4);

where mel(m,k) is the weight of frequency bin k in the m-th mel band.

3. The mel cepstrum MFCC is obtained by applying a discrete cosine transform (DCT) to the mel spectrum in the dB domain:

MFCC(n,m) = DCT(10·log10(MelSpectrogram(n,m))) (5).
therefore, the audio signal to be encoded is converted into the frequency domain to obtain the frequency domain signal, and the characteristic information of the audio signal to be encoded is obtained by extracting the characteristic of the frequency domain signal.
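The three steps above can be sketched numerically as follows. The mel filterbank here is a random toy stand-in for an actual mel-scale aggregation, and all parameter choices are illustrative:

```python
# Power spectrogram (eq. 3) -> mel aggregation (eq. 4) -> dB + DCT (eq. 5).
import numpy as np
from scipy.fftpack import dct
from scipy.signal import stft

fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(fs)                    # one second of noise

_, _, S = stft(x, fs=fs, nperseg=512)
spectrogram = np.abs(S) ** 2                   # eq. (3): |S(n,k)|^2

n_mels = 40
# Toy stand-in for mel(m,k); a real implementation uses triangular
# filters spaced on the mel scale.
mel_fb = rng.random((n_mels, spectrogram.shape[0]))
mel_spec = mel_fb @ spectrogram                # eq. (4): aggregate over k
mfcc = dct(10 * np.log10(mel_spec + 1e-10), axis=0)  # eq. (5): dB, then DCT
print(mfcc.shape[0])  # 40 mel-cepstral rows
```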
For a further understanding of the technical solution provided in the present disclosure, a detailed description is given below with reference to Fig. 5; the flow may include the following steps:
step 500: for each audio type training sample, determining an average value of the audio signals according to the audio signals of the same audio type in the audio type training sample and the total frame number of the audio signals of the same audio type;
step 501: acquiring an audio type training sample, wherein the audio type training sample comprises characteristic information and marked audio type information of an audio signal of the same audio type;
Step 502: training the neural network according to the audio type training sample;
step 503: segmenting an audio signal to be processed;
step 504: determining each obtained audio signal segment as the audio signal to be encoded;
step 505: acquiring characteristic information of an audio signal to be encoded;
step 506: inputting the characteristic information into a neural network;
step 507: processing the characteristic information by utilizing the neural network to obtain probabilities that the audio signals to be encoded respectively belong to various audio types;
step 508: determining the audio type information of the audio signal to be encoded according to the probability that the audio signal to be encoded belongs to each audio type respectively;
step 509: determining the coding rate of the audio signal to be coded according to the audio type information;
step 510: and encoding the audio signal to be encoded by using the encoding code rate.
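A hypothetical end-to-end sketch of steps 503–510, with a stub classifier standing in for the trained neural network of steps 505–507 (all names and values are illustrative):

```python
def classify(segment):
    # Stub standing in for steps 505-507 (feature extraction + network).
    return {"music": 0.85, "voice": 0.15}

RATES_KBPS = {"music": 96, "voice": 64}

def encode_stream(audio, seg_len):
    encoded = []
    for i in range(0, len(audio), seg_len):          # step 503: segment
        seg = audio[i:i + seg_len]                   # step 504
        probs = classify(seg)                        # steps 505-507
        rate = sum(p * RATES_KBPS[t] for t, p in probs.items())  # 508-509
        encoded.append((seg, rate))                  # step 510 placeholder
    return encoded

out = encode_stream(list(range(10)), seg_len=4)
print([round(r, 1) for _, r in out])  # [91.2, 91.2, 91.2]
```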
The audio encoding method of the present disclosure as described above may also be implemented by an audio encoding apparatus based on the same inventive concept. The effect of the device is similar to that of the previous method, and will not be described again here.
Fig. 6 is a schematic structural view of an audio encoding apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the audio encoding apparatus 600 of the present disclosure may include an acquisition module 610, an audio type information determination module 620, an encoding rate determination module 630, and an encoding module 640.
An acquisition module 610 configured to perform acquisition of characteristic information of an audio signal to be encoded;
an audio type information determining module 620 configured to perform determining audio type information of the audio signal to be encoded according to the characteristic information;
an encoding rate determining module 630 configured to determine an encoding rate of the audio signal to be encoded according to the audio type information;
an encoding module 640 configured to perform encoding of the audio signal to be encoded using the encoding rate.
In one embodiment, the audio type information determination module 620 is further configured to perform:
inputting the characteristic information into a neural network;
processing the characteristic information by utilizing the neural network to obtain probabilities that the audio signals to be encoded respectively belong to various audio types;
and determining the audio type information of the audio signal to be encoded according to the probability that the audio signal to be encoded belongs to each audio type respectively.
In one embodiment, the audio type information determination module 620 is further configured to perform:
determining audio type information corresponding to the audio type with the highest probability as the audio type information of the audio signal to be encoded;
the coding rate determination module 630 is further configured to perform:
searching the coding rate corresponding to the audio type information, and taking the searched coding rate as the coding rate of the audio signal to be coded.
In one embodiment, the coding rate determination module 630 is further configured to perform:
taking the probability that the audio signals to be encoded belong to various audio types as an adjustment factor;
obtaining coding code rates corresponding to various audio types;
and carrying out weighted summation according to the coding code rate of each audio type and the adjustment factor to obtain the coding code rate of the audio signal to be coded.
In one embodiment, the apparatus further comprises:
the neural network training module 650 is configured to train the neural network before performing the acquisition of the characteristic information of the audio signal to be encoded by:
acquiring an audio type training sample, wherein the audio type training sample comprises characteristic information and marked audio type information of an audio signal of the same audio type;
Training the neural network according to the audio type training sample.
In one embodiment, the apparatus further comprises:
an audio signal average value determining module 660 configured to perform, for each audio type training sample, determining an average value of the audio signals according to a total frame number of the audio signals of the same audio type and the audio signals of the same audio type in the audio type training sample before training the neural network according to the audio type training sample.
In one embodiment, the apparatus further comprises:
a segmentation module 670 configured to perform segmentation of the audio signal to be processed;
an audio signal to be encoded determination module 680 configured to perform a determination of each obtained audio signal segment as the audio signal to be encoded.
In one embodiment, the obtaining module 610 is further configured to perform:
converting the audio signal to be encoded into a frequency domain to obtain a frequency domain signal;
extracting features of the frequency domain signals to obtain preset feature information, wherein the preset feature information comprises a mel cepstrum and/or a mel frequency spectrum;
and determining the frequency domain signal and/or the preset characteristic information as the characteristic information of the audio signal to be encoded.
Having described an audio encoding method and apparatus according to an exemplary embodiment of the present application, next, an electronic device according to another exemplary embodiment of the present application is described.
Those skilled in the art will appreciate that aspects of the present application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein generally as a "circuit," "module," or "system."
In some possible implementations, an electronic device according to the present application may include at least one processor, and at least one computer storage medium. Wherein the computer storage medium stores program code which, when executed by a processor, causes the processor to perform the steps in the audio encoding method according to various exemplary embodiments of the present application described above in the present specification. For example, the processor may perform steps 401-404 as shown in FIG. 4.
An electronic device 700 according to this embodiment of the present application is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is merely an example, and should not be construed as limiting the functionality and scope of use of the embodiments herein.
As shown in fig. 7, the electronic device 700 is embodied in the form of a general-purpose electronic device. Components of electronic device 700 may include, but are not limited to: the at least one processor 701, the at least one computer storage medium 702, and a bus 703 that connects the various system components, including the computer storage medium 702 and the processor 701.
Bus 703 represents one or more of several types of bus structures, including a computer storage media bus or computer storage media controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The computer storage media 702 may include readable media in the form of volatile computer storage media, such as random access computer storage media (RAM) 721 and/or cache storage media 722, and may further include read only computer storage media (ROM) 723.
The computer storage media 702 may also include a program/utility 725 having a set (at least one) of program modules 724, such program modules 724 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The electronic device 700 may also communicate with one or more external devices 704 (e.g., keyboard, pointing device, etc.), one or more devices that enable a user to interact with the electronic device 700, and/or any device (e.g., router, modem, etc.) that enables the electronic device 700 to communicate with one or more other electronic devices. Such communication may occur through an input/output (I/O) interface 705. Also, the electronic device 700 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through the network adapter 706. As shown, the network adapter 706 communicates with other modules for the electronic device 700 over the bus 703. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 700, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In some possible embodiments, aspects of an audio encoding method provided herein may also be implemented in the form of a program product comprising program code for causing a computer device to carry out the steps of an audio encoding method according to the various exemplary embodiments of the present application as described herein above, when the program product is run on a computer device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a random access computer storage medium (RAM), a read-only computer storage medium (ROM), an erasable programmable read-only computer storage medium (EPROM or flash memory), an optical fiber, a portable compact disc read-only computer storage medium (CD-ROM), an optical computer storage medium, a magnetic computer storage medium, or any suitable combination of the foregoing.
The program product for audio encoding of embodiments of the present application may employ a portable compact disc read-only computer storage medium (CD-ROM) and include program code and may run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the consumer electronic device, partly on the consumer electronic device, as a stand-alone software package, partly on the consumer electronic device, partly on the remote electronic device, or entirely on the remote electronic device or server. In the case of remote electronic devices, the remote electronic device may be connected to the consumer electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external electronic device (e.g., connected through the internet using an internet service provider).
It should be noted that although several modules of the apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. Indeed, the features and functions of two or more modules described above may be embodied in one module in accordance with embodiments of the present application. Conversely, the features and functions of one module described above may be further divided into a plurality of modules to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk computer storage media, CD-ROM, optical computer storage media, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable computer storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable computer storage medium produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.
Claims (14)
1. A method of audio encoding, the method comprising:
acquiring characteristic information of an audio signal to be encoded;
the audio type information of the audio signal to be encoded is determined according to the characteristic information, and the method specifically comprises the following steps:
inputting the characteristic information into a neural network; processing the characteristic information by utilizing the neural network to obtain probabilities that the audio signals to be encoded respectively belong to various audio types; determining the audio type information of the audio signal to be encoded according to the probability that the audio signal to be encoded belongs to each audio type respectively;
According to the audio type information, determining the coding rate of the audio signal to be coded specifically comprises the following steps:
taking the probability that the audio signals to be encoded belong to various audio types as an adjustment factor; obtaining coding code rates corresponding to various audio types; according to the coding rate of each audio type and the adjustment factor, carrying out weighted summation to obtain the coding rate of the audio signal to be coded;
and encoding the audio signal to be encoded by using the encoding code rate.
2. The method according to claim 1, wherein the determining the audio type information of the audio signal to be encoded according to the probabilities that the audio signal to be encoded belongs to the audio types, respectively, comprises:
determining audio type information corresponding to the audio type with the highest probability as the audio type information of the audio signal to be encoded;
the determining the coding rate of the audio signal to be coded according to the audio type information comprises the following steps:
searching the coding rate corresponding to the audio type information, and taking the searched coding rate as the coding rate of the audio signal to be coded.
3. The method according to claim 1, wherein prior to the obtaining of the characteristic information of the audio signal to be encoded, the method further comprises:
Training the neural network by:
acquiring an audio type training sample, wherein the audio type training sample comprises characteristic information and marked audio type information of an audio signal of the same audio type;
training the neural network according to the audio type training sample.
4. The method of claim 3, wherein, before the training of the neural network according to the audio type training sample, the method further comprises:
for each audio type training sample, determining the average value of the audio signals according to the total frame number of the audio signals of the same audio type and the audio signals of the same audio type in the audio type training sample.
5. The method of claim 1, wherein determining the audio signal to be encoded comprises:
segmenting an audio signal to be processed;
each obtained audio signal segment is determined as the audio signal to be encoded.
6. The method according to claim 1, wherein the obtaining characteristic information of the audio signal to be encoded comprises:
converting the audio signal to be encoded into a frequency domain to obtain a frequency domain signal;
extracting features of the frequency domain signals to obtain preset feature information, wherein the preset feature information comprises a mel cepstrum and/or a mel frequency spectrum;
And determining the frequency domain signal and/or the preset characteristic information as the characteristic information of the audio signal to be encoded.
7. An audio encoding apparatus, the apparatus comprising:
an acquisition module configured to perform acquisition of characteristic information of an audio signal to be encoded;
an audio type information determining module configured to perform determining audio type information of the audio signal to be encoded according to the characteristic information;
the audio type information determination module is further configured to perform inputting the characteristic information into a neural network; processing the characteristic information by utilizing the neural network to obtain probabilities that the audio signals to be encoded respectively belong to various audio types; determining the audio type information of the audio signal to be encoded according to the probability that the audio signal to be encoded belongs to each audio type respectively;
a coding rate determining module configured to determine a coding rate of the audio signal to be coded according to the audio type information;
the coding rate determining module is further configured to perform the probability that the audio signals to be coded respectively belong to various audio types as an adjustment factor; obtaining coding code rates corresponding to various audio types; according to the coding rate of each audio type and the adjustment factor, carrying out weighted summation to obtain the coding rate of the audio signal to be coded;
And the coding module is configured to perform coding of the audio signal to be coded by using the coding rate.
8. The apparatus of claim 7, wherein the audio type information determination module is further configured to perform:
determining audio type information corresponding to the audio type with the highest probability as the audio type information of the audio signal to be encoded;
the code rate determination module is further configured to perform:
searching the coding rate corresponding to the audio type information, and taking the searched coding rate as the coding rate of the audio signal to be coded.
9. The apparatus of claim 7, wherein the apparatus further comprises:
the neural network training module is configured to train the neural network before acquiring the characteristic information of the audio signal to be encoded by the following method:
acquiring an audio type training sample, wherein the audio type training sample comprises characteristic information and marked audio type information of an audio signal of the same audio type;
training the neural network according to the audio type training sample.
10. The apparatus of claim 9, wherein the apparatus further comprises:
And an audio signal average value determining module configured to perform, for each audio type training sample, before training the neural network according to the audio type training sample, determining the average value of the audio signals according to the total frame number of the audio signals of the same audio type and the audio signals of the same audio type in the audio type training sample.
11. The apparatus of claim 7, wherein the apparatus further comprises:
a segmentation module configured to perform segmentation of the audio signal to be processed;
an audio signal to be encoded determination module configured to perform determination of each obtained audio signal segment as the audio signal to be encoded.
12. The apparatus of claim 7, wherein the acquisition module is further configured to perform:
converting the audio signal to be encoded into a frequency domain to obtain a frequency domain signal;
extracting features of the frequency domain signals to obtain preset feature information, wherein the preset feature information comprises a mel cepstrum and/or a mel frequency spectrum;
and determining the frequency domain signal and/or the preset characteristic information as the characteristic information of the audio signal to be encoded.
13. An electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor; the instructions being executable by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-6.
14. A computer storage medium, characterized in that it stores a computer program for executing the method according to any one of claims 1-6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010543384.7A CN111816197B (en) | 2020-06-15 | 2020-06-15 | Audio encoding method, device, electronic equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111816197A CN111816197A (en) | 2020-10-23 |
| CN111816197B true CN111816197B (en) | 2024-02-23 |
Family
ID=72845128
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010543384.7A Active CN111816197B (en) | 2020-06-15 | 2020-06-15 | Audio encoding method, device, electronic equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111816197B (en) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114495951A (en) * | 2020-11-11 | 2022-05-13 | 华为技术有限公司 | Audio coding and decoding method and device |
| CN113948099A (en) * | 2021-10-18 | 2022-01-18 | 北京金山云网络技术有限公司 | Audio encoding method, audio decoding method, audio encoding device, audio decoding device and electronic equipment |
| CN115334349B (en) * | 2022-07-15 | 2024-01-02 | 北京达佳互联信息技术有限公司 | Audio processing method, device, electronic equipment and storage medium |
| CN115410586B (en) * | 2022-07-26 | 2025-02-25 | 北京达佳互联信息技术有限公司 | Audio processing method, device, electronic device and storage medium |
| CN115550729B (en) * | 2022-09-29 | 2025-10-28 | 北京达佳互联信息技术有限公司 | Audio coding method, device, equipment and medium |
| CN115662453A (en) * | 2022-11-02 | 2023-01-31 | 北京百瑞互联技术股份有限公司 | A speech coding method, system, medium and device based on deep learning |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR101755238B1 (en) * | 2016-11-29 | 2017-07-10 | 대한민국 | Apparatus for restoring speech of damaged multimedia file and method thereof |
| CN107483059A (en) * | 2017-07-31 | 2017-12-15 | 广东工业大学 | A method and device for encoding and decoding multi-channel data based on dynamic Huffman tree |
| CN109003618A (en) * | 2018-08-14 | 2018-12-14 | Oppo广东移动通信有限公司 | Encoding control method, encoding control device, electronic device, and storage medium |
| CN109273017A (en) * | 2018-08-14 | 2019-01-25 | Oppo广东移动通信有限公司 | Coding control method, device and electronic device |
| CN110249320A (en) * | 2017-04-28 | 2019-09-17 | 惠普发展公司有限责任合伙企业 | Utilize the audio classification for using the machine learning model of audio duration to carry out |
| CN110301143A (en) * | 2016-12-30 | 2019-10-01 | 英特尔公司 | Method and apparatus for radio communication |
| CN110602428A (en) * | 2018-06-12 | 2019-12-20 | 视联动力信息技术股份有限公司 | Audio data processing method and device |
| CN110992963A (en) * | 2019-12-10 | 2020-04-10 | 腾讯科技(深圳)有限公司 | Network communication method, device, computer equipment and storage medium |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9646634B2 (en) * | 2014-09-30 | 2017-05-09 | Google Inc. | Low-rank hidden input layer for speech recognition neural network |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111816197A (en) | 2020-10-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111816197B (en) | Audio encoding method, device, electronic equipment and storage medium | |
| CN112767954B (en) | Audio encoding and decoding method, device, medium and electronic equipment | |
| CN110223705B (en) | Voice conversion method, device, equipment and readable storage medium | |
| US20240249737A1 (en) | Audio encoding and decoding method and related product | |
| CN101950561B (en) | Methods and apparatus for embedding watermarks | |
| CN101055720B (en) | Method and device for encoding and decoding audio signals | |
| CN113903345A (en) | Audio processing method, apparatus, and electronic device | |
| CN117351943A (en) | Audio processing methods, devices, equipment and storage media | |
| CN113314132A (en) | Audio object encoding and decoding method and device for interactive audio systems | |
| US9886962B2 (en) | Extracting audio fingerprints in the compressed domain | |
| WO2024227155A1 (en) | Generating audio using non-autoregressive decoding | |
| CN112767955B (en) | Audio encoding method and device, storage medium and electronic equipment | |
| CN113112993B (en) | Audio information processing method, device, electronic equipment and storage medium | |
| CN114333892A (en) | Voice processing method and device, electronic equipment and readable medium | |
| CN114333891A (en) | Voice processing method and device, electronic equipment and readable medium | |
| JP2022505888A (en) | Methods and equipment for rate quality scalable coding using generative models | |
| US20250124934A1 (en) | Multi-lag format for audio coding | |
| CN114333893A (en) | Voice processing method and device, electronic equipment and readable medium | |
| CN117649846B (en) | Speech recognition model generation method, speech recognition method, device and medium | |
| CN115641857A (en) | Audio processing method, device, electronic equipment, storage medium and program product | |
| Bouzid et al. | Multi-coder vector quantizer for transparent coding of wideband speech ISF parameters | |
| CN118609581B (en) | Audio encoding and decoding methods, apparatuses, devices, storage medium, and products | |
| CN120299465B (en) | Audio data processing method, device, equipment, storage medium and program product | |
| CN117292694B (en) | Token-less neural speech coding and decoding method and system based on time-invariant coding | |
| US20230075562A1 (en) | Audio Transcoding Method and Apparatus, Audio Transcoder, Device, and Storage Medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |
| GR01 | Patent grant | | |