CN108632852B

CN108632852B - Method and device for determining voice quality

Info

Publication number: CN108632852B
Application number: CN201710179111.7A
Authority: CN
Inventors: 蔡凤云; 魏建强
Original assignee: Shanghai Datang Mobile Communications Equipment Co ltd
Current assignee: Shanghai Datang Mobile Communications Equipment Co ltd
Priority date: 2017-03-23
Filing date: 2017-03-23
Publication date: 2021-08-20
Anticipated expiration: 2037-03-23
Also published as: CN108632852A

Abstract

The embodiment of the invention provides a method and a device for determining voice quality, wherein the method comprises the following steps: determining a target sampling rate according to the network type of the calling terminal and the network type of the called terminal; extracting target audio from pre-stored audio by adopting the target sampling rate; playing the target audio at the calling terminal, and recording the audio at the called terminal by adopting the target sampling rate; and/or, playing the target audio at the called terminal, and recording the audio at the calling terminal by adopting the target sampling rate; after recording and playing are finished, determining the voice quality according to the analysis of the recorded audio and the target audio; therefore, the accuracy of the determined voice quality when the voice service of the mobile terminal is switched to the corresponding network is improved.

Description

Method and device for determining voice quality

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for determining voice quality.

Background

With the continuous development of communication technology, mobile communication technology also continues to evolve. Besides using the latest mobile communication network to obtain mobile data, the mobile terminal also has its voice service moved onto the latest network; for example, a 4G user uses the 4G network not only to obtain mobile data but also to carry voice services.

Generally, a voice quality assessment method based on the Mean Opinion Score (MOS) is used to assess voice quality. Specifically, during a MOS test, playing and recording are performed at a fixed sampling rate, and the played and recorded audio files are then compared to determine the score of the MOS test, from which the voice quality of the current voice call is determined. Because the network load conditions of different network types differ, the sampling rates corresponding to the MOS test also differ; for example, a 4G network corresponds to a 16K sampling rate, and a 2G network corresponds to an 8K sampling rate. While the latest mobile communication network is still being built out, its coverage is incomplete, so the continuity of the voice service on that network cannot be guaranteed, and the voice service of the mobile terminal is often switched from a high-order network to a low-order network. Therefore, if the network carrying the voice service of the mobile terminal is switched during a MOS test, the accuracy of the MOS test result is reduced.

Disclosure of Invention

The technical problem to be solved by the embodiments of the present invention is to provide a method for determining voice quality, so as to solve the problem of low accuracy of determined voice quality in the prior art.

Correspondingly, the embodiment of the invention also provides a device for determining the voice quality, which is used for ensuring the realization and the application of the method.

In order to solve the above problem, the present invention discloses a method for determining voice quality, which specifically comprises: determining a target sampling rate according to the network type of the calling terminal and the network type of the called terminal; extracting target audio from pre-stored audio by adopting the target sampling rate; playing the target audio at the calling terminal, and recording the audio at the called terminal by adopting the target sampling rate; and/or, playing the target audio at the called terminal, and recording the audio at the calling terminal by adopting the target sampling rate; after the recording and playing is finished, the voice quality is determined according to the analysis of the recorded audio and the target audio.

Optionally, the step of determining the target sampling rate according to the network type of the calling terminal and the network type of the called terminal includes: determining a first sampling rate according to the network type of the calling terminal and determining a second sampling rate according to the network type of the called terminal; and comparing the first sampling rate with the second sampling rate, and determining the minimum sampling rate as the target sampling rate.

Optionally, the step of determining the first sampling rate according to the network type of the calling terminal includes: judging whether the network type of the calling terminal is a low-order mobile network type; when the network type of the calling terminal is determined to be a low-order mobile network type, determining a preset sampling rate as a first sampling rate; and when the network type of the calling terminal is determined to be a high-order mobile network type, obtaining a calling signaling from the calling terminal, and determining a first sampling rate according to the voice parameter of the calling signaling.

Optionally, the step of determining the second sampling rate according to the network type of the called terminal includes: judging whether the network type of the called terminal is a low-order mobile network type; when the network type of the called terminal is determined to be a low-order mobile network type, determining a preset sampling rate as a second sampling rate; and when the network type of the called terminal is determined to be a high-order mobile network type, acquiring a called signaling from the called terminal, and determining a second sampling rate according to the voice parameter of the called signaling.

Optionally, the low-order mobile network system includes a second-generation mobile network system and a third-generation mobile network system, and the high-order mobile network system includes a fourth-generation mobile network system and a mobile network system with an order of more than fourth generation.

Optionally, the step of determining speech quality from the analysis of the recorded audio and the target audio comprises: calculating the recorded audio and the target audio by adopting a voice evaluation MOS scoring algorithm; and determining the voice quality according to the calculated result.

The invention also discloses a device for determining voice quality, which specifically comprises: a determining module, used for determining a target sampling rate according to the network type of the calling terminal and the network type of the called terminal; an extraction module, used for extracting target audio from pre-stored audio by adopting the target sampling rate; a playing and recording module, used for playing the target audio at the calling terminal and recording the audio at the called terminal by adopting the target sampling rate, and/or playing the target audio at the called terminal and recording the audio at the calling terminal by adopting the target sampling rate; and an analysis module, used for determining the voice quality according to the analysis of the recorded audio and the target audio after the recording and the playing are finished.

Optionally, the determining module includes: the rate determining submodule is used for determining a first sampling rate according to the network type of the calling terminal and determining a second sampling rate according to the network type of the called terminal; and the comparison sub-module is used for comparing the first sampling rate with the second sampling rate and determining the minimum sampling rate as the target sampling rate.

Optionally, the rate determining submodule is specifically configured to determine whether a network type of the calling terminal is a low-order mobile network type; when the network type of the calling terminal is determined to be a low-order mobile network type, determining a preset sampling rate as a first sampling rate; and when the network type of the calling terminal is determined to be a high-order mobile network type, obtaining a calling signaling from the calling terminal, and determining a first sampling rate according to the voice parameter of the calling signaling.

Optionally, the rate determining submodule is specifically configured to determine whether a network type of the called terminal is a low-order mobile network type; when the network type of the called terminal is determined to be a low-order mobile network type, determining a preset sampling rate as a second sampling rate; and when the network type of the called terminal is determined to be a high-order mobile network type, acquiring a called signaling from the called terminal, and determining a second sampling rate according to the voice parameter of the called signaling.

Optionally, the analysis module includes: a calculation submodule, used for calculating the recorded audio and the target audio by adopting a voice evaluation MOS scoring algorithm; and a quality determination submodule, used for determining the voice quality according to the calculation result.

Compared with the prior art, the embodiment of the invention has the following advantages:

when the voice quality is determined, the target sampling rate is determined according to the network type of the calling terminal and the network type of the called terminal; extracting target audio from pre-stored audio by adopting the target sampling rate; then, the target sampling rate is adopted to carry out recording and playing operation on the calling terminal and the called terminal; then, after recording and playing are finished, determining the voice quality according to the analysis of the recorded audio and the target audio; therefore, the target sampling rate can be determined in real time according to the network systems corresponding to the calling terminal and the called terminal, so that the target sampling rate is matched with the actual network load condition of the calling terminal and/or the called terminal when the voice quality is determined; therefore, the accuracy of the determined voice quality when the voice service of the mobile terminal is switched to the corresponding network is improved.

Drawings

FIG. 1 is a flow chart of the steps of one embodiment of a method for determining speech quality of the present invention;

FIG. 2 is a flow chart of steps in another embodiment of a method for determining speech quality of the present invention;

FIG. 3 is a block diagram of an embodiment of a speech quality determining apparatus according to the present invention;

FIG. 4 is a block diagram of another embodiment of the speech quality determining apparatus according to the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

One of the ideas of the method for determining the voice quality provided by the embodiment of the invention is that when the voice quality is determined, a target audio is played at any one of a calling terminal or a called terminal, the audio is recorded at the other terminal, and after recording and playing are finished, the voice quality is determined by comparing the target audio with the recorded audio; the sampling rate of the target audio and the recorded audio is determined according to the network type of the calling terminal and the network type of the called terminal. Therefore, when the voice quality is determined, the corresponding target sampling rate is matched with the actual network load condition of the calling terminal and/or the called terminal, and the accuracy of the determined voice quality is improved.

Example one

Referring to fig. 1, a flow chart of steps of an embodiment of a method for determining speech quality of the present invention is shown; the method specifically comprises the following steps:

step 101, determining a target sampling rate according to a network type of a calling terminal and a network type of a called terminal.

The voice quality determined by the embodiment of the invention is the voice quality during voice communication between the calling terminal and the called terminal; the mobile terminal that dials the call is the calling terminal, and the mobile terminal that answers the call is the called terminal. While the voice quality is being determined, the calling terminal and the called terminal remain in a voice call throughout, and the number of the calling terminal and the number of the called terminal may belong to the same operator or to different operators. For example, the calling terminal belongs to China Telecom and the called terminal belongs to China Unicom; or the calling terminal and the called terminal both belong to China Mobile. In the embodiment of the invention, to ensure that the target sampling rate matches the actual network load condition of the calling terminal and/or the called terminal while the voice quality is being determined, a server or PC terminal can acquire data from the calling terminal and from the called terminal, and then determine the network type of each terminal from the acquired data. The target sampling rate used to generate the target audio and the recorded audio is then determined according to the determined network types. For example, if the network types of the calling terminal and the called terminal are both 2G, the target sampling rate may be determined to be 8K; if both are 4G, the target sampling rate may be determined to be 16K. Here, the target audio is the audio played at the calling terminal and/or the called terminal.

Step 102, extracting the target audio from the pre-stored audio by adopting the target sampling rate.

After the target sampling rate is determined, it can be used to extract the target audio from the pre-stored audio. The pre-stored audio is stored in advance and includes at least one voice sample. When the pre-stored audio includes a plurality of voice samples, the samples can be joined together, either without interruption or separated by certain time intervals; the manner of joining is not limited herein. In addition, the embodiment of the present invention does not limit the duration of each voice sample.
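The joining of voice samples described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the voice samples are already available as lists of PCM values at one common sampling rate, and the function name `join_voice_samples` and the `gap_seconds` parameter are hypothetical.

```python
from typing import List, Sequence


def join_voice_samples(
    samples: Sequence[Sequence[int]],
    sample_rate_hz: int,
    gap_seconds: float = 0.0,
) -> List[int]:
    """Concatenate PCM voice samples into one pre-stored audio sequence.

    gap_seconds == 0 joins the samples without interruption; a positive
    value inserts that much silence (zero-valued PCM) between neighbours.
    """
    gap = [0] * int(round(gap_seconds * sample_rate_hz))
    joined: List[int] = []
    for index, sample in enumerate(samples):
        if index > 0:
            joined.extend(gap)  # silence only between samples, not at the ends
        joined.extend(sample)
    return joined
```

Because the text leaves both the joining manner and the sample durations open, the gap length is kept as a free parameter here.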

Step 103, playing the target audio at the calling terminal, and recording the audio at the called terminal by adopting the target sampling rate; and/or playing the target audio at the called terminal, and recording the audio at the calling terminal by adopting the target sampling rate.

The target audio can be played while it is being extracted. Specifically, the target audio can be played at the calling terminal, at the called terminal, or alternately at both. When the target audio is played at the calling terminal, the calling terminal transmits it to the called terminal over the wireless network, the called terminal returns the received audio to the server or the PC terminal, and the server or the PC terminal records the audio returned by the called terminal at the target sampling rate. Similarly, when the target audio is played at the called terminal, the audio is recorded at the calling terminal. Preferably, in order to improve the accuracy of the determined voice quality, the audio may be played and recorded alternately at the calling terminal and the called terminal; the duration for which the target audio is played at each terminal is not limited.

Step 104, after recording and playing are finished, determining the voice quality according to the analysis of the recorded audio and the target audio.

Generally, the sound transmitted through the wireless network is lost, so that the sound transmitted through the wireless network is different from the original sound; therefore, in the embodiment of the invention, after recording and playing are finished, the recorded audio and the target audio can be compared, the difference between the recorded audio and the target audio is determined, and then the difference is analyzed, so that the voice quality of the calling terminal and the called terminal can be determined according to the analysis result.

The embodiment of the invention can be applied to various scenarios, such as voice testing. Specifically, voice quality test software can be installed on the PC terminal, and two mobile terminals are each connected to the PC terminal through a USB data cable. Either mobile terminal can be used as the calling terminal, and the other as the called terminal; the calling terminal and the called terminal are connected to a voice box through voice lines. During the test, the voice test software acquires data from the calling terminal and the called terminal through the USB cables, and then determines the network type of the calling terminal and the network type of the called terminal from the acquired data; the target sampling rate at which the voice box generates the target audio and the recorded audio is determined according to the determined network types. While the calling terminal and the called terminal are in a call, the voice box extracts the target audio from the pre-stored audio at the target sampling rate and plays it at the calling terminal; meanwhile, it records audio at the called terminal through the voice line. After the recording and playing are finished, the voice quality test software compares the target audio with the recorded audio to determine the voice quality.

Example two

In the embodiment of the invention, the network type of the calling terminal and the network type of the called terminal may be different, and the sampling rates matched with the corresponding networks may also be different. In addition, the manner of determining the sampling rate according to the network system also includes various manners. A method of determining a target sampling rate, and a method of determining speech quality are described below.

Referring to fig. 2, a flowchart illustrating steps of another embodiment of the method for determining speech quality of the present invention is shown, which specifically includes the following steps:

step 201, determining a first sampling rate according to the network type of the calling terminal, and determining a second sampling rate according to the network type of the called terminal.

In the embodiment of the invention, when the calling terminal and the called terminal are in a voice call, the network types carrying their voice services may be the same or different. For example, in one case the network types corresponding to the voice services of the calling terminal and the called terminal are both 3G; in another case the network type corresponding to the voice service of the calling terminal is 4G, and that of the called terminal is 3G. During the call, the network carrying the voice service of the calling terminal and/or the called terminal may be switched from a high-order network to a low-order network; that is, while the voice quality is being determined, the network type of the same terminal may also change. The high-order network may be a network corresponding to the latest mobile communication technology in the communication field, such as a 4G network, and a network with an order higher than 4G, such as a 5G network or a 6G network; the low-order network may be a network with an order lower than that of the high-order network, such as a 2G network or a 3G network. Here, the order is determined according to the development stage of the mobile communication technology: the second-generation mobile communication technology corresponds to order 2, the third-generation to order 3, the fourth-generation to order 4, the fifth-generation to order 5, and so on. If the latest network technology at the present stage is the fifth-generation mobile communication technology, 4G and 5G can be determined as high-order networks, and 2G and 3G as low-order networks; if network technology develops to the sixth-generation mobile communication technology, 6G and 5G may be determined as high-order networks, and 2G, 3G, and 4G as low-order networks. Therefore, after the data of the calling terminal and the called terminal are respectively obtained, the sampling rate corresponding to the network type of the calling terminal can be determined as the first sampling rate, and the sampling rate corresponding to the network type of the called terminal can be determined as the second sampling rate.

In the embodiment of the invention, the sampling rates corresponding to different network types differ; for example, low-order mobile network types such as 2G and 3G correspond to a sampling rate of 8K, while a high-order mobile network type such as 4G can correspond to sampling rates of 8K, 16K and 48K. The way the sampling rate is determined also differs between network types, so the network type of the calling terminal and the network type of the called terminal can be judged separately and the corresponding sampling rate then determined, specifically as follows:

(1) Determining a first sample rate of the calling terminal

After the network type of the calling terminal is obtained, it is judged whether that network type is a high-order mobile network type or a low-order mobile network type. When the network type of the calling terminal is determined to be a low-order mobile network type, the preset sampling rate can be determined as the first sampling rate; the preset sampling rate is a sampling rate set in advance for low-order mobile network types, such as 8K. When the network type of the calling terminal is determined to be a high-order mobile network type, several sampling rates may correspond to it, so the calling signaling can be obtained from the calling terminal and the first sampling rate determined according to the voice parameters of the calling signaling. For example, the calling signaling carries the parameters: m=audio 13918 RTP/AVP 104 105; 104 AMR-WB/16000/1; 105 telephone-event/16000. The voice coding mode is thus determined to be AMR-WB, and the corresponding sampling rate is 16K.
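As an illustration of reading the sampling rate from the voice parameters of the signaling, the sketch below parses an SDP body. It assumes the payload mappings appear as standard `a=rtpmap:` attribute lines, which is the usual SDP form but is not spelled out in the text above; the function name `sampling_rate_from_sdp` is hypothetical. It also treats the RTP clock rate as the sampling rate, which holds for AMR-WB but not for every codec.

```python
import re
from typing import Dict, List, Optional

# a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
RTPMAP = re.compile(r"^a=rtpmap:(\d+)\s+([^/\s]+)/(\d+)")


def sampling_rate_from_sdp(sdp: str) -> Optional[int]:
    """Return the clock rate of the first offered voice codec, or None."""
    payload_order: List[str] = []
    clock_rates: Dict[str, int] = {}
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m=audio"):
            # m=audio <port> <proto> <fmt> ... ; formats are in preference order
            payload_order = line.split()[3:]
            continue
        match = RTPMAP.match(line)
        if match:
            payload, codec, rate = match.groups()
            if codec.lower() != "telephone-event":  # DTMF events are not voice
                clock_rates[payload] = int(rate)
    for payload in payload_order:
        if payload in clock_rates:
            return clock_rates[payload]
    return None
```

With the parameters quoted above written as `a=rtpmap:104 AMR-WB/16000/1` and `a=rtpmap:105 telephone-event/16000`, the function skips the telephone-event payload and returns 16000 for payload 104.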

(2) Determining a second sample rate of the called terminal

Determining the second sample rate of the called terminal is similar to the method of determining the first sample rate of the calling terminal described above; namely judging whether the network type of the called terminal is a low-order mobile network type; when the network type of the called terminal is determined to be a low-order mobile network type, determining a preset sampling rate as a second sampling rate; when the network type of the called terminal is determined to be a high-order mobile network type, acquiring a called signaling from the called terminal, and determining a second sampling rate according to a voice parameter of the called signaling; the detailed steps are not described herein.
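The per-terminal decision, which is the same for the calling and the called terminal, can be sketched as below. The names `terminal_sampling_rate`, `PRESET_LOW_ORDER_RATE_HZ` and `high_order_from` are hypothetical; the 8000 Hz preset and the default boundary at the fourth generation follow the examples in the text, and the rate read from signaling is assumed to be supplied by the caller.

```python
from typing import Optional

# Preset rate for low-order (2G/3G) networks; 8K in the text's example.
PRESET_LOW_ORDER_RATE_HZ = 8000


def terminal_sampling_rate(
    generation: int,
    signalled_rate_hz: Optional[int] = None,
    high_order_from: int = 4,
) -> int:
    """Pick the sampling rate for one terminal from its network generation.

    Low-order networks use the preset rate; high-order networks can support
    several rates, so the rate must come from the terminal's call signaling.
    """
    if generation < high_order_from:
        return PRESET_LOW_ORDER_RATE_HZ
    if signalled_rate_hz is None:
        raise ValueError("high-order network: a rate from call signaling is required")
    return signalled_rate_hz
```

Keeping the boundary as a parameter reflects the text's remark that which generations count as high-order can shift as newer generations are deployed.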

Step 202, comparing the first sampling rate with the second sampling rate, and determining the minimum sampling rate as the target sampling rate.

The embodiment of the invention keeps the sampling rate of the extracted target audio consistent with the sampling rate of the recorded audio, to facilitate the subsequent analysis of the target audio and the recorded audio. Because the actual network load capacity of a low-order mobile network is lower than that of a high-order mobile network, the target sampling rate can be determined according to the sampling rate corresponding to the low-order network. Specifically, the first sampling rate is compared with the second sampling rate, and the smaller of the two is determined as the target sampling rate: when the first sampling rate is greater than the second sampling rate, the second sampling rate is determined as the target sampling rate; when the first sampling rate is less than the second sampling rate, the first sampling rate is determined as the target sampling rate; when the two are equal, either one is determined as the target sampling rate. For example, if the first sampling rate is 8K and the second sampling rate is 16K, the target sampling rate is determined to be 8K. While the voice quality is being determined, the target sampling rate can be adjusted continuously in real time according to the network type of the calling terminal and the network type of the called terminal, so as to ensure the accuracy of the determined voice quality.
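A minimal sketch of this comparison, including its re-evaluation as the network types change during a call, might look as follows. The function names and the idea of feeding it a sequence of rate snapshots are illustrative assumptions, not part of the described method.

```python
from typing import Iterable, List, Tuple


def target_sampling_rate(first_rate_hz: int, second_rate_hz: int) -> int:
    """The smaller of the two per-terminal rates; either one if they are equal."""
    return min(first_rate_hz, second_rate_hz)


def track_target_rate(snapshots: Iterable[Tuple[int, int]]) -> List[int]:
    """Re-evaluate the target rate for each (calling, called) rate snapshot.

    Each snapshot stands for one observation of the two terminals, for
    example before and after a handover from a 4G to a 2G network.
    """
    return [target_sampling_rate(first, second) for first, second in snapshots]
```

For the example in the text, `target_sampling_rate(8000, 16000)` gives 8000.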

Step 203, extracting the target audio from the pre-stored audio by adopting the target sampling rate.

After the target sampling rate is determined, the target audio is extracted from the pre-stored audio at the target sampling rate.

Step 204, playing the target audio at the calling terminal, and recording the audio at the called terminal by adopting the target sampling rate; and/or playing the target audio at the called terminal, and recording the audio at the calling terminal by adopting the target sampling rate.

In the embodiment of the invention, after the target audio with the preset time length is extracted, the target audio can be immediately played at the calling terminal and/or the called terminal, wherein the preset time length can be set according to the actual situation, such as one frame, 1s and the like. And when the target audio is played, recording the audio at the opposite end by adopting the target sampling rate. The opposite terminal of the calling terminal is a called terminal, and the opposite terminal of the called terminal is the calling terminal.
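One way to picture the chunk-by-chunk, alternating play and record operation is the schedule generator below. It is a sketch only: the role labels, the function name `alternating_schedule`, and the strict one-chunk alternation are assumptions, since the text leaves the play durations at each terminal open.

```python
from typing import Iterator, List, Sequence, Tuple


def alternating_schedule(
    target_audio: Sequence[int],
    sample_rate_hz: int,
    chunk_seconds: float = 1.0,
) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (playing terminal, recording terminal, chunk) triples.

    The target audio is cut into chunks of the preset duration; each chunk
    is played at one terminal and recorded at its peer, swapping roles
    after every chunk.
    """
    chunk_len = max(1, int(round(chunk_seconds * sample_rate_hz)))
    roles = ("calling", "called")
    for index, start in enumerate(range(0, len(target_audio), chunk_len)):
        player = roles[index % 2]
        recorder = roles[(index + 1) % 2]  # the peer of the playing terminal
        yield player, recorder, list(target_audio[start:start + chunk_len])
```

The final chunk may be shorter than the preset duration when the audio length is not a multiple of the chunk length.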

Step 205, after recording and playing are finished, calculating the recorded audio and the target audio by adopting a voice evaluation MOS scoring algorithm.

Step 206, determining the voice quality according to the calculated result.

In the embodiment of the invention, after recording and playing are finished, the target audio and the recorded audio are analyzed. One way to do so is to apply a voice evaluation MOS scoring algorithm to the recorded audio and the target audio to obtain a MOS value, with the test parameters of the algorithm adapted to the target sampling rate. The voice quality is then determined from the MOS value: the higher the MOS value, the better the voice quality. The MOS value in a voice quality test usually lies between 0 and 5; the voice quality is best when the MOS value is 5 and worst when it is 0.
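The text does not say which test parameters are adapted to the target sampling rate. As one hedged possibility, the sketch below selects an audio band label from the rate, using the conventional names narrowband (8 kHz), wideband (16 kHz) and fullband (48 kHz); the function name, the returned dictionary layout, and the choice of band as the adapted parameter are all assumptions for illustration.

```python
# Conventional audio-band names for the three rates mentioned in the text.
BAND_BY_RATE_HZ = {8000: "narrowband", 16000: "wideband", 48000: "fullband"}


def mos_test_parameters(target_rate_hz: int) -> dict:
    """Build a hypothetical parameter set for a MOS scoring run."""
    try:
        band = BAND_BY_RATE_HZ[target_rate_hz]
    except KeyError:
        raise ValueError(f"unsupported target sampling rate: {target_rate_hz}") from None
    return {"sample_rate_hz": target_rate_hz, "band": band}
```

The scoring algorithm itself is outside the scope of this sketch; only the parameter selection step is shown.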

In the embodiment of the invention, a first sampling rate corresponding to the network type of a calling terminal is compared with a second sampling rate corresponding to the network type of a called terminal, and the minimum sampling rate is determined as a target sampling rate; therefore, the target sampling rate can simultaneously meet the network load capacity of the calling terminal and the called terminal, and the accuracy of the determined voice quality is ensured.

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the embodiments of the invention.

Example three

Referring to fig. 3, a block diagram of an embodiment of the apparatus for determining speech quality according to the present invention is shown. The device specifically comprises: a determination module 301, an extraction module 302, a play-recording module 303, and an analysis module 304, wherein,

the determining module 301 is configured to determine a target sampling rate according to a network type of the calling terminal and a network type of the called terminal.

An extraction module 302, configured to extract the target audio from the pre-stored audio by using the target sampling rate.

A playing and recording module 303, configured to play the target audio at the calling terminal, and record the audio at the called terminal by using the target sampling rate; and/or playing the target audio at the called terminal, and recording the audio at the calling terminal by adopting the target sampling rate.

An analysis module 304, configured to determine the voice quality according to the analysis of the recorded audio and the target audio after the recording and the playing are finished.

On the basis of the above embodiments, the embodiments of the present invention describe sub-modules included in each module of the apparatus.

The determining module 301 of the embodiment of the present invention includes: a rate determination sub-module 3011 and a comparison sub-module 3012, wherein,

the rate determining sub-module 3011 is configured to determine a first sampling rate according to the network type of the calling terminal, and determine a second sampling rate according to the network type of the called terminal.

A comparison sub-module 3012, configured to compare the first sampling rate with the second sampling rate and determine the minimum sampling rate as the target sampling rate.

The rate determining submodule 3011 is specifically configured to determine whether the network type of the calling terminal is a low-order mobile network type; when the network type of the calling terminal is determined to be a low-order mobile network type, determining a preset sampling rate as a first sampling rate; and when the network type of the calling terminal is determined to be a high-order mobile network type, obtaining a calling signaling from the calling terminal, and determining a first sampling rate according to the voice parameter of the calling signaling.

The rate determining submodule 3011 is specifically configured to determine whether the network type of the called terminal is a low-order mobile network type; when the network type of the called terminal is determined to be a low-order mobile network type, determining a preset sampling rate as a second sampling rate; and when the network type of the called terminal is determined to be a high-order mobile network type, acquiring a called signaling from the called terminal, and determining a second sampling rate according to the voice parameter of the called signaling.
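The logic of the determining module 301 described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the embodiment's implementation: the preset rate of 8000 Hz, the representation of a terminal as a `(generation, signaling)` pair, and the assumption that the voice parameter appears as an SDP `a=rtpmap` line in the signaling are all hypothetical choices made for the example.

```python
import re
from typing import Optional, Tuple

# Assumed preset sampling rate for low-order (2G/3G) networks.
PRESET_RATE_HZ = 8000

# Matches an SDP attribute such as "a=rtpmap:104 AMR-WB/16000" and
# captures the clock rate that follows the codec name.
_RTPMAP = re.compile(r"^a=rtpmap:\d+\s+[^/\s]+/(\d+)", re.MULTILINE)


def rate_from_signaling(signaling: str) -> int:
    """Read the sampling rate from the first rtpmap line of the signaling."""
    match = _RTPMAP.search(signaling)
    if match is None:
        raise ValueError("no rtpmap voice parameter found in signaling")
    return int(match.group(1))


def sampling_rate_for(generation: int, signaling: Optional[str] = None) -> int:
    """Preset rate for low-order networks, signaled rate for high-order ones."""
    if generation <= 3:
        return PRESET_RATE_HZ
    if signaling is None:
        raise ValueError("high-order networks require signaling")
    return rate_from_signaling(signaling)


def target_sampling_rate(calling: Tuple[int, Optional[str]],
                         called: Tuple[int, Optional[str]]) -> int:
    """Compare the two per-terminal rates and keep the smaller one."""
    first = sampling_rate_for(*calling)
    second = sampling_rate_for(*called)
    return min(first, second)


# Hypothetical signaling fragment carried by a 4G terminal.
SDP = "v=0\r\nm=audio 49152 RTP/AVP 104\r\na=rtpmap:104 AMR-WB/16000\r\n"
mixed = target_sampling_rate((4, SDP), (3, None))
both_4g = target_sampling_rate((4, SDP), (4, SDP))
```

Taking the smaller rate means that a call between a 4G terminal and a 3G terminal is evaluated at the rate the narrower leg can actually carry.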

The analysis module 304 in the embodiment of the present invention includes: a calculation submodule 3041 and a quality determination submodule 3042, wherein,

the calculation submodule 3041 is configured to perform a calculation on the recorded audio and the target audio using a mean opinion score (MOS) voice evaluation algorithm;

the quality determination submodule 3042 is configured to determine the voice quality based on the result of the calculation.
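The split between the two submodules of the analysis module can be illustrated with the sketch below. The embodiment does not specify which MOS algorithm is used, so the scorer is passed in as a callable. `stub_scorer` is a stand-in that returns a fixed value and is not a real MOS algorithm. Mapping the score to the nearest label of the five-point scale is likewise an assumption made for the example.

```python
from typing import Callable, Sequence, Tuple

# Labels of the five-point absolute category rating scale (ITU-T P.800).
MOS_LABELS = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}

# A scorer takes (reference audio, degraded audio, sample rate) and returns a MOS value.
Scorer = Callable[[Sequence[float], Sequence[float], int], float]


def determine_voice_quality(target_audio: Sequence[float],
                            recorded_audio: Sequence[float],
                            sample_rate: int,
                            scorer: Scorer) -> Tuple[float, str]:
    """Run the injected scorer, then map its result to a quality label."""
    raw = scorer(target_audio, recorded_audio, sample_rate)  # calculation step
    score = min(5.0, max(1.0, raw))                          # clamp to the 1..5 scale
    label = MOS_LABELS[int(score + 0.5)]                     # nearest label, half rounds up
    return score, label


def stub_scorer(reference: Sequence[float],
                degraded: Sequence[float],
                sample_rate: int) -> float:
    """Placeholder only: a real system would call an actual MOS algorithm here."""
    return 3.6


score, label = determine_voice_quality([0.1, 0.2], [0.1, 0.19], 16000, stub_scorer)
```

Injecting the scorer keeps the quality determination step independent of whichever MOS algorithm a deployment actually uses.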

In the embodiment of the present invention, the low-order mobile network types include second-generation and third-generation mobile network types, and the high-order mobile network types include the fourth-generation mobile network type and mobile network types of generations later than the fourth.
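The low-order/high-order distinction above could be expressed as a small lookup. In this sketch the table `GENERATION_BY_RAT` and the choice of radio access technology names as keys are assumptions made for illustration; the embodiment only defines the split by generation.

```python
# Generation of some common radio access technologies (illustrative subset).
GENERATION_BY_RAT = {
    "GSM": 2,
    "WCDMA": 3,
    "TD-SCDMA": 3,
    "CDMA2000": 3,
    "LTE": 4,
    "NR": 5,
}


def is_low_order(rat: str) -> bool:
    """Return True for 2G/3G network types, False for 4G and later."""
    generation = GENERATION_BY_RAT[rat.strip().upper()]
    return generation <= 3


low = is_low_order("gsm")
high = is_low_order("LTE")
```

An unknown name raises `KeyError`, which keeps an unrecognized network type from being silently treated as either class.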

Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.

The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.

The method and the device for determining voice quality provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A method for determining speech quality, comprising:

determining a target sampling rate according to the network type of the calling terminal and the network type of the called terminal; the target sampling rate is matched with the actual network load condition of the calling terminal and/or the called terminal;

extracting target audio from pre-stored audio by adopting the target sampling rate;

playing the target audio at the calling terminal, and recording the audio at the called terminal by adopting the target sampling rate; and/or, playing the target audio at the called terminal, and recording the audio at the calling terminal by adopting the target sampling rate;

after the recording and playing are finished, determining the voice quality according to an analysis of the recorded audio and the target audio.

2. The method of claim 1, wherein the step of determining the target sampling rate according to the network type of the calling terminal and the network type of the called terminal comprises:

determining a first sampling rate according to the network type of the calling terminal and determining a second sampling rate according to the network type of the called terminal;

and comparing the first sampling rate with the second sampling rate, and determining the minimum sampling rate as the target sampling rate.

3. The method of claim 2, wherein the step of determining the first sampling rate according to the network type of the calling terminal comprises:

judging whether the network type of the calling terminal is a low-order mobile network type;

when the network type of the calling terminal is determined to be a low-order mobile network type, determining a preset sampling rate as a first sampling rate;

and when the network type of the calling terminal is determined to be a high-order mobile network type, obtaining a calling signaling from the calling terminal, and determining a first sampling rate according to the voice parameter of the calling signaling.

4. The method of claim 2, wherein the step of determining the second sampling rate according to the network type of the called terminal comprises:

judging whether the network type of the called terminal is a low-order mobile network type;

when the network type of the called terminal is determined to be a low-order mobile network type, determining a preset sampling rate as a second sampling rate;

and when the network type of the called terminal is determined to be a high-order mobile network type, acquiring a called signaling from the called terminal, and determining a second sampling rate according to the voice parameter of the called signaling.

5. The method according to claim 3 or 4, wherein the low-order mobile network type comprises second-generation and third-generation mobile network types, and the high-order mobile network type comprises the fourth-generation mobile network type and mobile network types of generations later than the fourth.

6. The method of claim 1, wherein the step of determining speech quality from the analysis of the recorded audio and the target audio comprises:

performing a calculation on the recorded audio and the target audio using a mean opinion score (MOS) voice evaluation algorithm;

and determining the voice quality according to the calculated result.

7. An apparatus for determining speech quality, comprising:

the determining module is used for determining a target sampling rate according to the network type of the calling terminal and the network type of the called terminal; the target sampling rate is matched with the actual network load condition of the calling terminal and/or the called terminal;

the extraction module is used for extracting target audio from prestored audio by adopting the target sampling rate;

the playing and recording module is used for playing the target audio at the calling terminal and recording the audio at the called terminal by adopting the target sampling rate; and/or, playing the target audio at the called terminal, and recording the audio at the calling terminal by adopting the target sampling rate;

and the analysis module is used for determining the voice quality according to the analysis of the recorded audio and the target audio after the recording and the playing are finished.

8. The apparatus of claim 7, wherein the determining module comprises:

the rate determining submodule is used for determining a first sampling rate according to the network type of the calling terminal and determining a second sampling rate according to the network type of the called terminal;

and the comparison sub-module is used for comparing the first sampling rate with the second sampling rate and determining the minimum sampling rate as the target sampling rate.

9. The apparatus according to claim 8, wherein the rate determining submodule is specifically configured to determine whether the network type of the calling terminal is a low-order mobile network type; when the network type of the calling terminal is determined to be a low-order mobile network type, determining a preset sampling rate as a first sampling rate; and when the network type of the calling terminal is determined to be a high-order mobile network type, obtaining a calling signaling from the calling terminal, and determining a first sampling rate according to the voice parameter of the calling signaling.

10. The apparatus according to claim 8, wherein the rate determining submodule is specifically configured to determine whether a network type of the called terminal is a low-order mobile network type; when the network type of the called terminal is determined to be a low-order mobile network type, determining a preset sampling rate as a second sampling rate; and when the network type of the called terminal is determined to be a high-order mobile network type, acquiring a called signaling from the called terminal, and determining a second sampling rate according to the voice parameter of the called signaling.

11. The apparatus according to claim 9 or 10, wherein the low-order mobile network type comprises second-generation and third-generation mobile network types, and the high-order mobile network type comprises the fourth-generation mobile network type and mobile network types of generations later than the fourth.

12. The apparatus of claim 7, wherein the analysis module comprises:

the calculation submodule is used for performing a calculation on the recorded audio and the target audio using a mean opinion score (MOS) voice evaluation algorithm;

and the quality determination submodule is used for determining the voice quality according to the calculation result.