CN112885332A

CN112885332A - Voice quality inspection method, system and storage medium

Info

Publication number: CN112885332A
Application number: CN202110023263.4A
Authority: CN
Inventors: 王吉星; 马晓亮; 李应春; 刘育楠; 黄湘闽; 杨威; 蓝兰; 陈柱安
Original assignee: Tisson Regaltec Communications Tech Co Ltd
Current assignee: Tisson Regaltec Communications Tech Co Ltd
Priority date: 2021-01-08
Filing date: 2021-01-08
Publication date: 2021-06-01

Abstract

The invention relates to a voice quality inspection method, system and storage medium. The voice quality inspection method includes: acquiring a recording file; performing voice recognition conversion on the recording file to obtain a call text; performing multi-dimensional detection on the call text and/or the recording file according to a preset voice quality inspection strategy to obtain a detection result; and performing quality inspection scoring on the detection result to obtain a quality inspection scoring result. The method improves the accuracy of the quality inspection result and reduces the error rate caused by subjective factors.

Description

Voice quality inspection method, system and storage medium

Technical Field

The invention relates to the technical field of voice processing, in particular to a voice quality inspection method, a voice quality inspection system and a storage medium.

Background

In the customer service industry, the call recordings of customer service agents must be inspected to guarantee service quality. At present, after storing the recording file of the conversation between the customer service and the user, the voice quality inspection server of a service hotline performs quality inspection either manually or through a simple voice quality inspection model. With manual inspection, quality inspectors generally listen to all the recording files again, so quality inspection efficiency is low and, under the influence of human factors, the accuracy of the result is generally low as well. With a simple voice quality inspection model, quality inspection can generally be performed on only a single aspect of the recording file, so the result is biased toward a single dimension and its accuracy is low.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a voice quality inspection method, a system and a storage medium, which are used for carrying out multi-dimensional detection on a recording file and grading a quality inspection result, thereby improving the accuracy of the quality inspection result and reducing the error rate caused by subjective factors.

The technical scheme for solving the technical problems is as follows: a voice quality inspection method comprises the following steps:

acquiring a sound recording file;

carrying out voice recognition conversion on the recording file to obtain a call text;

and carrying out multi-dimensional detection on the call text and/or the recording file according to a preset voice quality detection strategy to obtain a detection result, and carrying out quality detection grading on the detection result to obtain a quality detection grading result.

The invention has the beneficial effects that: acquiring a recording file and converting the recording file into a call text, so that subsequent quality inspection is facilitated; in the quality inspection process, multi-dimensional detection is carried out on the call text and/or the recording file according to a preset voice quality inspection strategy, automatic quality inspection is carried out on the voice, the quality inspection efficiency is improved, and the accuracy of the detection result of the voice quality inspection is improved through the multi-dimensional detection; and the quality inspection results of the multi-dimensional characteristics of the voice can be comprehensively scored, so that the error rate caused by subjective factors is reduced.

On the basis of the technical scheme, the invention can be further improved as follows:

further, the performing multi-dimensional detection on the call text and/or the recording file according to a preset voice quality detection strategy to obtain a detection result includes:

detecting a voice appearance point and a voice disappearance point of an audio signal of the recording file according to a voice endpoint detection model VAD; when the time length between the adjacent voice vanishing point and the voice appearing point exceeds the preset mute time length, obtaining a mute detection result;

and performing emotion analysis on the text corresponding to the customer service in the call text through a pre-trained text emotion recognition model, and obtaining an emotion detection result when a negative emotion exists.

The beneficial effect of adopting the further scheme is that: endpoint detection on the voice call content in the recording file to be inspected realizes silence detection of the call content, so that non-standard service by customer service personnel is accurately detected; and emotion analysis of the text corresponding to the customer service in the call text through the text emotion recognition model yields the emotion recognition result for that text, so that poor service attitude of customer service personnel is accurately detected.

Further, the performing multi-dimensional detection on the call text and/or the recording file according to a preset voice quality detection strategy to obtain a detection result further includes:

separating customer service sound and customer sound in the recording file through a human voice separation algorithm;

detecting the time length and the word number of each speech of the customer service in the audio file corresponding to the customer service sound;

taking the quotient of the number of words and the duration as the speech rate of the customer service, and obtaining a speech rate out-of-range detection result when the speech rate falls outside a standard speech rate range;

and detecting, according to the VAD, the end time corresponding to the voice disappearance point of each utterance of the client in the audio file and the start time corresponding to the voice appearance point of each utterance of the customer service in the audio file, and obtaining a speech robbing detection result when the customer service starts speaking before the client's utterance has ended, that is, when the two overlap.

The beneficial effect of adopting the further scheme is that: after the speakers in the recording file to be inspected are separated, the speech rate of the customer service is calculated and checked; and from the end time of the client's previous utterance and the start time of the customer service's current utterance it can be determined whether effective speech robbing occurs, so that the time period and duration of speech robbing in the recording file can be accurately located and non-standard service by customer service personnel accurately detected.

acquiring a text corresponding to the customer service in the call text;

when the keywords in the text belong to a preset sensitive word set, obtaining a sensitive word detection result;

and when a script in the text matches a preset script analysis model, obtaining a script analysis detection result, wherein the preset script analysis models comprise a should-accept-but-not-accepted model, a client-pacifying model, a hotline transfer model, a non-acceptance model, a correct-greeting-not-used model and a correct-closing-not-used model.

The beneficial effect of adopting the further scheme is that: by searching for sensitive words in the recording, it is analyzed and checked whether the customer service's speech contains words prohibited in service; and the preset script analysis models make it possible to check whether the customer service speaks the appropriate scripts in the order specified by the procedure, so that non-standard service by customer service personnel is accurately detected.

forming semantic vectors by keywords corresponding to texts at preset positions in texts corresponding to customer services in the conversation texts;

and inputting the semantic vector into a preset trained convolutional neural network to obtain a service classification result.

The beneficial effect of adopting the further scheme is that: text features are extracted from the call content and, after feature analysis by the convolutional neural network, services are classified automatically with high accuracy.

Further, the quality inspection scoring of the detection result to obtain a quality inspection scoring result comprises:

and acquiring the parameter weight of each detection result, and acquiring a quality inspection grading result according to each parameter weight and the detection result.

The beneficial effect of adopting the further scheme is that: each type of detection yields a corresponding detection result, and the detection results for the call text and the recording file are combined through the parameter weights to obtain a quality inspection scoring result, which reduces the error rate caused by subjective factors.

Further, after performing multi-dimensional detection on the call text and/or the sound recording file according to a preset voice quality detection strategy to obtain a detection result, the method comprises the following steps:

and sending a quality inspection report containing the detection result and the quality inspection grading result to a quality inspector terminal for rechecking.

The beneficial effect of adopting the further scheme is that: and the quality inspection report is sent to the quality inspector terminal for rechecking, so that the detection accuracy is improved.

and when the detection result is matched with the preset automatic early warning and real-time intervention condition, sending the sound recording file to the quality inspector terminal for real-time intervention and rechecking.

The beneficial effect of adopting the further scheme is that: and sending the recording file to the quality inspector terminal for real-time intervention and rechecking through preset automatic early warning and real-time intervention conditions, so that the satisfaction degree of the service is improved.

In order to solve the above problem, an embodiment of the present invention further provides a voice quality inspection system, where the voice quality inspection system includes: the system comprises a data acquisition module, a voice transcription module, a voice quality inspection module and a quality inspector terminal;

the data acquisition module is used for acquiring a sound recording file;

the voice transcription module is used for carrying out voice recognition conversion on the recording file to obtain a call text;

the voice quality inspection module is used for carrying out multi-dimensional detection on the call text and/or the recording file according to a preset voice quality inspection strategy to obtain a detection result, and carrying out quality inspection grading on the detection result to obtain a quality inspection grading result.

In order to solve the above problem, an embodiment of the present invention further provides a storage medium, where the storage medium includes one or more computer programs stored therein, and the one or more computer programs are executable by one or more processors to implement the steps of the voice quality inspection method described above.

Drawings

Fig. 1 is a flowchart of a voice quality inspection method according to an embodiment of the present invention;

Fig. 2 is a structural diagram of a voice quality inspection system according to an embodiment of the present invention;

Fig. 3 is a flowchart illustrating an implementation of the voice quality inspection module according to an embodiment of the present invention.

Detailed Description

The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.

As shown in fig. 1, fig. 1 is a flowchart of a voice quality inspection method according to an embodiment of the present invention, the voice quality inspection method is applied to a voice quality inspection system, and the voice quality inspection method includes:

S101, acquiring a sound recording file;

S102, carrying out voice recognition conversion on the recording file to obtain a call text;

S103, carrying out multi-dimensional detection on the call text and/or the recording file according to a preset voice quality detection strategy to obtain a detection result, and carrying out quality detection grading on the detection result to obtain a quality detection grading result.

In the embodiment, the recording file is obtained and converted into the call text, so that subsequent quality inspection is facilitated; in the quality inspection process, multi-dimensional detection is carried out on the call text and/or the recording file according to a preset voice quality inspection strategy, automatic quality inspection is carried out on the voice, the quality inspection efficiency is improved, and the accuracy of the detection result of the voice quality inspection is improved through the multi-dimensional detection; and the quality inspection results of the multi-dimensional characteristics of the voice can be comprehensively scored, so that the error rate caused by subjective factors is reduced.

It can be understood that, after a user dials the customer service hotline, the recording of the conversation with a human agent or an intelligent robot agent is stored in the recording file library of an external system, where the external system includes a call center, a customer service center or a call pickup service center. When voice quality inspection is needed, the recording file to be inspected is extracted from the recording file library of the external system. In this embodiment, step S101 specifically includes: configuring a quality inspection data source acquisition address in advance, and downloading the corresponding recording file from the recording file library of the external system through FTP (File Transfer Protocol), wherein one or more recording files may be obtained.
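As a minimal sketch of step S101, the download could be written with Python's standard `ftplib`. The function name `download_recordings`, the `.wav` suffix and the directory layout are assumptions made for illustration; the patent does not specify them.

```python
import os


def download_recordings(ftp, remote_dir, local_dir, suffix=".wav"):
    """Download every file in remote_dir whose name ends with suffix.

    `ftp` is expected to be a logged-in ftplib.FTP connection (any object
    with the same cwd/nlst/retrbinary methods also works).
    Returns the list of local paths that were written.
    """
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    saved = []
    for name in ftp.nlst():
        if not name.endswith(suffix):
            continue  # skip anything that is not a recording file
        local_path = os.path.join(local_dir, os.path.basename(name))
        with open(local_path, "wb") as fh:
            # RETR streams the remote file in binary mode, block by block.
            ftp.retrbinary("RETR " + name, fh.write)
        saved.append(local_path)
    return saved
```

In use, the caller would open the connection with `ftplib.FTP(host)`, call `login(user, password)` with the preconfigured data source address and credentials, and pass the connection to the function.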

In this embodiment, step S102 specifically includes: first, analyzing and processing the voice signal of the recording file to be inspected to remove redundant information; then, extracting the key information that affects speech recognition and the feature information that expresses linguistic meaning, and, staying close to the feature information, recognizing words in minimal units; recognizing words in order according to the grammar of the respective language, with the preceding and following context serving as auxiliary recognition conditions that aid analysis and recognition; and finally, according to semantic analysis, dividing the key information into paragraphs, extracting and connecting the recognized words, adjusting the sentence structure according to the meaning of each sentence, and synthesizing the call text.

It should be noted that the multi-dimensional detection of the call text and/or the recording file in step S103 may include silence detection, speech rate detection, speech robbing detection, emotion detection, sensitive word analysis, script analysis, service analysis detection, and the like; the specific detections are as follows:

The silence detection includes: detecting the voice appearance points and voice disappearance points of the audio signal of the recording file according to a voice endpoint detection (VAD) model; and when the time between a voice disappearance point and the next voice appearance point exceeds the preset silence duration, obtaining a silence detection result. A VAD algorithm divides the audio signal into voiced, unvoiced and silence parts. The VAD detection steps are: framing the audio signal and extracting features from each frame, so that the start and end points of speech can be found from the difference between the short-time characteristics of speech and non-speech signals; training a classifier on a set of frames from known speech and silence regions; and classifying unknown frames as speech or silence, which determines the voice appearance points and voice disappearance points of the audio signal. The preset silence duration can be adjusted flexibly according to actual requirements. For example, with a preset silence duration of 2 minutes, when the time between a voice disappearance point and the next voice appearance point is detected to exceed 2 minutes, silence exists and a silence detection result of 1 is output; this covers the case where the customer service speaks only more than 2 minutes after the user has finished speaking, and the case where the user speaks only more than 2 minutes after the customer service has finished speaking.
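The gap check that follows VAD can be sketched as below. The sketch assumes a VAD has already produced a list of `(appearance_time, disappearance_time)` pairs in seconds for all speakers; the function names and the segment representation are illustrative assumptions, and the VAD classifier itself is not shown.

```python
def find_long_silences(speech_segments, max_silence_s):
    """Return (gap_start, gap_end) pairs whose length exceeds max_silence_s.

    speech_segments: iterable of (appearance_time, disappearance_time)
    in seconds, as assumed to come from a VAD stage.
    """
    ordered = sorted(speech_segments)
    silences = []
    if not ordered:
        return silences
    latest_end = ordered[0][1]
    for start, end in ordered[1:]:
        # Gap between the last voice disappearance point seen so far
        # and the next voice appearance point.
        if start - latest_end > max_silence_s:
            silences.append((latest_end, start))
        latest_end = max(latest_end, end)
    return silences


def silence_detection_result(speech_segments, max_silence_s=120.0):
    """Return 1 if any silence longer than the preset duration exists, else 0."""
    return 1 if find_long_silences(speech_segments, max_silence_s) else 0
```

The default of 120 seconds mirrors the 2-minute example above and would be replaced by whatever preset silence duration is configured.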

The speech rate detection comprises: separating the customer service voice and the customer voice in the recording file by a voice separation algorithm; detecting the duration and the word count of each utterance of the customer service in the audio corresponding to the customer service voice; taking the quotient of the word count and the duration as the speech rate of the customer service; and obtaining a speech rate out-of-range detection result when the speech rate falls outside a standard speech rate range. If the customer service speaks too fast, the customer has difficulty hearing clearly and service quality suffers; a speech rate that is too slow may indicate that the agent is not yet proficient or is in a poor working state. The standard speech rate range can be flexibly adjusted according to actual requirements, for example, the standard speech rate range is the broadcast speed of 250-.
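A small sketch of the speech rate check, assuming the rate is expressed in words per minute and that each customer service utterance has already been reduced to a `(word_count, duration_in_seconds)` pair. The function names are hypothetical, and the bounds are parameters supplied by the caller; no particular range is taken from the patent.

```python
def words_per_minute(word_count, duration_s):
    """Speech rate of one utterance in words per minute."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return word_count * 60.0 / duration_s


def speech_rate_result(turns, min_wpm, max_wpm):
    """Check every customer service utterance against [min_wpm, max_wpm].

    turns: list of (word_count, duration_s) per utterance.
    Returns (flag, offenders) where flag is 1 if any utterance is out of
    range and offenders lists (utterance_index, rate) for each such one.
    """
    offenders = []
    for index, (words, duration_s) in enumerate(turns):
        rate = words_per_minute(words, duration_s)
        if not min_wpm <= rate <= max_wpm:
            offenders.append((index, rate))
    return (1 if offenders else 0), offenders
```

Returning the offending utterance indices alongside the flag keeps enough detail for a later manual review to jump to the relevant utterances.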

The speech robbing detection comprises: separating the customer service voice and the customer voice in the recording file by a voice separation algorithm; detecting, according to VAD, the end time corresponding to the voice disappearance point of each utterance of the customer and the start time corresponding to the voice appearance point of each utterance of the customer service; and obtaining a speech robbing detection result when the customer service starts speaking before the customer's utterance has ended, that is, when the two overlap. Furthermore, a speech robbing detection result may be obtained only when the overlap between the end of the customer's previous utterance and the start of the customer service's current utterance exceeds a preset overlap duration. In this embodiment, the timestamp range of each customer service utterance can be obtained, and when speech robbing by the customer service is determined, the timestamp range of that utterance is located, which facilitates subsequent manual review.
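The overlap test can be sketched as follows, assuming speaker separation and VAD have already produced per-speaker lists of `(start, end)` times in seconds. The function name `find_speech_robbing` and the `min_overlap_s` parameter (standing in for the preset overlap duration) are illustrative assumptions.

```python
def find_speech_robbing(customer_segments, agent_segments, min_overlap_s=0.0):
    """Find customer service utterances that start while the customer is talking.

    Both arguments are lists of (start, end) times in seconds.
    Returns a list of (agent_start, agent_end, overlap_seconds) so that the
    timestamp range of each offending utterance can be reviewed later.
    """
    hits = []
    for a_start, a_end in agent_segments:
        for c_start, c_end in customer_segments:
            # The agent begins strictly inside a customer utterance.
            if c_start < a_start < c_end:
                overlap = min(a_end, c_end) - a_start
                if overlap > min_overlap_s:
                    hits.append((a_start, a_end, overlap))
                break
    return hits
```

A speech robbing detection result of 1 would then correspond to a non-empty return value.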

The emotion detection comprises: performing emotion analysis on the text corresponding to the customer service in the call text through a pre-trained text emotion recognition model, and obtaining an emotion detection result when a negative emotion exists. Emotion analysis through the call text includes analyzing the context semantics and wording of the text corresponding to the customer service, determining the position where the customer service's emotion becomes abnormal, and performing emotion analysis on the text at that position through the pre-trained text emotion recognition model. The pre-trained text emotion recognition model may be a HiGRU (Hierarchical Gated Recurrent Units) model, and negative emotions include worry, sadness, anger, tension, anxiety, pain, fear, hatred and the like. In some embodiments, emotion analysis may instead be performed on the audio signal corresponding to the customer service in the recording file: speech features such as energy, pitch and Mel-frequency cepstral coefficients (MFCC) are extracted and input into a pre-trained classifier, which outputs the emotional state; the classifier may be a Gaussian mixture model (GMM), a hidden Markov model (HMM), a support vector machine (SVM) or the like. In some embodiments, emotion analysis may also be performed on both the call text and the audio signal corresponding to the customer service in the recording file; for example, when the call text indicates that a negative emotion exists in a certain utterance, emotion analysis is then performed on the corresponding audio signal as a second check.

The sensitive word analysis detection comprises: acquiring the text corresponding to the customer service in the call text; and obtaining a sensitive word detection result when a keyword in the text belongs to a preset sensitive word set. By searching for sensitive words in the recording, it is analyzed and checked whether the customer service's speech contains words prohibited in service. The preset sensitive word set (which can be regarded as an uncivil-language model) comprises a plurality of sensitive words. Because the keywords already exist before the call text is synthesized, the keywords contained in the text corresponding to the customer service can be obtained and compared one by one with the sensitive words in the set; as soon as any keyword in the customer service's text is a sensitive word, it can be determined that the customer service used uncivil language, and a sensitive word detection result is obtained.
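The one-by-one comparison described above amounts to a set membership test. A minimal sketch, assuming the customer service keywords are already available as a list of strings; the function names and the sample words in the usage note are placeholders, not the patent's word set.

```python
def sensitive_word_hits(agent_keywords, sensitive_words):
    """Return the customer service keywords that appear in the sensitive word set.

    The original keyword order is kept so hits can be traced back to the text.
    """
    banned = set(sensitive_words)  # set lookup avoids a nested scan
    return [keyword for keyword in agent_keywords if keyword in banned]


def sensitive_word_result(agent_keywords, sensitive_words):
    """Return 1 if at least one sensitive word was used, else 0."""
    return 1 if sensitive_word_hits(agent_keywords, sensitive_words) else 0
```

For instance, `sensitive_word_hits(["hello", "stupid", "bill"], {"stupid", "shut up"})` returns `["stupid"]`, so the detection result is 1.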

The script analysis detection includes: acquiring the text corresponding to the customer service in the call text, and obtaining a script analysis detection result when a script in the text matches a preset script analysis model, wherein the preset script analysis models comprise a should-accept-but-not-accepted model, a client-pacifying model, a hotline transfer model, a non-acceptance model, a correct-greeting-not-used model and a correct-closing-not-used model. Semantic analysis checks whether the customer service speaks the appropriate scripts in the order specified by the procedure. The should-accept-but-not-accepted model detects cases where the customer service classifies a call as a non-accepted item (a non-municipal-government or non-accepting-unit item) when it should have been accepted; the client-pacifying model detects sentences in which the customer service soothes the client; the hotline transfer model detects scripts that transfer the call to another organization; the non-acceptance model detects scripts in which the customer service declines to accept the matter; the correct-greeting-not-used model detects that the customer service did not use the correct greeting; and the correct-closing-not-used model detects that the customer service did not use the correct closing.

The service analysis detection comprises: acquiring the text corresponding to the customer service in the call text; forming a semantic vector from the keywords of the text at preset positions in that text; and inputting the semantic vector into a preset, trained convolutional neural network to obtain a service classification result. Text features are extracted from the call content and, after feature analysis by the convolutional neural network, which is preset for service classification, services are classified automatically with high accuracy. After the first few words of the client, such as the first 3 words, and the keywords extracted from the corresponding text form a semantic vector, the semantic vector is input into the convolutional neural network, so that the service type the user wants to handle is known, and the script corresponding to that service type can be called up to inspect the quality of the customer service's call.

In this embodiment, after each detection result is obtained, performing quality inspection scoring on the detection results to obtain a quality inspection scoring result includes: acquiring the parameter weight of each detection result, and obtaining the quality inspection scoring result from the parameter weights and the detection results. Silence detection, speech rate detection, speech robbing detection, emotion detection, sensitive word analysis and script analysis are performed in turn on the call text and/or the recording file. Whenever a type of detection yields a detection result, that result is converted into a score value by setting it to 1; that is, an obtained silence detection result, speech rate out-of-range detection result, speech robbing detection result, emotion detection result, sensitive word detection result or script analysis detection result is each set to 1. Each weight is multiplied by the corresponding detection result and the products are summed to obtain the quality inspection scoring result. For example, if a recording file undergoes all six detections and a silence detection result, a speech robbing detection result and a script analysis detection result are obtained, the quality inspection scoring result is 0.2 × 1 + 0.1 × 1 + 0.2 × 1 = 0.5. The quality inspection results of the multi-dimensional features of the voice can thus be scored comprehensively in combination with the preset scoring rule, which reduces the error rate caused by subjective factors.
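The weighted scoring can be sketched as a sum of weight times flag. The dictionary keys are hypothetical names for the six detections. The weights 0.2, 0.1 and 0.2 for silence, speech robbing and script analysis follow the terms of the example sum; the remaining three weights are assumptions added only so that the table is complete.

```python
# Parameter weights per detection type. Only the silence, speech_robbing and
# script values come from the worked example; the others are assumed.
WEIGHTS = {
    "silence": 0.2,
    "speech_rate": 0.1,      # assumed
    "speech_robbing": 0.1,
    "emotion": 0.2,          # assumed
    "sensitive_word": 0.2,   # assumed
    "script": 0.2,
}


def quality_score(results, weights=WEIGHTS):
    """Sum weight * flag over all detections; each flag is 0 or 1."""
    total = sum(weights[name] * flag for name, flag in results.items())
    return round(total, 6)  # rounding hides floating-point noise


results = {
    "silence": 1,
    "speech_rate": 0,
    "speech_robbing": 1,
    "emotion": 0,
    "sensitive_word": 0,
    "script": 1,
}
print(quality_score(results))  # 0.5
```

With this convention a higher score means more problems were detected, which matches the later rule of sending recordings whose score exceeds a preset value for manual recheck.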

In this embodiment, after step S103, the detection result and the quality inspection scoring result form a quality inspection report, and the quality inspection report is sent to the quality inspector terminal. The quality inspection reports of all recording files may be sent to the quality inspector terminal, or only some of them; for example, a report is sent when a silence detection result, a negative emotion detection result or a sensitive word detection result exists, or when the quality inspection scoring result is greater than a preset value. The quality inspector terminal can display the detection result and the quality inspection scoring result, and the quality inspector can listen to the recording file again and recheck it manually on that basis. For example, when the quality inspection scoring result is greater than the preset scoring value, the corresponding recording file is rechecked manually, and during the manual recheck the detection results with higher parameter weights can be given priority.

It should be noted that, in this embodiment, after the call text and/or the recording file is subjected to multi-dimensional detection according to the preset voice quality inspection strategy to obtain the detection result, when the detection result matches the preset automatic early warning and real-time intervention conditions, the recording file is sent to the quality inspector terminal for real-time intervention and recheck. It can be understood that, in this embodiment, the recording of the customer service and the customer can be obtained in real time and inspected, and when the detection result meets the automatic early warning and real-time intervention conditions, manual intervention and recheck can be carried out. The automatic early warning and real-time intervention conditions can be flexibly adjusted according to actual requirements; for example, when the silence detection result corresponds to a silence of more than 3 minutes, the recording file is sent to the quality inspector terminal for recheck, and when the recheck confirms that a silence of more than 3 minutes exists, real-time intervention can be performed through the quality inspector terminal.
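The report and intervention rules above can be sketched as two small predicates. The function names, the result keys and the way thresholds are passed in are assumptions for illustration; the 180-second default mirrors the 3-minute silence example.

```python
def should_send_report(results, score, score_threshold,
                       always_report=("silence", "emotion", "sensitive_word")):
    """Decide whether a quality inspection report goes to the inspector terminal.

    A report is sent when silence, negative emotion or a sensitive word was
    detected, or when the score is greater than the preset value.
    """
    if any(results.get(name, 0) for name in always_report):
        return True
    return score > score_threshold


def needs_realtime_intervention(longest_silence_s, limit_s=180.0):
    """Early-warning condition from the example: silence longer than 3 minutes."""
    return longest_silence_s > limit_s
```

Other early-warning conditions would be added as further predicates of the same shape, since the conditions are meant to be adjusted to actual requirements.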

According to the voice quality inspection method provided by this embodiment, the mass voice data in the voice quality inspection server is inspected automatically through the multi-dimensional quality inspection model, which improves the efficiency and accuracy of voice quality inspection; multiple recordings can be inspected automatically at the same time, which further improves efficiency. The method can perform multi-dimensional quality inspection on the voice, such as silence detection, speech rate detection, speech robbing detection, emotion detection, sensitive word analysis, script analysis and service analysis, which improves the accuracy of the detection result, and can score the quality inspection results of the multi-dimensional features of the voice comprehensively in combination with a preset scoring rule, which reduces the error rate caused by subjective factors.

Example 2

The present embodiment provides a voice quality inspection system, as shown in fig. 2, the voice quality inspection system includes: a data acquisition module 201, a voice transcription module 202 and a voice quality inspection module 203;

the data obtaining module 201 is configured to obtain a sound recording file;

the voice transcription module 202 is used for performing voice recognition conversion on the recording file to obtain a call text;

the voice quality inspection module 203 is configured to perform multidimensional detection on the call text and/or the recording file according to a preset voice quality inspection strategy to obtain a detection result, and perform quality inspection scoring on the detection result to obtain a quality inspection scoring result.

The voice quality inspection system acquires the recording file from the recording file library of an external system through the data acquisition module, and the data acquisition module sends the recording file to the voice transcription module, which performs voice recognition to obtain a call text. The voice transcription module sends the call text to the voice quality inspection module, and the voice quality inspection module performs silence detection, speech rate detection, speech robbing detection, emotion detection, sensitive word analysis, script analysis and service analysis according to the voice quality inspection model stored in it, to obtain the quality inspection scoring result corresponding to the recording file.
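The data flow between modules 201, 202 and 203 can be sketched as a small pipeline in which each module is a pluggable callable. The class name, field names and return structure are assumptions made for illustration; real acquisition, transcription and detection components would be supplied by the deployment.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VoiceQualityInspectionSystem:
    acquire: Callable[[str], bytes]                  # data acquisition module 201
    transcribe: Callable[[bytes], str]               # voice transcription module 202
    inspect: Callable[[str, bytes], Dict[str, int]]  # detections of module 203
    weights: Dict[str, float]                        # parameter weights for scoring

    def run(self, recording_id: str) -> dict:
        audio = self.acquire(recording_id)       # S101: obtain the recording
        text = self.transcribe(audio)            # S102: recording -> call text
        results = self.inspect(text, audio)      # S103: multi-dimensional detection
        score = sum(self.weights.get(name, 0.0) * flag
                    for name, flag in results.items())
        return {"recording": recording_id, "text": text,
                "results": results, "score": score}
```

Keeping the modules as injected callables means each stage can be replaced or tested with a stub without touching the others.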

The data acquisition module 201 acquires the recording file as follows:

the data acquisition module 201 establishes a connection with the recording file library of the external system through a preset data source acquisition address;

the data acquisition module 201 downloads the recording file to be inspected from the recording file library via FTP;

the data acquisition module 201 sends the recording file to be inspected to the voice transcription module 202 over a TCP connection on the local area network.
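The acquisition steps above can be sketched in Python with the standard `ftplib` and `socket` modules. This is only an illustration under stated assumptions: the function names, the file paths, and the idea of streaming the raw file bytes over TCP are hypothetical, since the embodiment does not specify a wire format.

```python
import socket
from ftplib import FTP
from pathlib import Path


def download_recording(ftp: FTP, remote_name: str, dest_dir: str) -> Path:
    """Download one recording over FTP in binary mode (RETR)."""
    dest = Path(dest_dir) / Path(remote_name).name
    with open(dest, "wb") as fh:
        # retrbinary calls fh.write once per received block
        ftp.retrbinary(f"RETR {remote_name}", fh.write)
    return dest


def send_recording(path: Path, host: str, port: int) -> int:
    """Send the file bytes to the transcription module over TCP.

    Assumption: the receiver reads until the connection closes.
    Returns the number of bytes sent.
    """
    data = path.read_bytes()
    with socket.create_connection((host, port)) as conn:
        conn.sendall(data)
    return len(data)
```

In use, the caller would first open the connection to the preset data source address, for example with `FTP(host)` followed by `login(...)`, and then pass that object to `download_recording`.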

The process of speech recognition by the voice transcription module 202 is as follows:

analyzing and processing the voice signal of the recording file to be inspected, and removing redundant information;

extracting the key information that affects speech recognition and the feature information that expresses linguistic meaning, and, working closely from this feature information, recognizing words in units of the smallest recognition unit;

recognizing words in sequence according to the grammar of the respective language, using the preceding and following context as auxiliary recognition conditions to aid analysis and recognition;

according to semantic analysis, dividing the key information into paragraphs, extracting and joining the recognized words, and adjusting the sentence structure according to the meaning of each sentence to synthesize the call text.

As shown in fig. 3, the process of performing voice quality inspection by the voice quality inspection module 203 is as follows:

S301, the voice quality inspection module 203 performs endpoint detection on the voice call content in the recording file to be inspected to judge whether there is silence lasting more than 3 minutes; if there is silence lasting more than 3 minutes, a silence detection result is output (output 1);

S302, the voice quality inspection module 203 separates the speakers in the recording file to be inspected, obtains the audio data corresponding to the customer service, calculates the speech rate, and judges whether the speech rate exceeds the standard speech rate range; if the speech rate exceeds the standard speech rate range, a speech rate detection result is output (output 1);

S303, the voice quality inspection module 203 separates the speakers in the recording file to be inspected, obtains the audio data corresponding to the customer service and to the customer, obtains from the customer's audio data the end time corresponding to the voice disappearing point of each customer utterance, and detects the start time corresponding to the voice appearing point of each customer service utterance; when a customer end time overlaps a customer service start time, an interruption (talk-over) detection result is output (output 1);

S304, the voice quality inspection module 203 analyzes the emotion of the customer service's text in the call text through a text emotion recognition model to obtain the corresponding emotion recognition result; if the customer service shows negative emotion, an emotion detection result is output (output 1);

S305, the voice quality inspection module 203 performs sensitive word detection on the call text using a preset sensitive word set to judge whether the keywords of the call text contain sensitive words; if they do, a sensitive word analysis detection result is output (output 1);

S306, the voice quality inspection module 203 judges, through a preconfigured call-script analysis model, whether the customer service's text in the call text contains unreasonable wording; if it does, a call-script analysis detection result is output (output 1);

S307, the voice quality inspection module 203 extracts keywords from the first 3 words of the text corresponding to the customer in the call text to form a semantic vector, and uses the semantic vector as the input of a pre-trained convolutional neural network to obtain a business classification result;

S308, the voice quality inspection module 203 calculates the quality inspection score from the detection results obtained in steps S301 to S306 to obtain a quality inspection scoring result; the calculation rule may use pre-configured parameter weights, each of which is multiplied by the corresponding detection result to obtain the quality inspection scoring result.
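The weighting rule of S308 can be illustrated with a short Python sketch. The embodiment only states that each parameter weight is multiplied by its detection result; summing the products into a single figure, as well as the dimension names and weight values in the usage note, are assumptions made here for illustration.

```python
from typing import Dict


def quality_score(results: Dict[str, int], weights: Dict[str, float]) -> float:
    """Combine 0/1 detection results with pre-configured weights.

    Assumption: the weighted products are summed into one score.
    """
    missing = set(results) - set(weights)
    if missing:
        raise KeyError(f"no weight configured for: {sorted(missing)}")
    return sum(weights[name] * flag for name, flag in results.items())
```

For example, with hypothetical weights `{"silence": 10.0, "speech_rate": 5.0, "interruption": 20.0}` and results `{"silence": 1, "speech_rate": 0, "interruption": 1}`, the function returns 30.0. Whether that figure is treated as a penalty to deduct or as the score itself is left to the configured scoring rule.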

The order of steps S301 to S307 is not limited and may be adjusted flexibly.
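The timing-based and keyword-based checks (S301 to S303 and S305) can be sketched in Python as follows. The sketch assumes that endpoint detection, speaker separation and transcription have already produced timestamped utterances; the `Segment` type, the speaker labels, the whitespace word count and the default rate range are all hypothetical. The speech rate is flagged when it falls outside the standard range, following claim 3, and an interruption is interpreted as the customer service starting to speak before a customer utterance has ended.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """One utterance, assumed to come from VAD, speaker separation and ASR."""
    speaker: str      # "agent" or "customer" (hypothetical labels)
    start: float      # voice appearing point, in seconds
    end: float        # voice disappearing point, in seconds
    text: str = ""


def long_silence(segments: List[Segment], max_gap: float = 180.0) -> int:
    """S301: 1 if any gap between consecutive speech exceeds max_gap seconds."""
    ordered = sorted(segments, key=lambda s: s.start)
    if not ordered:
        return 0
    latest_end = ordered[0].end
    for seg in ordered[1:]:
        if seg.start - latest_end > max_gap:
            return 1
        latest_end = max(latest_end, seg.end)
    return 0


def speech_rate_out_of_range(segments: List[Segment],
                             low: float = 2.0, high: float = 5.0) -> int:
    """S302: 1 if the agent's words per second fall outside [low, high].

    Assumption: words are whitespace-separated; low/high are placeholders.
    """
    agent = [s for s in segments if s.speaker == "agent"]
    duration = sum(s.end - s.start for s in agent)
    if duration <= 0:
        return 0
    words = sum(len(s.text.split()) for s in agent)
    rate = words / duration
    return int(not (low <= rate <= high))


def interruption(segments: List[Segment]) -> int:
    """S303: 1 if an agent utterance starts while a customer is still speaking."""
    customers = [s for s in segments if s.speaker == "customer"]
    agents = [s for s in segments if s.speaker == "agent"]
    return int(any(c.start < a.start < c.end for a in agents for c in customers))


def sensitive_words(segments: List[Segment], banned: Iterable[str]) -> int:
    """S305: 1 if any banned phrase occurs in the call text (substring match)."""
    banned = list(banned)
    return int(any(word in seg.text for seg in segments for word in banned))
```

Each function returns 1 when the condition is detected and 0 otherwise, matching the "output 1" convention of the steps, so the results can be fed directly into a weighted score.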

In this embodiment, the voice quality inspection module 203 sends the detection results and the quality inspection scoring result corresponding to the recording file to the quality inspector terminal, where manual review and other operations are performed.

The voice quality inspection rechecking process of the quality inspector terminal module is as follows:

obtaining a quality inspection report consisting of all detection results and quality inspection grading results;

and displaying the quality inspection report at a quality inspector terminal for manual review.

The voice quality inspection module 203 can also send the recording file to the quality inspector terminal when the detection result matches the preset automatic early-warning and real-time intervention conditions, so that the quality inspector terminal module can intervene and recheck in real time.

The present embodiment further provides a storage medium, where the storage medium includes one or more computer programs stored therein, and the one or more computer programs can be executed by one or more processors to implement the steps of the voice quality inspection method described above, which are not described herein again.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical division, and an actual implementation may divide them differently: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

The technical solutions provided by the embodiments of the present invention are described in detail above, and the principles and embodiments of the present invention are explained in this patent by applying specific examples, and the descriptions of the embodiments above are only used to help understanding the principles of the embodiments of the present invention; the present invention is not limited to the above preferred embodiments, and any modifications, equivalent replacements, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A voice quality inspection method, characterized in that the voice quality inspection method comprises:

obtaining a recording file;

performing voice recognition conversion on the recording file to obtain a call text;

performing multi-dimensional detection on the call text and/or the recording file according to a preset voice quality inspection strategy to obtain a detection result, and performing quality inspection scoring on the detection result to obtain a quality inspection scoring result.

2. The voice quality inspection method according to claim 1, wherein the multi-dimensional detection of the call text and/or the recording file according to a preset voice quality inspection strategy to obtain a detection result comprises:

detecting, according to the voice endpoint detection model (VAD), the voice appearing points and voice disappearing points of the audio signal of the recording file, and obtaining a silence detection result when the duration between an adjacent voice disappearing point and voice appearing point exceeds the preset silence duration;

performing emotion analysis on the text corresponding to the customer service in the call text through a pre-trained text emotion recognition model, and obtaining an emotion detection result when a negative emotion is present.

3. The voice quality inspection method according to claim 2, wherein the multi-dimensional detection of the call text and/or the recording file according to a preset voice quality inspection strategy to obtain a detection result further comprises:

separating the customer service voice and the customer voice in the recording file by using a voice separation algorithm;

detecting the duration and word count of each sentence of the customer service in the audio file corresponding to the customer service voice;

taking the quotient of the duration and the word count as the speech rate of the customer service, and obtaining a speech rate detection result when the speech rate exceeds the standard speech rate range;

detecting, according to the VAD, the end time corresponding to the voice disappearing point of each sentence of the customer in the audio file and the start time corresponding to the voice appearing point of each sentence of the customer service in the audio file, and obtaining an interruption (talk-over) detection result when the end time of the customer coincides with the start time of the customer service.

4. The voice quality inspection method according to claim 3, wherein the multi-dimensional detection of the call text and/or the recording file according to a preset voice quality inspection strategy to obtain a detection result further comprises:

obtaining the text corresponding to the customer service in the call text;

obtaining a sensitive word detection result when a keyword in the text belongs to the preset sensitive word set;

obtaining a call-script analysis detection result when the wording in the text matches a preset call-script analysis model, wherein the preset call-script analysis model includes a model for accepting no hidden dangers, a model for appeasing customers, a model for transferring to a hotline, a non-acceptance model, a model for not using the correct greeting, and a model for not using the correct closing.

5. The voice quality inspection method according to claim 4, wherein the multi-dimensional detection of the call text and/or the recording file according to a preset voice quality inspection strategy to obtain a detection result further comprises:

forming a semantic vector from the keywords at a preset position in the text corresponding to the customer service in the call text;

inputting the semantic vector into a pre-trained convolutional neural network to obtain a business classification result.

6. The voice quality inspection method according to any one of claims 1-4, wherein the performing quality inspection scoring on the detection result to obtain a quality inspection scoring result comprises:

obtaining the parameter weight of each detection result, and obtaining a quality inspection scoring result according to each parameter weight and the corresponding detection result.

7. The voice quality inspection method according to claim 6, characterized in that, after performing multi-dimensional detection on the call text and/or the recording file according to a preset voice quality inspection strategy to obtain a detection result, the method further comprises:

sending a quality inspection report including the detection result and the quality inspection scoring result to the quality inspector terminal for re-inspection.

8. The voice quality inspection method according to claim 6, characterized in that, after performing multi-dimensional detection on the call text and/or the recording file according to a preset voice quality inspection strategy to obtain a detection result, the method further comprises:

sending the recording file to the quality inspector terminal for real-time intervention and re-inspection when the detection result matches the preset automatic early-warning and real-time intervention conditions.

9. A voice quality inspection system, characterized in that the voice quality inspection system comprises: a data acquisition module, a voice transcription module, and a voice quality inspection module;

The data acquisition module is used to acquire recording files;

The voice transcription module is used for voice recognition conversion of the recording file to obtain the call text;

The voice quality inspection module is configured to perform multi-dimensional detection on the call text and/or the recording file according to a preset voice quality inspection strategy to obtain a detection result, and perform quality inspection scoring on the detection result to obtain a quality inspection scoring result.

10. A storage medium, characterized in that the storage medium comprises one or more computer programs stored thereon, and the one or more computer programs can be executed by one or more processors to implement the steps of the voice quality inspection method according to any one of claims 1 to 8.