CN111508529A - Dynamic extensible voice quality inspection scoring method

Info

Publication number: CN111508529A
Application number: CN202010299128.8A
Authority: CN
Inventors: 苏一敏; 刘蕾
Original assignee: Shenzhen Aerospace Technology & Innovation Industrial Co ltd
Current assignee: Shenzhen Aerospace Technology & Innovation Industrial Co ltd; Aerospace Science and Industry Shenzhen Group Co Ltd
Priority date: 2020-04-16
Filing date: 2020-04-16
Publication date: 2020-08-07

Abstract

The invention relates to a dynamically extensible voice quality inspection scoring method. The method comprises: obtaining recorded audio data of a quality-inspected object; converting the recorded audio data into text data and into a speech waveform (voice oscillogram); detecting prohibited service phrases in the text data and generating a first deduction value; detecting abnormal emotion from the speech waveform and generating a second deduction value; obtaining a quality inspection score for the recorded audio data from the first deduction value, the second deduction value and a full score; and feeding back the quality inspection score, the prohibited service phrases found in the text data, and the judgment of whether the recorded audio data contains abnormal emotion. Because the scoring is automatic, no dedicated quality inspectors are needed, which avoids the heavy workload of manual quality inspection, reduces personnel cost, and prevents the loss of scoring fairness caused by inspectors' subjective judgment; the accuracy is high.

Description

Dynamic extensible voice quality inspection scoring method

Technical Field

The invention relates to a dynamic extensible voice quality inspection scoring method.

Background

At present, computer-network-based management systems have become the most common systems for information production and management. For example, as business grows, financial institutions such as banks, securities firms and insurers, as well as other institutions, need professional customer service staff to serve their customers. Customer service staff answer and handle many calls every day. To monitor their service quality, dedicated quality inspectors generally score the call recordings, and service quality is judged from the resulting quality inspection score. Quality inspectors currently score recordings manually, based on their own experience. The workload is therefore large and only sampling inspection is feasible. Moreover, because the scores depend on the inspectors' subjective judgment, human factors intervene heavily and scoring fairness can be lost to some extent.

Disclosure of Invention

The invention aims to provide a dynamically extensible voice quality inspection scoring method that solves the problem of the large workload caused by manual quality inspection.

In order to solve the problems, the invention adopts the following technical scheme:

A dynamically extensible voice quality inspection scoring method comprises the following steps:

acquiring recorded audio data of a quality-inspected object;

converting the acquired recorded audio data into text data, and converting the acquired recorded audio data into a speech waveform;

comparing the text data with a preset prohibited service phrase database to obtain the prohibited service phrases present in the text data;

inputting the prohibited service phrases into a preset scoring standard for matching and scoring to obtain a first deduction value, wherein the preset scoring standard comprises at least two prohibited service phrases and a deduction value corresponding to each prohibited service phrase;

obtaining a maximum voice amplitude and a minimum voice amplitude from the speech waveform;

calculating the difference between the maximum voice amplitude and the minimum voice amplitude, and comparing the difference with a preset abnormal-emotion amplitude difference threshold;

if the difference is greater than or equal to the abnormal-emotion amplitude difference threshold, judging that abnormal emotion exists in the recorded audio data, and obtaining a second deduction value;

calculating the sum of the first deduction value and the second deduction value to obtain a total deduction value, and obtaining a quality inspection score for the recorded audio data from a preset full score and the total deduction value;

and feeding back the quality inspection score of the recorded audio data, the prohibited service phrases present in the text data, and the judgment of whether abnormal emotion exists in the recorded audio data.

Optionally, acquiring the recorded audio data of the quality-inspected object comprises:

acquiring the recorded audio data of the quality-inspected object and an identity of the quality-inspected object;

correspondingly, feeding back the quality inspection score of the recorded audio data, the prohibited service phrases present in the text data, and the judgment of whether abnormal emotion exists in the recorded audio data comprises:

generating a quality inspection scoring result blank data table according to the quality inspection score of the recorded audio data, wherein the quality inspection scoring result blank data table comprises an identity filling area, a quality inspection score filling area, a prohibited service phrase filling area and an abnormal emotion filling area;

filling the identity of the quality-inspected object into the identity filling area, filling the quality inspection score of the recorded audio data into the quality inspection score filling area, filling the prohibited service phrases present in the text data into the prohibited service phrase filling area, and filling the judgment of whether abnormal emotion exists in the recorded audio data into the abnormal emotion filling area, to obtain a quality inspection scoring result target data table;

and feeding back the quality inspection scoring result target data table.

Optionally, after feeding back the quality inspection scoring result target data table, the dynamically extensible voice quality inspection scoring method further comprises:

converting the quality inspection scoring result target data table into a PDF file;

encrypting the quality inspection scoring result target data table to obtain an encrypted quality inspection scoring result target data table;

packaging the PDF file and the encrypted quality inspection scoring result target data table into a compressed package file, and storing the compressed package file in an in-memory database;

correspondingly, when a data calling instruction is received, sending the compressed package file in the in-memory database to the calling party that issued the data calling instruction.

The beneficial effects of the invention are as follows. The dynamically extensible voice quality inspection scoring method scores recorded audio data automatically. Unlike traditional manual scoring, it is not limited by the size or number of recordings: the size or number of recordings to be inspected can be extended dynamically according to actual needs, which gives the scoring its dynamic extensibility, and the more recordings there are to inspect, the more pronounced the advantage over manual scoring becomes.

The method scores along two aspects, prohibited service phrases and abnormal emotion. For the first, the recorded audio data is converted into text data, the text data is compared with a preset prohibited service phrase database to obtain the prohibited service phrases present in the text data, and a first deduction value is obtained from the deduction value of each such phrase. For the second, the recorded audio data is converted into a speech waveform, the maximum and minimum voice amplitudes in the waveform are obtained, and their difference is calculated and compared with a preset abnormal-emotion amplitude difference threshold. When a speaker's emotion is abnormal, for example in anger, speech becomes agitated and the voice fluctuates strongly, so the difference between the maximum and minimum voice amplitudes is large. Therefore, if the difference is greater than or equal to the threshold, abnormal emotion is judged to exist in the recorded audio data and a corresponding second deduction value is obtained. A total deduction value is then obtained from the first and second deduction values, the quality inspection score of the recorded audio data is obtained from a preset full score and the total deduction value, and finally the relevant information is fed back.

By combining these two important aspects, the method scores voice recordings with high accuracy. No dedicated quality inspectors are needed, which avoids a heavy inspection workload, reduces personnel cost, and makes full inspection of all recordings possible. Moreover, because no human factors intervene, scoring fairness is not lost to inspectors' subjective judgment, and the final quality inspection score is more objective and more accurate.

Drawings

In order to more clearly illustrate the technical solution of the embodiment of the present invention, the drawings needed to be used in the embodiment will be briefly described as follows:

FIG. 1 is a flow chart of a dynamically extensible speech quality inspection scoring method.

Detailed Description

This embodiment provides a dynamically extensible voice quality inspection scoring method. The method may be executed by a smart mobile terminal, a notebook computer, a desktop computer or a server; this embodiment does not limit the executing entity. In this embodiment, the method is implemented on a computer by running an application program.

As shown in FIG. 1, the dynamically extensible voice quality inspection scoring method includes the following steps:

Acquiring the recorded audio data of the quality-inspected object:

In this embodiment, the computer acquires the recorded audio data of the quality-inspected object from the storage server of the background monitoring center. The quality-inspected object may be a customer service agent of an enterprise, and the recorded audio data is a recording of that agent over a certain period whose length is set according to actual conditions. For example, the recorded audio data may be the complete recording of one conversation between the agent and a customer.

Further, in addition to the recorded audio data, an identity of the quality-inspected object may also be acquired so that it can later be associated with the quality inspection score. The identity may be the employee number of the quality-inspected object, such as 1001.

Converting the acquired recorded audio data into text data, and converting the acquired recorded audio data into a speech waveform:

Two conversions are applied to the acquired recorded audio data: the first converts it into text data, and the second converts it into a speech waveform.

The first conversion, from recorded audio data to text data, is a conventional technique and is not described in detail.

Speech is essentially a sound wave generated by vibration, and its basic analog form is an acoustic wave called a speech signal. Converting speech into an electrical signal essentially yields a speech waveform, in which the abscissa represents time and the ordinate represents amplitude. The amplitude represents the sound level and can be treated as equivalent to decibels, so the speech waveform is treated as a decibel waveform. Converting the recorded audio data into a speech waveform is therefore essentially converting it into a decibel waveform.

The first processing procedure operates on the text data, and the second processing procedure operates on the speech waveform. The two procedures have no strict execution order; three orders are possible: (1) the first processing procedure followed by the second; (2) the second processing procedure followed by the first; (3) both procedures at the same time. The first processing procedure, the second processing procedure, and the final processing of their results are described below.
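Because the two processing procedures are independent of each other, execution order (3) can be realized by running them concurrently. The following is a minimal Python sketch under that assumption. The names `run_both_procedures`, `text_procedure` and `waveform_procedure` are hypothetical and introduced only for illustration; the two callables stand in for the first and second processing procedures described below.

```python
from concurrent.futures import ThreadPoolExecutor

def run_both_procedures(audio, text_procedure, waveform_procedure):
    """Run the two independent procedures concurrently and return both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(text_procedure, audio)
        second_future = pool.submit(waveform_procedure, audio)
        # result() blocks until the corresponding procedure has finished.
        return first_future.result(), second_future.result()

# Stand-in procedures that return fixed deduction values (illustrative only).
first, second = run_both_procedures(b"raw-audio", lambda audio: 60, lambda audio: 20)
```

Orders (1) and (2) simply correspond to calling the two procedures one after the other.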

The first processing procedure comprises the following steps:

Comparing the text data with a preset prohibited service phrase database to obtain the prohibited service phrases present in the text data:

In customer service, agents must avoid prohibited service phrases, that is, words that must never be said during service, such as: "really stupid", "what's the hurry", "sick brain", "incomplete", "I just so", and the like. Quality inspection therefore checks whether the agent said any of these prohibited service phrases while communicating with the customer.

A prohibited service phrase database is preset. It stores a plurality of (that is, at least two) prohibited service phrases, and the number actually stored is set according to actual needs: if the quality inspection requirement is strict, many prohibited service phrases are stored; if it is lenient, fewer are stored. In this embodiment, the prohibited service phrase database includes the following phrases: "annoying", "really stupid", "really annoying", "really tremble", "find oneself", "little wasted words", "what you are doing", "you do not have long eyes", "what's the hurry", "brain is sick", "none is done", "this is just my attitude", and "do you do more and do not see I am busy".

The text data is then compared with the preset prohibited service phrase database to obtain the prohibited service phrases present in it. For example, suppose the recorded audio data is "Hello, happy to serve you, ……, what's the hurry, ……, really stupid, ……, this is just my attitude, ……, goodbye". The text data converted from it reads the same, and comparing it with the preset prohibited service phrase database shows that the prohibited service phrases present in the text data are "what's the hurry", "really stupid" and "this is just my attitude".
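The source does not specify how the comparison with the database is carried out. As a minimal sketch, assuming plain case-insensitive substring matching, the step could look as follows in Python. The function name, the variable names and the English phrase strings are illustrative assumptions, not part of the original disclosure.

```python
def find_prohibited_phrases(text, phrase_database):
    """Return the database phrases that occur in the transcript, ordered by first appearance."""
    lowered = text.lower()
    hits = []
    for phrase in phrase_database:
        position = lowered.find(phrase.lower())
        if position >= 0:  # find() returns -1 when the phrase is absent
            hits.append((position, phrase))
    return [phrase for _, phrase in sorted(hits)]

# Illustrative database and transcript (assumed English renderings of the example).
PHRASE_DATABASE = [
    "really annoying",
    "really stupid",
    "what's the hurry",
    "this is just my attitude",
]
transcript = (
    "Hello, happy to serve you ... what's the hurry ... really stupid ... "
    "this is just my attitude ... goodbye"
)
found = find_prohibited_phrases(transcript, PHRASE_DATABASE)
```

A production system would likely need something more tolerant than exact substring matching, since speech recognition output can vary in wording and punctuation.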

Inputting the prohibited service phrases into a preset scoring standard for matching and scoring to obtain a first deduction value, wherein the preset scoring standard comprises at least two prohibited service phrases and a deduction value corresponding to each prohibited service phrase:

To score prohibited service phrases, a scoring standard must be preset. It comprises a plurality of (that is, at least two) prohibited service phrases and a deduction value for each. The phrases in the scoring standard may be the same as those in the preset prohibited service phrase database, or may include more. The deduction values of different phrases may be equal or different. In particular, different deduction values can be set according to the severity of the consequences each phrase may cause. For example, if the consequences of "what's the hurry" are less severe than those of "really stupid", which are in turn less severe than those of "this is just my attitude", the deduction values can be set to 10, 20 and 30 points respectively.

The prohibited service phrases found in the text data are then input into the preset scoring standard for matching and scoring, which yields the first deduction value: the sum of the deduction values of the phrases that appeared. Continuing the example above, the prohibited service phrases present in the text data are "what's the hurry", "really stupid" and "this is just my attitude"; matching them against the scoring standard gives three deduction values of 10, 20 and 30, so the first deduction value is 60.
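The matching and summing step can be sketched as a dictionary lookup. The Python below is a minimal illustration, assuming the scoring standard is held as a phrase-to-points mapping and that each distinct phrase is deducted once regardless of how often it was said; the source does not state either detail, and all names are hypothetical.

```python
# Scoring standard from the worked example: phrase -> deduction value in points.
SCORING_STANDARD = {
    "what's the hurry": 10,
    "really stupid": 20,
    "this is just my attitude": 30,
}

def first_deduction(found_phrases, scoring_standard):
    """Sum the deduction values of the distinct phrases that appeared."""
    # Assumption: a phrase said several times is still deducted only once.
    return sum(scoring_standard.get(phrase, 0) for phrase in set(found_phrases))

deduction = first_deduction(
    ["what's the hurry", "really stupid", "this is just my attitude"],
    SCORING_STANDARD,
)
```

If repeated use of a phrase should be penalized each time, the `set(...)` call would be dropped so that every occurrence contributes.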

The second processing procedure comprises the following steps:

Obtaining the maximum voice amplitude and the minimum voice amplitude from the speech waveform:

The speech waveform is treated as a decibel waveform, that is, a waveform of decibel level over time, so the maximum and minimum voice amplitudes can be read from it. The maximum voice amplitude is the largest decibel value in the decibel waveform, and the minimum voice amplitude is the smallest decibel value in it.

Calculating the difference between the maximum voice amplitude and the minimum voice amplitude, and comparing the difference with a preset abnormal-emotion amplitude difference threshold:

The difference between the maximum and minimum voice amplitudes is the difference between the maximum and minimum decibel values, which characterizes the agent's tone of voice. In general, when emotion is abnormal, for example in anger, speech is agitated and the voice fluctuates strongly, so the swing in speech volume is large, that is, the difference between the maximum and minimum decibel values is large.
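The source does not say how the decibel waveform is computed from the recording. One possible way, shown here purely as an assumption, is to split normalized audio samples (values in [-1, 1]) into fixed-size frames, take the RMS level of each frame in decibels relative to full scale, and then take the difference between the largest and smallest level. The frame size, the silence floor and all function names below are hypothetical.

```python
import math

def decibel_waveform(samples, frame_size=400, floor_db=-60.0):
    """Frame-wise RMS level in dB relative to full scale, for samples in [-1, 1]."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        if rms > 0:
            # Clamp very quiet frames to the floor so silence does not dominate.
            levels.append(max(floor_db, 20 * math.log10(rms)))
        else:
            levels.append(floor_db)
    return levels

def amplitude_difference(levels):
    """Difference between the maximum and minimum decibel values."""
    if not levels:
        raise ValueError("no complete frame to analyse")
    return max(levels) - min(levels)

# Synthetic signal: one quiet frame followed by one loud frame.
samples = [0.01] * 400 + [0.5] * 400
levels = decibel_waveform(samples)
difference = amplitude_difference(levels)
```

Without a floor such as `floor_db`, any silent pause in a call would drive the minimum towards negative infinity, so some handling of silence is needed in practice.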

This difference is then compared with the preset abnormal-emotion amplitude difference threshold. The threshold value is set according to actual needs: for strict quality inspection it can be set lower, and for lenient quality inspection it can be set higher.

If the difference is greater than or equal to the abnormal-emotion amplitude difference threshold, judging that abnormal emotion exists in the recorded audio data, and obtaining a second deduction value:

If the difference between the maximum and minimum decibel values is greater than or equal to the preset abnormal-emotion amplitude difference threshold, the difference is large, meaning the voice swings widely and the speech is agitated. The agent is judged to be in an abnormal emotional state (such as anger), that is, abnormal emotion is judged to exist in the recorded audio data. Because abnormal emotion must not occur when serving a customer, points are deducted when it does, giving the second deduction value. Its value is set according to actual needs: larger for strict quality inspection and smaller for lenient quality inspection, for example 20. If the difference between the maximum and minimum decibel values is smaller than the preset threshold, the recorded audio data is judged to contain no abnormal emotion and no points are deducted.
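The threshold comparison reduces to a single greater-than-or-equal test. A minimal Python sketch follows; the 30 dB threshold is an arbitrary illustrative value (the source gives none), the 20-point penalty is taken from the example above, and the function name is hypothetical.

```python
def second_deduction(difference_db, threshold_db, penalty=20):
    """Return (abnormal_emotion, deduction) for a max-min decibel difference."""
    abnormal = difference_db >= threshold_db  # at or above the threshold counts as abnormal
    return abnormal, (penalty if abnormal else 0)

# Illustrative threshold of 30 dB (an assumption, not a value from the source).
abnormal, deduction = second_deduction(34.0, threshold_db=30.0)
```

Lowering `threshold_db` or raising `penalty` makes the inspection stricter, which matches the tuning guidance in the text.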

The first processing procedure yields the first deduction value, and the second processing procedure yields the second deduction value. Next:

Calculating the sum of the first deduction value and the second deduction value to obtain a total deduction value, and obtaining the quality inspection score of the recorded audio data from a preset full score and the total deduction value:

The sum of the first and second deduction values is the total deduction value. The quality inspection score of the recorded audio data is the preset full score minus the total deduction value. The full score is set according to actual conditions, for example 100 points. Continuing the example above, the first deduction value is 60 points and the second deduction value is 20 points, so the quality inspection score of the recorded audio data is 100 - (60 + 20) = 20 points.
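The final calculation can be written out directly. The Python sketch below mirrors the worked example; the function name is hypothetical. The source does not say what happens when the total deduction exceeds the full score, so the sketch makes no attempt to clamp the result at zero.

```python
def quality_inspection_score(first_deduction, second_deduction, full_score=100):
    """Full score minus the total deduction value."""
    total_deduction = first_deduction + second_deduction
    return full_score - total_deduction

# Worked example from the text: 100 - (60 + 20).
score = quality_inspection_score(60, 20)
```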

Feeding back the quality inspection score of the recorded audio data, the prohibited service phrases present in the text data, and the judgment of whether abnormal emotion exists in the recorded audio data:

After the quality inspection score of the recorded audio data is obtained, the score, the prohibited service phrases present in the text data, and the judgment of whether abnormal emotion exists in the recorded audio data are fed back to the relevant party, such as a quality inspector. The feedback may take the form of display on the computer's screen.

Further, a specific implementation of this feedback step is given below:

(1) Generating a quality inspection scoring result blank data table according to the quality inspection score of the recorded audio data. The blank data table comprises an identity filling area, a quality inspection score filling area, a prohibited service phrase filling area and an abnormal emotion filling area. The identity filling area holds the identity of the customer service agent, the quality inspection score filling area holds the quality inspection score of the recorded audio data, the prohibited service phrase filling area holds the prohibited service phrases present in the text data, and the abnormal emotion filling area holds the judgment of whether abnormal emotion exists in the recorded audio data. Table 1 shows one specific form of the quality inspection scoring result blank data table, in which area A is the identity filling area, area B is the quality inspection score filling area, area C is the prohibited service phrase filling area, and area D is the abnormal emotion filling area.

TABLE 1

A	B
C	D

(2) Filling the identity of the quality-inspected object into the identity filling area of the quality inspection scoring result blank data table, the quality inspection score of the recorded audio data into the quality inspection score filling area, the prohibited service phrases present in the text data into the prohibited service phrase filling area, and the judgment of whether abnormal emotion exists in the recorded audio data into the abnormal emotion filling area. The filled-in data table is the quality inspection scoring result target data table. Table 2 shows one specific form of the quality inspection scoring result target data table.

TABLE 2

1001	20 points
What's the hurry; really stupid; this is just my attitude	Abnormal emotion present

The quality inspection scoring result target data table is fed back to the relevant party, such as a quality inspector, for example by display on the computer's screen. From the data table, the quality inspector can see at a glance which quality-inspected object the result belongs to, its quality inspection score, the prohibited service phrases used, and whether abnormal emotion was present.
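Steps (1) and (2) can be sketched as creating an empty record with four fields and then filling it. The Python below is a minimal illustration that assumes the two-row layout suggested by Tables 1 and 2 (areas A and B in the first row, C and D in the second). The class name, field names and output wording are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ResultTable:
    identity: str = ""            # area A: identity filling area
    score: str = ""               # area B: quality inspection score filling area
    prohibited_phrases: str = ""  # area C: prohibited service phrase filling area
    abnormal_emotion: str = ""    # area D: abnormal emotion filling area

    def rows(self):
        """Return the table as two rows, A/B on top and C/D below."""
        return [
            [self.identity, self.score],
            [self.prohibited_phrases, self.abnormal_emotion],
        ]

def fill_result_table(identity, score, phrases, has_abnormal_emotion):
    table = ResultTable()  # step (1): blank data table
    # Step (2): fill each area to obtain the target data table.
    table.identity = str(identity)
    table.score = f"{score} points"
    table.prohibited_phrases = "; ".join(phrases)
    table.abnormal_emotion = (
        "abnormal emotion present" if has_abnormal_emotion else "no abnormal emotion"
    )
    return table

target = fill_result_table(
    "1001", 20, ["what's the hurry", "really stupid", "this is just my attitude"], True
)
for row in target.rows():
    print("\t".join(row))
```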

After feeding back the quality inspection scoring result target data table, the dynamically extensible voice quality inspection scoring method further comprises the following steps:

Converting the quality inspection scoring result target data table into a PDF file:

The quality inspection scoring result target data table may be a Word document or an Excel document, which is then converted into a PDF file. The conversion is a conventional technique and is not described further.

Encrypting the quality inspection scoring result target data table to obtain an encrypted quality inspection scoring result target data table:

The quality inspection scoring result target data table is encrypted. One option is to set an opening password, so that the table can be opened only after the correct password is entered. Another is to protect the table so that it opens in a read-only state and cannot be edited. The result is the encrypted quality inspection scoring result target data table.

Packaging the PDF file and the encrypted quality inspection scoring result target data table into a compressed package file, and storing the compressed package file in an in-memory database:

The generated PDF file and the encrypted quality inspection scoring result target data table are packaged into a compressed package file, which is stored in the in-memory database.

This is the data storage process. Encrypting the quality inspection scoring result target data table prevents it from being tampered with at will, and converting it into a PDF file likewise guards effectively against tampering, which improves data validity.

Correspondingly, when a data calling instruction is received, the compressed package file in the in-memory database is sent to the calling party that issued the instruction, for example the background monitoring center.
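The packaging, storage and retrieval steps can be sketched with Python's standard `zipfile` module. This is a minimal illustration under several assumptions: the PDF file and the encrypted table are taken as already-produced byte strings (the conversion and encryption themselves are outside the sketch), a plain dictionary stands in for the in-memory database, and the function names and archive member names are hypothetical.

```python
import io
import zipfile

MEMORY_DB = {}  # hypothetical stand-in for the in-memory database

def package_and_store(key, pdf_bytes, encrypted_table_bytes):
    """Pack both files into one ZIP archive and store the archive under `key`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("result.pdf", pdf_bytes)
        archive.writestr("result_table.enc", encrypted_table_bytes)
    MEMORY_DB[key] = buffer.getvalue()

def handle_data_calling_instruction(key):
    """Return the stored compressed package for the caller that requested it."""
    return MEMORY_DB[key]

# Placeholder payloads; real inputs would be the PDF and the encrypted table.
package_and_store("1001", b"%PDF-1.4 placeholder", b"placeholder ciphertext")
package = handle_data_calling_instruction("1001")
names = zipfile.ZipFile(io.BytesIO(package)).namelist()
```

Keying the store by the employee number is only one option; a deployment handling several recordings per agent would need a key that also identifies the recording.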

The above embodiment merely illustrates the technical solution of the present invention in one specific form. Any equivalent substitution, modification or partial substitution that does not depart from the spirit and scope of the present invention shall fall within the scope of the claims of the present invention.

Claims

1. A dynamically extensible voice quality inspection scoring method, characterized by comprising the following steps:

acquiring recorded audio data of a quality-inspected object;

2. The dynamically extensible voice quality inspection scoring method according to claim 1, wherein acquiring the recorded audio data of the quality-inspected object comprises:

and feeding back the quality inspection scoring result target data table.

3. The dynamically extensible voice quality inspection scoring method according to claim 2, wherein after feeding back the quality inspection scoring result target data table, the dynamically extensible voice quality inspection scoring method further comprises: