CN114372701A - Method and device for evaluating customer service quality, storage medium and equipment - Google Patents

Method and device for evaluating customer service quality, storage medium and equipment

Info

Publication number
CN114372701A
CN114372701A
Authority
CN
China
Prior art keywords
video
evaluation result
neural network
network model
customer service
Prior art date
Legal status
Pending
Application number
CN202210018451.2A
Other languages
Chinese (zh)
Inventor
陈艳婷 (Chen Yanting)
暨光耀 (Ji Guangyao)
张�浩
吴晓茵 (Wu Xiaoyin)
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210018451.2A
Publication of CN114372701A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06398Performance of employee with respect to a job function
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395Quality analysis or management
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Psychiatry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Child & Adolescent Psychology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and device for evaluating customer service quality, a storage medium, and equipment. The method comprises the following steps: acquiring customer service monitoring information, namely the audio information and video information generated by a target object during service execution; computing an audio feature classification evaluation result from the audio information with a first neural network model; computing a video feature classification evaluation result from the video information with a second neural network model and a third neural network model, where the audio feature classification evaluation result and the video feature classification evaluation result both comprise the evaluation result of at least one emotional feature; and determining the customer service quality evaluation result corresponding to the target object from the audio feature classification evaluation result and the video feature classification evaluation result. The invention solves the technical problems of low recognition efficiency and poor evaluation effect that arise because the prior art cannot accurately recognize and evaluate customer service quality.

Description

Method and device for evaluating customer service quality, storage medium and equipment
Technical Field
The invention relates to the technical field of finance, in particular to the technical field of intelligent algorithm and intelligent identification, and specifically relates to a method, a device, a storage medium and equipment for evaluating customer service quality.
Background
With the continuous development of network informatization, remote video services have been widely adopted; however, owing to the real-time nature of the video customer service process, most video customer services lack effective supervision, and violations occur more and more frequently. On the one hand, existing service evaluation systems perform after-the-fact evaluation through customers replying to a short message with a score or selecting a satisfaction level on an interactive interface, and more intelligent, real-time evaluation means are scarce; on the other hand, existing video analysis tasks are usually based on deep network models, whose training and inference impose enormous computational pressure. At the same time, because of the real-time nature of the video customer service process and the complexity of audio and video data, there is currently no effective method for monitoring violations in video customer service.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a method, a device, a storage medium and equipment for evaluating customer service quality, which at least solve the technical problems of low identification efficiency and poor evaluation effect caused by the fact that the prior art cannot realize accurate identification and evaluation on the customer service quality.
According to an aspect of an embodiment of the present invention, there is provided a method for evaluating customer service quality, including: acquiring customer service monitoring information, wherein the customer service monitoring information is audio information and video information generated by a target object in a service execution process; calculating to obtain an audio characteristic classification evaluation result based on the audio information by adopting a first neural network model; calculating to obtain a video characteristic classification evaluation result based on the video information by adopting a second neural network model, wherein the audio characteristic classification evaluation result and the video characteristic classification evaluation result both comprise at least one evaluation result of emotional characteristics; and determining a customer service quality evaluation result corresponding to the target object according to the audio feature classification evaluation result and the video feature classification evaluation result.
Optionally, the obtaining, by using the first neural network model, an audio feature classification evaluation result based on the audio information by calculation includes: acquiring a voice sequence in the audio information; performing framing processing on the voice sequence, and extracting voice frame characteristics of the voice sequence; carrying out segmentation processing on the voice frame characteristics to obtain voice segment characteristics of the voice sequence; fitting the voice section characteristics through a target function to obtain emotion cognition window characteristics of the voice sequence; and calculating the audio characteristic classification evaluation result through the first neural network model based on the voice frame characteristics, the voice section characteristics and the emotion cognition window characteristics.
Optionally, the calculating, by using the second neural network model, a video feature classification evaluation result based on the video information includes: performing frame division processing on the video information to obtain a plurality of frames of video images; extracting a facial image feature set of the target object from each frame of the video image; and respectively inputting the facial image feature set into the third neural network model and the fourth neural network model to obtain the video feature classification evaluation result.
Optionally, the extracting a facial image feature set of the target object from each frame of the video image includes: adopting a target classifier to perform segmentation processing on each frame of the video image to obtain a segmented video image; extracting features from the segmented video image by adopting a fifth neural network model to obtain a facial image feature map of the target object; performing target detection and accurate positioning processing on the facial image feature map through the third neural network model to obtain a candidate frame region corresponding to the facial image feature map; and performing maximum pooling processing on the candidate frame region through the fourth neural network model to obtain the facial image feature set.
Optionally, the determining, according to the audio feature classification evaluation result and the video feature classification evaluation result, a customer service quality evaluation result corresponding to the target object includes: calculating a first evaluation score corresponding to the evaluation result of each emotional characteristic; summing a plurality of first evaluation scores to obtain a second evaluation score; and determining the customer service quality evaluation result according to the second evaluation score.
Optionally, the method further includes: acquiring a first violation frequency corresponding to a first violation of the target object in the audio information; acquiring a second violation frequency corresponding to a second violation of the target object in the video information; and summing the first violation times and the second violation times to obtain the total violation times of the target object.
Optionally, the method further includes: judging whether the evaluation index of the customer service quality of the target object reaches an alarm threshold value, wherein the evaluation index comprises at least one of the following: the audio feature classification evaluation result, the video feature classification evaluation result, the customer service quality evaluation result and the total violation frequency; and if any one of the evaluation indexes reaches the alarm threshold, sending an alarm instruction.
Optionally, the obtaining a second violation number corresponding to a second violation of the target object in the video information includes: performing frame division processing on the video information to obtain a plurality of frames of video images; judging whether the second violation behaviors existing in the two adjacent frames of the video images are the same violation behavior; if so, the same violation is recorded as one occurrence of the second violation.
Optionally, the method further includes: training the first neural network model based on the audio information to obtain a trained first neural network model; training the second neural network model based on the video information to obtain a trained second neural network model; and updating the first neural network model and the second neural network model according to the trained first neural network model and the trained second neural network model respectively.
Optionally, before the obtaining the voice sequence in the audio information, the method further includes: carrying out segmentation processing on the audio information according to a preset time length to obtain a plurality of sections of audio samples; judging the matching degree of the voiceprint characteristics of the first section of the audio sample and the pre-stored voiceprint characteristics; if the matching degree is greater than or equal to the threshold value of the matching degree, acquiring the voice sequence; and if the matching degree is smaller than the threshold value of the matching degree, sending an alarm indication.
Optionally, the method further includes: detecting the number of human faces in each frame of the video image through a human face recognition model; judging whether the number of the human faces is 1 or not; if the judgment result is yes, extracting the facial image feature set from each frame of the video image; if the judgment result is negative, an alarm instruction is sent out.
According to another aspect of the embodiments of the present invention, there is also provided an apparatus for evaluating customer service quality, including: the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring customer service monitoring information, and the customer service monitoring information is audio information and video information generated by a target object in a service execution process; the first calculation module is used for calculating to obtain an audio characteristic classification evaluation result based on the audio information by adopting a first neural network model; the second calculation module is used for calculating to obtain video feature classification evaluation results based on the video information by adopting a second neural network model and a third neural network model, wherein the audio feature classification evaluation results and the video feature classification evaluation results both comprise evaluation results of at least one emotional feature; and the determining module is used for determining a customer service quality evaluation result corresponding to the target object according to the audio characteristic classification evaluation result and the video characteristic classification evaluation result.
According to another aspect of the embodiments of the present invention, there is also provided a system for evaluating customer service quality, including: the system comprises a video customer service client, a service server and a service server, wherein the video customer service client is used for acquiring customer service monitoring information, and the customer service monitoring information is audio information and video information generated by a target object in a service execution process; the edge computing platform server is connected with the video customer service client and used for computing to obtain an audio feature classification evaluation result based on the audio information by adopting a first neural network model; and the cloud platform server is connected with the edge computing platform server and is used for calculating a video feature classification evaluation result based on the video information by adopting a second neural network model and a third neural network model, and determining a customer service quality evaluation result corresponding to the target object according to the audio feature classification evaluation result and the video feature classification evaluation result, wherein the audio feature classification evaluation result and the video feature classification evaluation result both comprise at least one emotion feature evaluation result.
According to another aspect of the embodiments of the present invention, there is also provided a non-volatile storage medium, wherein the non-volatile storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing any one of the above methods for evaluating customer service quality.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the above methods for evaluating quality of customer service.
In the embodiment of the invention, the method of evaluating the customer service quality is adopted, and the customer service monitoring information is obtained, wherein the customer service monitoring information is audio information and video information generated by a target object in the service execution process; calculating to obtain an audio characteristic classification evaluation result based on the audio information by adopting a first neural network model; calculating to obtain video characteristic classification evaluation results based on the video information by adopting a second neural network model and a third neural network model, wherein the audio characteristic classification evaluation results and the video characteristic classification evaluation results both comprise evaluation results of at least one emotional characteristic; according to the audio characteristic classification evaluation result and the video characteristic classification evaluation result, the customer service quality evaluation result corresponding to the target object is determined, and the aim of accurately identifying and evaluating the customer service quality according to the audio and video information of the target user is achieved, so that the technical effects of improving the audio and video identification efficiency and the evaluation quality are achieved, and the technical problems of low identification efficiency and poor evaluation effect caused by the fact that the prior art cannot accurately identify and evaluate the customer service quality are solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a method of evaluating customer service quality according to an embodiment of the present invention;
FIG. 2 is a flow chart of an alternative method of evaluating customer service quality according to an embodiment of the present invention;
FIG. 3 is a flow diagram of an alternative method of evaluating customer service quality according to an embodiment of the present invention;
FIG. 4 is a flow diagram of an alternative method of evaluating customer service quality according to an embodiment of the present invention;
FIG. 5 is a flow diagram of an alternative method of evaluating customer service quality according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an alternative system architecture for implementing the above-described customer service quality evaluation method according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an evaluation apparatus for customer service quality according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, in order to facilitate understanding of the embodiments of the present invention, some terms or nouns referred to in the present invention will be explained as follows:
Edge Computing (EC) is an open platform, located at the edge near the object or data source, that integrates networking, computing, storage and core application capabilities to provide services at the nearest end. The Edge Computing node (ECP) is responsible for exchanging the collected and computed front-end data results with the cloud platform server in real time.
Fast R-CNN introduces a RoI pooling layer on the basis of R-CNN, thereby avoiding the R-CNN algorithm's repeated feature extraction from the same region.
The Faster R-CNN algorithm adds an RPN candidate box generation network on the basis of Fast R-CNN, which greatly improves target detection speed. Faster R-CNN actually trains two networks: the RPN and Fast R-CNN. If the two networks were trained separately, each would change the parameters of the shared convolutional layers, so the convolutional layers must be shared between them rather than trained independently. The RPN is a candidate box recommendation algorithm, and Fast R-CNN performs the detailed computation of the candidate box positions and of the classes of the objects in the boxes on the basis of the RPN.
Example 1
With the accelerating commercialization of the new-generation 5G wireless mobile communication technology, the continuing spread of mobile intelligent terminals, and the shift of electronic service content toward multimedia, service enterprises of all kinds, such as banking institutions, are driven by research and innovation to transform into intelligent banks and provide all-around intelligent services for customers. At present, the transaction volume of electronic channels such as internet banking and mobile banking far exceeds that of traditional physical outlets, and the remote video bank, as a further extension of the electronic channels, can integrate various service channels and extend the banks' external service hours. Video customer service is not limited by space, can display products intuitively, can satisfy the supervision policy requirements for online transactions, largely reproduces the offline business handling scene, provides customers with a more comprehensive "immersive experience" service, and also attends to the customer's emotions.
Meanwhile, owing to the real-time nature of the video customer service process, most video customer services lack effective supervision, and violations occur more and more frequently. On the one hand, existing service evaluation systems perform after-the-fact evaluation through customers replying to a short message with a score or selecting a satisfaction level on an interactive interface, and more intelligent, real-time evaluation means are scarce; on the other hand, existing video analysis tasks are usually based on deep network models, whose training and inference impose enormous computational pressure. At the same time, because of the real-time nature of the video customer service process and the complexity of audio and video data, there is currently no effective method for monitoring violations in video customer service.
In view of the foregoing, it should be noted that the steps illustrated in the flowchart of the accompanying figures may be implemented in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than presented herein.
Fig. 1 is a flowchart of a method for evaluating customer service quality according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
step S102, customer service monitoring information is obtained, wherein the customer service monitoring information is audio information and video information generated by a target object in a service execution process;
step S104, calculating to obtain an audio characteristic classification evaluation result based on the audio information by adopting a first neural network model;
step S106, calculating to obtain video characteristic classification evaluation results based on the video information by adopting a second neural network model, wherein the audio characteristic classification evaluation results and the video characteristic classification evaluation results both comprise evaluation results of at least one emotional characteristic;
and step S108, determining a customer service quality evaluation result corresponding to the target object according to the audio characteristic classification evaluation result and the video characteristic classification evaluation result.
Optionally, video information of the target object is acquired through a video acquisition device, and audio information of the target object is acquired through an audio acquisition device.
Optionally, the first neural network model may be, but is not limited to, a feedback neural network model CIRNN based on cognitive mechanism; the second neural network model described above may be, but is not limited to, a Faster R-CNN model.
Optionally, the obtaining the audio information and the video information includes: acquiring initial audio information and initial video information of a target object; and cleaning the initial audio information and the initial video information to obtain the audio information and the video information.
In the embodiment of the invention, the method of evaluating the customer service quality is adopted, and the customer service monitoring information is obtained, wherein the customer service monitoring information is audio information and video information generated by a target object in the service execution process; calculating to obtain an audio characteristic classification evaluation result based on the audio information by adopting a first neural network model; calculating to obtain video characteristic classification evaluation results based on the video information by adopting a second neural network model and a third neural network model, wherein the audio characteristic classification evaluation results and the video characteristic classification evaluation results both comprise evaluation results of at least one emotional characteristic; according to the audio characteristic classification evaluation result and the video characteristic classification evaluation result, the customer service quality evaluation result corresponding to the target object is determined, and the aim of accurately identifying and evaluating the customer service quality according to the audio and video information of the target user is achieved, so that the technical effects of improving the audio and video identification efficiency and the evaluation quality are achieved, and the technical problems of low identification efficiency and poor evaluation effect caused by the fact that the prior art cannot accurately identify and evaluate the customer service quality are solved.
As an alternative embodiment, fig. 2 is a flowchart of an alternative customer service quality evaluation method according to an embodiment of the present invention, and as shown in fig. 2, the calculating an audio feature classification evaluation result based on the audio information by using a first neural network model includes:
step S202, acquiring a voice sequence in the audio information;
step S204, performing framing processing on the voice sequence, and extracting voice frame characteristics of the voice sequence;
step S206, carrying out segmentation processing on the voice frame characteristics to obtain voice segment characteristics of the voice sequence;
step S208, fitting the voice segment characteristics through a target function to obtain emotion cognition window characteristics of the voice sequence;
step S210, calculating the audio feature classification evaluation result through the first neural network model based on the speech frame features, the speech segment features, and the emotion recognition window features.
Optionally, the speech frame characteristics may include, but are not limited to: prosodic, nonlinear, psychoacoustic, and spectral features, and the like.
Alternatively, the target function may be, but is not limited to, a gaussian function; the first neural network model may be, but is not limited to, a cognitive mechanism-based feedback neural network model CIRNN.
It should be noted that, because the physical movement of the vocal organs closely influences the formation of speech and is relatively stable over short periods, the speech signal can be analyzed in short segments, including framing and windowing, to obtain a time-discrete and amplitude-discrete speech sequence. The speech signal can be framed using a 25 ms Hamming window with a 10 ms frame shift, and then the short-time features (i.e., speech frame features) and the corresponding first-order differences of each frame are extracted frame by frame; the short-time features may include prosodic features, nonlinear features, psychoacoustic features, spectral features, and the like, and may be normalized. For example, let the feature vector of the speech frame be $f(t) = (f_1(t), f_2(t), \ldots, f_m(t))$, where each $f_i(t)$ is an audio feature parameter; there are $m$ feature parameters in the sample $f(t)$, each acquired at time $t$, with $t = 1, 2, \ldots, n$:

$$F = \begin{pmatrix} f_1(1) & f_2(1) & \cdots & f_m(1) \\ f_1(2) & f_2(2) & \cdots & f_m(2) \\ \vdots & \vdots & & \vdots \\ f_1(n) & f_2(n) & \cdots & f_m(n) \end{pmatrix}$$
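As a concrete illustration of this framing step, the following sketch frames a speech sequence with a 25 ms Hamming window and 10 ms shift and appends first-order differences; it assumes a 16 kHz mono signal, and the specific features chosen (log energy, zero-crossing rate, spectral centroid) merely stand in for the prosodic, nonlinear, psychoacoustic and spectral features named above:

```python
import numpy as np

def frame_features(signal: np.ndarray, sr: int = 16000,
                   win_ms: float = 25.0, shift_ms: float = 10.0) -> np.ndarray:
    """Frame a speech signal (25 ms Hamming window, 10 ms shift) and extract
    simple short-time features per frame plus their first-order differences."""
    win = int(sr * win_ms / 1000)          # 400 samples at 16 kHz
    shift = int(sr * shift_ms / 1000)      # 160 samples at 16 kHz
    window = np.hamming(win)
    n_frames = 1 + max(0, (len(signal) - win) // shift)
    feats = []
    for i in range(n_frames):
        frame = signal[i * shift: i * shift + win] * window
        energy = np.log(np.sum(frame ** 2) + 1e-10)           # short-time log energy
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2    # zero-crossing rate
        spectrum = np.abs(np.fft.rfft(frame)) ** 2
        freqs = np.fft.rfftfreq(win, 1 / sr)
        centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-10)
        feats.append([energy, zcr, centroid])
    F = np.asarray(feats)                                     # n frames x m parameters
    delta = np.vstack([np.zeros((1, F.shape[1])), np.diff(F, axis=0)])
    return np.hstack([F, delta])                              # features + differences

# toy usage: 1 second of noise stands in for a real speech sequence
f_t = frame_features(np.random.randn(16000))
print(f_t.shape)   # (n, 2m): the feature matrix F with first-order differences
```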
Optionally, the speech signal is divided into segments of 100 frames each, and segment-level feature extraction is performed to obtain the speech segment features of the target object. By extracting features in units of segments, the segment-length statistical features can remove text-content correlation without weakening the expression of important prosodic features; features are extracted from the pitch frequency and the contours of the first three formants, and are fused with the short-time frame features and global statistical features to improve the emotion recognition rate. For example, let the feature vector of the speech segment be $k(t) = (k_1(t), k_2(t), \ldots, k_m(t))$, where each $k_i(t)$ is an audio feature parameter; there are $m$ feature parameters in the sample $k(t)$, each acquired at time $t$, with $t = 1, 2, \ldots, n$:

$$K = \begin{pmatrix} k_1(1) & k_2(1) & \cdots & k_m(1) \\ \vdots & \vdots & & \vdots \\ k_1(n) & k_2(n) & \cdots & k_m(n) \end{pmatrix}$$
It should be noted that, to reflect more accurately the influence of the dynamic process of human emotional expression on emotion recognition, the speech segment features are fitted with a target function (such as a Gaussian function) to simulate the expression process of human emotion, and an emotion cognition window is extracted on top of the speech segment features; that is, a Gaussian function is applied over several consecutive speech segment features, weighting the emotion in the middle of the utterance and attenuating the emotions at its two ends. For example, let the speech segment features be $K_N$; convolving with a Gaussian function $G(x)$ yields the emotion cognition window features $E = G(w_i) * K_N$, where $N$ is the total number of audio sample segments and $w_i$ is the position of the $i$-th speech segment under the Gaussian function.
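A minimal sketch of this emotion cognition window follows; the Gaussian center (mid-utterance) and width are assumed values chosen for illustration:

```python
import numpy as np

def cognition_window(K: np.ndarray, sigma: float = None) -> np.ndarray:
    """Weight segment features K (N segments x m features) with a Gaussian
    centered on the middle of the utterance, emphasising mid-utterance
    emotion and attenuating the two ends."""
    N = K.shape[0]
    w = np.arange(N)                             # w_i: position of the i-th segment
    mu = (N - 1) / 2.0                           # assumed center: mid-utterance
    sigma = sigma or max(N / 4.0, 1.0)           # assumed width
    G = np.exp(-0.5 * ((w - mu) / sigma) ** 2)   # Gaussian weights G(w_i)
    G /= G.sum()
    return G[:, None] * K                        # E = G(w_i) * K_N, per segment

E = cognition_window(np.random.randn(10, 24))
print(E.shape)   # (10, 24): weighted emotion-cognition window features
```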
Optionally, a feedback neural network model CIRNN (i.e., the first neural network model) based on a cognitive mechanism is constructed. Following the human brain's cognitive rules for emotion, the input features of auditory information are analyzed and compared against a prior experience model and probability system to improve information processing, ordering, and accuracy. Therefore, on the basis of a feedback neural network such as an RNN, a cognitive-mechanism-based feedback neural network CIRNN fusing multi-granularity features is adopted, so that features of different time units participate in network training; this highlights the temporal ordering of emotion and emphasizes the influence of context on emotion, while retaining the effect of global characteristics on emotion recognition. The CIRNN network includes an input layer, hidden layers, a memory layer, and an output layer. The input layer receives the frame features and segment features respectively, and the memory layer is a set of neurons fed back from the hidden layer, used to record the content of the hidden layer at the previous moment. The neuron activation function is the Sigmoid function. For example, let $t$ be the current time of the network, $f(t)$ the frame features, $k(t)$ the segment features in period $t$, and $E(t)$ the cognitive window features. $W_1$ is the weight matrix from the input-layer frame features $f(t)$ and segment features $k(t)$ to the hidden layer $x(t)$; $W_2$ is the weight matrix from the hidden layer $x(t)$ to the hidden layer $z(t)$; $W_3$ is the weight matrix from the hidden layer $z(t)$ to the output layer $y(t)$; $W_4$ is the weight matrix from the memory layer $x_c(t)$ to the hidden layer $x(t)$; $W_5$ is the weight matrix from the input-layer cognitive window features $E(t)$ to the hidden layer $z(t)$. $x(t)$ and $z(t)$ denote the outputs of the two hidden layers, with the specific formulas: first hidden layer $x(t) = f(W_1(f(t) + k(t) + W_4 x_c(t)))$, where $f$ is the Sigmoid function $f(u) = \frac{1}{1 + e^{-u}}$; second hidden layer $z(t) = f(W_2 x(t) + W_5 E(t))$; output layer $y(t) = f(W_3 z(t))$.
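The forward pass of this CIRNN structure can be sketched as follows; the layer sizes, random initialization, and the vector shape of E(t) are illustrative assumptions, not the patent's parameters:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

class CIRNNSketch:
    """Forward pass of the CIRNN structure described above: two hidden layers,
    a memory layer x_c(t) feeding back the previous hidden state, and the
    cognition-window features E(t) injected into the second hidden layer."""
    def __init__(self, d_in, d_hid, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(d_hid, d_in))   # input -> x(t)
        self.W2 = rng.normal(scale=0.1, size=(d_hid, d_hid))  # x(t) -> z(t)
        self.W3 = rng.normal(scale=0.1, size=(d_out, d_hid))  # z(t) -> y(t)
        self.W4 = rng.normal(scale=0.1, size=(d_in, d_hid))   # memory x_c(t) -> x(t)
        self.W5 = rng.normal(scale=0.1, size=(d_hid, d_in))   # E(t) -> z(t)
        self.xc = np.zeros(d_hid)                             # memory layer

    def step(self, f_t, k_t, E_t):
        # x(t) = f(W1 (f(t) + k(t) + W4 x_c(t)))
        x = sigmoid(self.W1 @ (f_t + k_t + self.W4 @ self.xc))
        # z(t) = f(W2 x(t) + W5 E(t));  y(t) = f(W3 z(t))
        z = sigmoid(self.W2 @ x + self.W5 @ E_t)
        y = sigmoid(self.W3 @ z)
        self.xc = x                   # record the hidden layer for the next step
        return y

net = CIRNNSketch(d_in=24, d_hid=32, d_out=3)
y = net.step(np.ones(24), np.ones(24), np.ones(24))
print(y)   # per-class emotion scores for one time step
```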
As an alternative embodiment, fig. 3 is a flowchart of another alternative method for evaluating customer service quality according to an embodiment of the present invention, and as shown in fig. 3, the second neural network model includes a third neural network model and a fourth neural network model, and the calculating to obtain a video feature classification evaluation result based on the video information by using the second neural network model includes:
step S302, performing framing processing on the video information to obtain a plurality of frames of video images;
step S304, extracting a facial image feature set of the target object from each frame of the video image;
step S306, inputting the facial image feature set to the third neural network model and the fourth neural network model, respectively, to obtain the video feature classification evaluation result.
Alternatively, the second neural network model may be, but is not limited to, a Faster R-CNN model; the third neural network model may be, but is not limited to, an RPN (Region Proposal Network) model; the fourth neural network model may be, but is not limited to, a Fast R-CNN network model.
It should be noted that, to keep the audio feature extraction and the video feature extraction consistent, the multiple frames of video images are divided with the same preset time length as the speech frame features, for example one segment per minute, with video data output at 25 frames per second, and so on.
Optionally, the video feature classification evaluation result may include, but is not limited to: calm, happy, surprised, sad, fear, angry, disgust, and the like.
Optionally, the output form of the video feature classification evaluation result may be, but is not limited to, a matrix form.
Optionally, a video stream $V = \{V_1, V_2, \ldots, V_q\}$ is defined and acquired by a monitoring camera arranged on the video customer service client device, where the camera faces the face of the video customer service agent (i.e., the target object) and the video data (i.e., video images) corresponding to the video stream are output at 25 frames per second. Each frame of video image $V_q$ is expressed as an $l \times w$ matrix, where $l$ is the number of rows and $w$ the number of columns of the video matrix; the $q$-th segment of the video sample is labeled, with $q = 1, 2, \ldots, n$ indicating the sample number.
In an alternative embodiment, the detection procedure of the Faster R-CNN model (the second neural network model) can be, but is not limited to: input a video image; perform feature extraction on the image through the convolutional layers of a deep network using an RDN (Residual Dilated Network) to obtain the feature maps of the eyes, eyebrows and mouth for each frame in the video stream; perform target detection and accurate positioning on the feature maps through an RPN (Region Proposal Network) to obtain candidate box regions; perform the RoI pooling operation on the obtained candidate box regions, that is, use coordinate projection to locate on the feature map the feature region corresponding to each candidate box region in the input image, and apply maximum pooling to that region, thereby obtaining candidate box region features of uniform size. The output of the RoI pooling layer (i.e., the max-pooled feature map features corresponding to the candidate box regions) is taken as the feature vector of each candidate box region; the feature vectors of the candidate box regions are connected to fully connected layers, a multi-task loss function is defined, and the network is connected to a softmax classifier and a bounding-box regressor respectively to obtain the class and the coordinate bounding box of the current region of interest; finally, non-maximum suppression is applied to all obtained bounding boxes to yield the final classification detection result (i.e., the video feature classification evaluation result).
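For illustration, this per-frame pipeline (backbone features, RPN proposals, RoI pooling, classification and box regression, non-maximum suppression) can be exercised with torchvision's off-the-shelf Faster R-CNN; the ResNet-50 FPN backbone and the class count are stand-ins for the RDN backbone and the expression-region classes described here:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Off-the-shelf Faster R-CNN; backbone and num_classes are assumptions,
# standing in for the patent's RDN backbone and facial-region classes.
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=4)
model.eval()

frame = torch.rand(3, 400, 400)             # one video frame, CHW, values in [0, 1]
with torch.no_grad():
    detections = model([frame])[0]          # RPN proposals -> RoI pooling -> heads + NMS

# boxes after non-maximum suppression, with class labels and scores
print(detections["boxes"].shape, detections["labels"], detections["scores"])
```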
Optionally, before the Faster R-CNN model is used for detection, it is trained; the training process can be, but is not limited to: 1) initialize the RPN with the convolutional layers of a pre-trained model, then train the RPN alone; after training, both the model and the RPN feature parameters are updated; 2) initialize the detection network Fast R-CNN with the convolutional layers of the same pre-trained model, with the candidate boxes coming from the RPN of step 1, then train Fast R-CNN alone; after training, both the model and the Fast R-CNN feature parameters are updated; 3) initialize the RPN with the model trained in step 2, keep the convolutional layers (i.e., the shared convolutional layers) fixed during training, adjust only the parameters belonging to the RPN, and train the RPN a second time; 4) still keeping the shared convolutional layers fixed, use the candidate boxes output by the RPN adjusted in step 3 as input, train the Fast R-CNN network a second time, and fine-tune the Fast R-CNN parameters.
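A sketch of the parameter-sharing idea behind steps 3) and 4), using torchvision's joint Faster R-CNN model whose submodules approximate the two networks; note that modern implementations usually train the RPN and the detection head jointly, so this only illustrates freezing the shared layers while tuning each part in turn:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights=None, num_classes=4)

# Shared convolutional layers: kept fixed, as in steps 3) and 4)
for p in model.backbone.parameters():
    p.requires_grad = False

# Separate optimizers so the RPN and the Fast R-CNN head can be tuned in turn
opt_rpn = torch.optim.SGD(
    [p for p in model.rpn.parameters() if p.requires_grad],
    lr=1e-3, momentum=0.9)
opt_head = torch.optim.SGD(
    [p for p in model.roi_heads.parameters() if p.requires_grad],
    lr=1e-3, momentum=0.9)
```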
In an optional embodiment, the facial image feature set is fed into the RPN regression neural network model and the Fast R-CNN classification neural network model respectively for recognition and classification, and a video feature classification evaluation matrix $(V_{Dis}, V_{Hap}, V_{Qui})$ is output, expressing the probabilities of the three recognized emotions (dysphoria, pleasure and calm) in the customer service agent's facial expression, which yields the facial expression classification result.
As an alternative embodiment, fig. 4 is a flowchart of another alternative customer service quality evaluation method according to an embodiment of the present invention, and as shown in fig. 4, the extracting a facial image feature set of the target object from each frame of the video image includes:
step S402, a target classifier is adopted to segment each frame of the video image to obtain a segmented video image;
step S404, extracting the segmented video image by adopting a fifth neural network model to obtain a facial image feature map of the target object;
step S406, performing target detection and accurate positioning processing on the facial image feature map through the third neural network model to obtain a candidate frame region corresponding to the facial image feature map;
step S408, performing maximum pooling on the candidate frame region through the fourth neural network model to obtain the facial image feature set.
Optionally, the segmented video image at least includes: eye, eyebrow, and mouth feature maps of the target object.
Optionally, the fifth neural network model may be, but is not limited to, an RDN (Residual Dilated Network) model; the third neural network model may be, but is not limited to, an RPN (Region Proposal Network) model; the fourth neural network model may be, but is not limited to, a Fast R-CNN network model.
Optionally, the target classifier may be, but is not limited to, an enhanced classifier based on Gabor features for identifying the facial organ feature points of the target object, where the facial organ feature points include at least: the inner corners of the left and right eyes, the outer corners of the left and right eyes, the highest points of the left and right eyes, the lowest points of the left and right eyes, the nose tip, the leftmost and rightmost points of the mouth corners, and the uppermost and lowermost points of the intersection of the lip center line with the lip contour; from these, the eye, eyebrow and mouth feature maps of the target object are obtained.
In an optional embodiment, an RDN (Residual Dilated Network) model is adopted: the segmented video image is input to the first convolutional layer (denoted Conv1) for preliminary feature extraction, producing the facial image feature map of the target object. Taking a 400 × 400 image as an example, Conv1 is constructed with 3 × 3 convolution kernels, and the number of kernels is set to 64. The feature map output by Conv1 then enters Conv2, whose kernel size is the same as Conv1's and whose kernel count is set to 128. The feature map output by Conv2 then enters Conv3, with kernel size 3 × 3 and kernel count 256. Next, the feature map output by Conv3 enters RDN4, with kernel size 3 × 3 and kernel count 512; dilated convolution is introduced with dilation parameters d = 1 and d = 2. The feature map output by RDN4 then enters RDN5, with kernel size 3 × 3 and kernel count still 512; dilated convolution is likewise introduced with dilation parameters d = 2 and d = 4, which safeguards the feature detection of micro-expressions. The RDN model improves algorithm robustness with a deeper residual network structure, and by accumulating shallow and deep features it ensures that micro-expressions are preserved through successive convolutions, yielding more accurate output information; the detection performance of the network can thus be greatly improved with essentially no increase in the computation of the original model.
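The Conv1 through RDN5 stack can be sketched in PyTorch as below; the strides, paddings, the 256-to-512 projection and the exact residual wiring are assumptions, since only kernel sizes, channel counts and dilation rates are fixed above:

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """Residual block with two dilated 3x3 convolutions (d1, d2), accumulating
    shallow and deep features as described for RDN4/RDN5."""
    def __init__(self, ch, d1, d2):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=d1, dilation=d1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=d2, dilation=d2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),    # Conv1: 64 kernels
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),  # Conv2: 128 kernels
    nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True), # Conv3: 256 kernels
    nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(inplace=True), # assumed projection to 512
    DilatedBlock(512, 1, 2),                                  # RDN4: d = 1, 2
    DilatedBlock(512, 2, 4),                                  # RDN5: d = 2, 4
)

with torch.no_grad():
    feat = backbone(torch.rand(1, 3, 400, 400))
print(feat.shape)   # feature map fed to the RPN
```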
Optionally, target detection and accurate positioning are performed on the feature map through the RPN network model to obtain candidate frame regions, and a maximum pooling operation is performed on the candidate frames through the RoI pooling layer in the Fast R-CNN network model, outputting a facial image feature set consisting of several feature vectors of the same dimension.
As an alternative embodiment, fig. 5 is a flowchart of another alternative customer service quality evaluation method according to an embodiment of the present invention, and as shown in fig. 5, the determining a customer service quality evaluation result corresponding to the target object according to the audio feature classification evaluation result and the video feature classification evaluation result includes:
step S502, calculating a first evaluation score corresponding to the evaluation result of each emotional characteristic;
step S504, summing up a plurality of first evaluation scores to obtain a second evaluation score;
and step S506, determining the customer service quality evaluation result according to the second evaluation score.
Optionally, the customer service quality evaluation result is characterized in the form of an evaluation category, where the evaluation categories can be, but are not limited to, "poor", "average", "good", and "very good".
In an alternative embodiment, the first evaluation score is the degree of service-quality compliance represented by one specific emotional feature classification during the customer service period. Each emotional feature classification therefore yields one first evaluation score, so M first evaluation scores are calculated in total. Each first evaluation score is calculated by the following formula:
$$Ow_i = \frac{W_i \cdot x_A \cdot K^{(T_A - T_D)}}{X_A + 1}$$

where $Ow_i$ denotes the $i$-th first evaluation score, with $1 \le i \le M$ a positive integer and $i$ running from 1 to $M$. In this example the audio emotions and the video emotions are each classified into 3 types, so $M = 6$. $W_i$ denotes the weight of the type to which the $i$-th specific emotional feature belongs; for example, the audio happy emotion classification feature weight is set to 0.6, the audio nervous emotion classification feature weight to -0.15, and the video expression weights for happy, angry and disgust to 0.3, -0.3 and -0.2 respectively. $x_A = 1$ denotes that the $i$-th specific feature action occurred, and $x_A = 0$ that it did not. $T_A$ denotes the time at which the specific emotional feature behavior occurs; $T_D$ denotes a dynamic reference time smaller than $T_A$, set arbitrarily by those skilled in the art, for example at the start time of each customer service call. $K$ is a time-decreasing factor with $K > 1$, set for example to any value from 1.5 to 3. $X_A$ denotes the number of specific emotional feature behaviors occurring in the reference time period before $T_A$; the reference time period is, for example, 1 minute, which the invention does not specifically limit. The term $\tfrac{1}{X_A + 1}$ in the formula makes $Ow_i$ smaller the larger $X_A$ is, and larger the smaller $X_A$ is, which reduces the influence of rigid, unchanging emotional behaviors on the evaluation during the customer service period and encourages customer service agents to respond to customer demands promptly and adapt in time.
From the above formula, the weights $W_i$ of different types of specific emotional feature classifications, embodying different emotional states, influence the customer service quality evaluation result to different degrees. Because of the time-decreasing factor $K > 1$, the farther the occurrence time of an emotional feature of the same type lies from the dynamic reference time, the greater its impact on the service-quality evaluation result. Furthermore, the time-decreasing factor reduces the influence of historical behaviors within the same service session on the final evaluation.
Optionally, the second evaluation score represents the degree of service-quality compliance of the customer service within a specific time period. In the embodiment of the invention, the second evaluation score is calculated as the sum of all first evaluation scores:

$$O = \sum_{i=1}^{M} Ow_i$$
when the second evaluation score evaluated for the service period is higher, the service quality of the customer service is evaluated to be higher. On the contrary, when the second evaluation score evaluated during the service is lower, the quality of service of the customer service is evaluated to be worse. Through the above process, the quality of the service of the customer service is evaluated. And summing and averaging the evaluation classification distribution probability vectors to obtain a final evaluation result vector, wherein the corresponding evaluation category is the final service evaluation category. Among them, the evaluation categories may be classified into "poor", "normal", "good", and "very good".
In an optional embodiment, the method further includes:
step S602, acquiring a first violation frequency corresponding to a first violation of the target object in the audio information;
step S604, acquiring a second violation frequency corresponding to a second violation of the target object in the video information;
step S606, summing the first violation times and the second violation times to obtain a total number of violations of the target object.
Optionally, when it is detected that the violation action exists in the audio information of the target object, determining that the target object has a first violation action; and determining that the target object has a second violation behavior when the violation behavior is detected in the video information of the target object.
Optionally, with the time characteristic as a mark, the occurrence time, violation type and accumulated count of each violation mark are accumulated in the scoring model. The counts of the various violations are accumulated in the database; when a count exceeds a preset threshold, a monitoring alarm is triggered and an alarm indication is sent.
In an optional embodiment, the method further includes:
step S702, judging whether the evaluation index of the customer service quality of the target object reaches an alarm threshold value;
step S704, if any one of the evaluation indexes reaches the alarm threshold, an alarm indication is sent.
Optionally, the evaluation index includes at least one of: the audio feature classification evaluation result, the video feature classification evaluation result, the customer service quality evaluation result and the total violation frequency.
In an optional embodiment, the obtaining a second violation number corresponding to a second violation of the target object in the video information includes:
step S802, performing framing processing on the video information to obtain a plurality of frames of video images;
step S804, determining whether the second violation behaviors existing in two adjacent frames of the video images are the same violation behavior;
in step S806, if the determination result is yes, the same violation is recorded as one occurrence of the second violation.
Optionally, when recording the second violation count, the method further includes: judging whether the violations present in two adjacent frames of video images within a preset time are the same violation; if not, they are identified as different violations and each is recorded in the background; if so, they are not accumulated repeatedly. Taking the time-domain information into account avoids the same violation being recorded multiple times, which would happen if only a single frame of image were considered.
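The adjacent-frame deduplication can be sketched as follows, with a hypothetical Violation record holding the frame index and violation type:

```python
from dataclasses import dataclass

@dataclass
class Violation:
    frame: int      # frame index after framing the video
    kind: str       # violation type, e.g. "absent", "phone" (assumed labels)

def count_second_violations(detections: list[Violation]) -> int:
    """Count violations so that the same violation type persisting across
    adjacent frames is recorded only once (the time-domain deduplication
    described above)."""
    total, prev = 0, None
    for v in sorted(detections, key=lambda v: v.frame):
        same_as_prev = (prev is not None
                        and v.kind == prev.kind
                        and v.frame == prev.frame + 1)   # adjacent frame, same type
        if not same_as_prev:
            total += 1        # a new occurrence of the second violation
        prev = v
    return total

hits = [Violation(10, "phone"), Violation(11, "phone"), Violation(40, "phone")]
print(count_second_violations(hits))   # 2: frames 10-11 merge, frame 40 is new
```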
In an optional embodiment, the method further includes:
step S902, training the first neural network model based on the audio information to obtain a trained first neural network model;
step S904, training the second neural network model based on the video information to obtain a trained second neural network model;
step S906, updating the first neural network model and the second neural network model according to the trained first neural network model and the trained second neural network model, respectively.
It should be noted that the initial model parameters of the first neural network model (i.e., the cognitive-mechanism-based feedback neural network model CIRNN) and the second neural network model (i.e., the Faster R-CNN model) are set manually. In the process of evaluating the customer service quality of the target object, new audio information, video information and evaluation results are continuously obtained; the models are trained and updated on this data to obtain a first and a second neural network model of higher precision (i.e., the trained first neural network model and the trained second neural network model), and the first and second neural network models are then updated with their trained counterparts respectively, so as to improve the accuracy of the customer service quality evaluation result.
Optionally, the Faster R-CNN network model can be trained as follows (but is not limited to this scheme): the objective function of the Faster R-CNN network model is the binary cross-entropy function (binary_crossentropy), and the optimization method is Adam, with the learning rate set to 0.001, the exponential decay rate of the mean of the gradient set to 0.9, and the exponential decay rate of the uncentered variance of the gradient set to 0.999. The batch size is set to 200, and the data are divided into a training set, a validation set and a test set in a certain proportion. Over multiple rounds of training, the model is evaluated on the validation set after each round; the model with the best validation result is saved and used on the test set, and that result is taken as the result of the whole learning process.
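For illustration only, the stated configuration could be expressed as follows, assuming Keras as the framework (the disclosure names no framework); the dataset arrays and the checkpoint file name are placeholders:

```python
# A sketch of the stated training configuration, assuming Keras.
import tensorflow as tf

def compile_model(model):
    optimizer = tf.keras.optimizers.Adam(
        learning_rate=0.001,  # learning rate 0.001
        beta_1=0.9,           # exponential decay rate of the gradient mean
        beta_2=0.999)         # exponential decay rate of the uncentered variance
    model.compile(optimizer=optimizer, loss="binary_crossentropy")
    return model

def train(model, x_train, y_train, x_val, y_val, epochs=20):
    # Batch size 200; after each round, keep the model with the best
    # validation result, then evaluate it on the test set.
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        "best_model.keras", monitor="val_loss", save_best_only=True)
    model.fit(x_train, y_train, batch_size=200, epochs=epochs,
              validation_data=(x_val, y_val), callbacks=[checkpoint])
    return tf.keras.models.load_model("best_model.keras")
```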
In an optional embodiment, before obtaining the speech sequence in the audio information, the method includes:

step S1002, carrying out segmentation processing on the audio information according to a preset time length to obtain a plurality of segments of audio samples;
step S1004, judging the matching degree of the voiceprint characteristics of the first section of the audio sample and the pre-stored voiceprint characteristics;
step S1006, if the matching degree is greater than or equal to the threshold value of the matching degree, acquiring the voice sequence;
step S1008, if the matching degree is smaller than the threshold value, an alarm indication is sent.
Optionally, the audio information is segmented according to a preset time length (for example, one minute per segment) to obtain multiple segments of audio samples, defined as an audio stream set U = {U1, U2, …, Uq}, where Uq denotes the q-th audio sample and q = 1, 2, …. About 10 seconds of voice data from the first audio sample are input into a voiceprint recognition module, which extracts the audio feature data (i.e., the voiceprint features) from the call audio file and sends them to a sound comparison module; the sound comparison module compares them with the customer service voice feature data stored in the database in advance. The matching degree between the customer service voiceprint features and the pre-stored voiceprint feature base is then judged: if the matching degree is greater than or equal to the preset threshold, a verification-passed response is returned to the customer service client, together with the comparison result and the corresponding staff number of the customer service person; if the matching degree is less than the preset threshold, an alarm indication is sent, a verification-failed response and a service termination control instruction are returned to the customer service client based on the alarm indication, a violation mark is registered and uploaded to the cloud platform, and the service is terminated.
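For illustration only, the comparison step could be sketched as below; extract_voiceprint() is a hypothetical embedding extractor and the threshold value is an assumption:

```python
# A minimal sketch of the voiceprint verification on the first audio sample.
import numpy as np

MATCH_THRESHOLD = 0.8  # illustrative stand-in for the preset threshold

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_agent(first_sample_audio, stored_voiceprint: np.ndarray) -> dict:
    probe = extract_voiceprint(first_sample_audio)  # hypothetical helper
    score = cosine_similarity(probe, stored_voiceprint)
    if score >= MATCH_THRESHOLD:
        return {"verified": True, "score": score}
    # Below the threshold: alarm indication, violation mark, service terminated.
    return {"verified": False, "score": score, "action": "alarm_and_terminate"}
```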
In an optional embodiment, the method further includes:
step S1102, detecting the number of human faces in each frame of video image through a human face recognition model;
step S1104, determining whether the number of faces is 1;
step S1106, if the determination result is yes, extracting the facial image feature set from each frame of the video image;
in step S1108, if the determination result is negative, an alarm indication is sent.
Optionally, each frame of the video image is input into a pre-trained face recognition model, which performs real-time face detection on the facial image information. If the number of detected faces is N < 1 or N > 1, a transaction-abnormal instruction is returned to the video client so that the video client executes an alarm operation; if the number of detected faces equals 1, the subsequent operations continue.
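For illustration only, a per-frame check of this kind could look as follows, assuming OpenCV's bundled Haar cascade as a stand-in for the pre-trained face recognition model:

```python
# A minimal sketch of the face-count check on each video frame.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def check_frame(frame) -> str:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) != 1:                # N < 1 or N > 1
        return "transaction_abnormal"  # the video client executes an alarm
    return "continue"                  # exactly one face: proceed
```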
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, there is further provided a system embodiment for implementing the method for evaluating customer service quality, fig. 6 is a schematic structural diagram of a system for evaluating customer service quality according to an embodiment of the present invention, and as shown in fig. 6, the system for evaluating customer service quality includes: video customer service client 1, marginal computing platform server 2, cloud platform server 3, wherein:
the video customer service client 1 is used for acquiring customer service monitoring information, wherein the customer service monitoring information is audio information and video information generated by a target object in a service execution process; the edge computing platform server 2 is connected with the video customer service client and used for computing and obtaining an audio characteristic classification evaluation result based on the audio information by adopting a first neural network model; and the cloud platform server 3 is connected with the edge computing platform server and is used for calculating a video feature classification evaluation result based on the video information by adopting a second neural network model and a third neural network model, and determining a customer service quality evaluation result corresponding to the target object according to the audio feature classification evaluation result and the video feature classification evaluation result, wherein the audio feature classification evaluation result and the video feature classification evaluation result both comprise an evaluation result of at least one emotional feature.
Optionally, in the system for evaluating customer service quality, each video customer service client, edge node and cloud center has a unique static address. The video customer service client 1 performs data interaction with the edge computing platform server 2 through a communication network, and the edge computing platform server 2 performs data interaction with the cloud platform server 3 through a private line network. The video customer service client 1 has a built-in audio and video acquisition system, which collects audio and video information in real time during the video customer service period, detects client operation state parameters, and uploads the operation state parameters to the edge computing platform server. The edge computing platform server 2 comprises a physical layer structure and an application layer structure; it cleans the audio and video feature information, client operation state parameters and other data, and uploads the cleaned parameters to the cloud platform server. The edge computing platform server 2 further updates its local audio/video management evaluation model with the updated model parameters, calculates the service evaluation result from the cleaned audio/video feature parameters of the customer service period and the updated local evaluation model, and uploads the service evaluation result to the cloud platform server 3 so that the customer service process can be evaluated. The audio/video management evaluation models on the edge computing platform server 2 and the cloud platform server 3 start with initial model parameters that are set manually when the system is produced; at that point their accuracy is low. As the cloud platform continuously receives the audio/video features and evaluation results sent by the edge computing platform server 2, it trains and updates the evaluation model to obtain model parameters of higher accuracy, and then sends these parameters to the edge computing platform server 2, so that a more accurate customer service evaluation result can be calculated.
Optionally, the specific evaluation procedure of the customer service quality analysis system is as follows: a customer service person accesses the service workbench through the video customer service client 1, and the video customer service client 1 starts the voice and video acquisition device; the edge computing platform server 2 counts and analyzes the audio and video information sent by the video customer service client 1 according to a specified algorithm, and sends the evaluation result to the cloud platform server 3; background staff set the related index parameters on the cloud platform server 3 according to management requirements; and the cloud platform server 3 displays the monitoring information in real time in the form of visual charts, dashboards and the like.
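For illustration only, the reporting chain could be sketched as below; the addresses, endpoint paths and use of HTTP are all assumptions, since the disclosure only specifies a communication network and a private line network:

```python
# A sketch of the client -> edge -> cloud reporting chain over HTTP.
import requests

EDGE_URL = "http://10.0.0.2:8080/evaluate"  # edge computing platform server 2
CLOUD_URL = "http://10.0.1.3:8080/report"   # cloud platform server 3

def client_upload(av_chunk: bytes) -> dict:
    # Video customer service client 1 -> edge: audio/video and state parameters.
    return requests.post(EDGE_URL, data=av_chunk, timeout=5).json()

def edge_report(evaluation: dict) -> None:
    # Edge -> cloud over the private line network: cleaned features and result.
    requests.post(CLOUD_URL, json=evaluation, timeout=5).raise_for_status()
```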
It should be noted that the edge-network customer service quality evaluation system provided by the embodiment of the present invention includes a video customer service client, an edge computing platform server, a cloud platform server, and so on. Each video customer service client, edge node and cloud center has a unique static address, and the video customer service client uploads the operation state parameters collected during service to the edge computing platform server through the audio/video acquisition device. The edge computing platform server comprises a physical layer structure and an application layer structure; it cleans and uploads the audio and video feature information, client operation state parameters and other data, calculates the service evaluation result using the local audio/video management evaluation model, and uploads the result to the cloud platform server. The cloud platform side trains and updates the audio/video management evaluation model to obtain model parameters of higher accuracy and sends them to the edge computing platform server, so that a more accurate customer service evaluation result can be calculated; this guarantees the uniformity of the evaluation standard while greatly reducing the computational load of model training on the edge computing platform.
It should be noted that the specific structures of the video customer service client 1, the edge computing platform server 2, and the cloud platform server 3 shown in fig. 6 in this application are merely schematic, and in a specific application, the evaluation system of customer service quality in this application may have more or less structures than the video customer service client 1, the edge computing platform server 2, and the cloud platform server 3 shown in fig. 6.
It should be noted that any optional or preferred method for evaluating customer service quality in embodiment 1 above may be implemented in the evaluation system of customer service quality provided in this embodiment.
In addition, it should be noted that, for alternative or preferred embodiments of the present embodiment, reference may be made to the relevant description in embodiment 1, and details are not described herein again.
Example 3
According to an embodiment of the present invention, there is further provided an embodiment of an apparatus for implementing the method for evaluating customer service quality, fig. 7 is a schematic structural diagram of an apparatus for evaluating customer service quality according to an embodiment of the present invention, and as shown in fig. 7, the apparatus for evaluating customer service quality includes: an obtaining module 20, a first calculating module 22, a second calculating module 24, and a determining module 26, wherein:
an obtaining module 20, configured to obtain customer service monitoring information, where the customer service monitoring information is audio information and video information generated by a target object in a service execution process;
the first calculation module 22 is configured to calculate, based on the audio information, an audio feature classification evaluation result by using a first neural network model;
a second calculating module 24, configured to calculate, by using a second neural network model and a third neural network model, a video feature classification evaluation result based on the video information, where the audio feature classification evaluation result and the video feature classification evaluation result both include an evaluation result of at least one emotional feature;
and a determining module 26, configured to determine a customer service quality evaluation result corresponding to the target object according to the audio feature classification evaluation result and the video feature classification evaluation result.
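For illustration only, the cooperation of the four modules could be sketched as below; the class and method names are assumptions mirroring the description above, not the disclosed implementation:

```python
# A sketch of the evaluation device as a pipeline over the four modules.
class EvaluationDevice:
    def __init__(self, acquirer, audio_scorer, video_scorer, combiner):
        self.acquirer = acquirer          # obtaining module 20
        self.audio_scorer = audio_scorer  # first calculation module 22
        self.video_scorer = video_scorer  # second calculation module 24
        self.combiner = combiner          # determining module 26

    def evaluate(self, target_object_id):
        audio, video = self.acquirer.get_monitoring_info(target_object_id)
        audio_result = self.audio_scorer.classify(audio)
        video_result = self.video_scorer.classify(video)
        return self.combiner.determine(audio_result, video_result)
```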
It should be noted that the above modules may be implemented by software or hardware; in the latter case, for example, the modules may all be located in the same processor, or the modules may be distributed across different processors in any combination.
It should be noted here that the acquiring module 20, the first calculating module 22, the second calculating module 24 and the determining module 26 correspond to steps S102 to S108 in embodiment 1; the examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the disclosure of embodiment 1. It should be noted that the above modules may run in a computer terminal as part of the apparatus.
It should be noted that, reference may be made to the relevant description in embodiment 1 for alternative or preferred embodiments of this embodiment, and details are not described here again.
The aforementioned evaluation device for customer service quality may further include a processor and a memory, where the aforementioned acquisition module 20, the first calculation module 22, the second calculation module 24, the determination module 26, and the like are all stored in the memory as program units, and the processor executes the aforementioned program units stored in the memory to implement corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory; one or more kernels may be provided. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
According to an embodiment of the present application, there is also provided an embodiment of a non-volatile storage medium. Optionally, in this embodiment, the nonvolatile storage medium includes a stored program, and the apparatus in which the nonvolatile storage medium is located is controlled to execute any one of the above methods for evaluating customer service quality when the program runs.
Optionally, in this embodiment, the nonvolatile storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group, and the nonvolatile storage medium includes a stored program.
Optionally, the device in which the non-volatile storage medium is located is controlled to execute the following functions when the program runs: acquiring customer service monitoring information, wherein the customer service monitoring information is audio information and video information generated by a target object in a service execution process; calculating an audio feature classification evaluation result based on the audio information by adopting a first neural network model; calculating a video feature classification evaluation result based on the video information by adopting a second neural network model, wherein the audio feature classification evaluation result and the video feature classification evaluation result both comprise an evaluation result of at least one emotional feature; and determining a customer service quality evaluation result corresponding to the target object according to the audio feature classification evaluation result and the video feature classification evaluation result.
Optionally, the device in which the non-volatile storage medium is located is controlled to execute the following functions when the program runs: acquiring a voice sequence in the audio information; performing framing processing on the voice sequence and extracting voice frame features of the voice sequence; performing segmentation processing on the voice frame features to obtain voice segment features of the voice sequence; fitting the voice segment features through a target function to obtain emotion cognition window features of the voice sequence; and calculating the audio feature classification evaluation result through the first neural network model based on the voice frame features, the voice segment features and the emotion cognition window features.
Optionally, the device in which the non-volatile storage medium is located is controlled to execute the following functions when the program runs: performing framing processing on the video information to obtain a plurality of frames of video images; extracting a facial image feature set of the target object from each frame of the video image; and inputting the facial image feature set into the third neural network model and the fourth neural network model respectively to obtain the video feature classification evaluation result.
Optionally, the device in which the non-volatile storage medium is located is controlled to execute the following functions when the program runs: performing segmentation processing on each frame of the video image with a target classifier to obtain segmented video images; performing feature extraction on the segmented video images with a fifth neural network model to obtain a facial image feature map of the target object; performing target monitoring and accurate positioning processing on the facial image feature map through the third neural network model to obtain a candidate frame region corresponding to the facial image feature map; and performing maximum pooling processing on the candidate frame region through the fourth neural network model to obtain the facial image feature set.
Optionally, the device in which the non-volatile storage medium is located is controlled to execute the following functions when the program runs: calculating a first evaluation score corresponding to the evaluation result of each emotional feature; summing the plurality of first evaluation scores to obtain a second evaluation score; and determining the customer service quality evaluation result according to the second evaluation score.
Optionally, the device in which the non-volatile storage medium is located is controlled to execute the following functions when the program runs: acquiring a first violation number corresponding to a first violation of the target object in the audio information; acquiring a second violation number corresponding to a second violation of the target object in the video information; and summing the first violation number and the second violation number to obtain the total number of violations of the target object.
Optionally, the method further includes: judging whether the evaluation index of the customer service quality of the target object reaches an alarm threshold, wherein the evaluation index includes at least one of the following: the audio feature classification evaluation result, the video feature classification evaluation result, the customer service quality evaluation result and the total number of violations; and if any evaluation index reaches the alarm threshold, sending an alarm indication.
Optionally, the device in which the non-volatile storage medium is located is controlled to execute the following functions when the program runs: performing framing processing on the video information to obtain a plurality of frames of video images; judging whether the second violations present in two adjacent frames of the video images are the same violation; and if the determination result is yes, recording the same violation once as the second violation.
Optionally, the device in which the non-volatile storage medium is located is controlled to execute the following functions when the program runs: training the first neural network model based on the audio information to obtain a trained first neural network model; training the second neural network model based on the video information to obtain a trained second neural network model; and updating the first neural network model and the second neural network model according to the trained first neural network model and the trained second neural network model respectively.
Optionally, the device in which the non-volatile storage medium is located is controlled to execute the following functions when the program runs: segmenting the audio information according to a preset time length to obtain multiple segments of audio samples; judging the matching degree between the voiceprint features of the first audio sample and the pre-stored voiceprint features; if the matching degree is greater than or equal to the matching degree threshold, acquiring the voice sequence; and if the matching degree is less than the matching degree threshold, sending an alarm indication.
Optionally, the device in which the non-volatile storage medium is located is controlled to execute the following functions when the program runs: detecting the number of human faces in each frame of the video image through a face recognition model; judging whether the number of faces is 1; if the determination result is yes, extracting the facial image feature set from each frame of the video image; and if the determination result is no, sending an alarm indication.
According to an embodiment of the present application, there is also provided an embodiment of a processor. Optionally, in this embodiment, the processor is configured to execute a program, where the program executes any one of the methods for evaluating customer service quality when running.
According to an embodiment of the present application, there is further provided an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute a program that initializes the steps of any one of the above methods for evaluating customer service quality.
Optionally, the computer program product is adapted to perform a program for initializing the following method steps when executed on a data processing device: acquiring customer service monitoring information, wherein the customer service monitoring information is audio information and video information generated by a target object in a service execution process; calculating to obtain an audio characteristic classification evaluation result based on the audio information by adopting a first neural network model; calculating to obtain a video characteristic classification evaluation result based on the video information by adopting a second neural network model, wherein the audio characteristic classification evaluation result and the video characteristic classification evaluation result both comprise at least one evaluation result of emotional characteristics; and determining a customer service quality evaluation result corresponding to the target object according to the audio feature classification evaluation result and the video feature classification evaluation result.
According to an embodiment of the present application, there is further provided an embodiment of an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to run the computer program to perform any one of the above methods for evaluating customer service quality.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable non-volatile storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a non-volatile storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned nonvolatile storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (15)

1. A method for evaluating customer service quality is characterized by comprising the following steps:
acquiring customer service monitoring information, wherein the customer service monitoring information is audio information and video information generated by a target object in a service execution process;
calculating to obtain an audio characteristic classification evaluation result based on the audio information by adopting a first neural network model;
calculating to obtain a video feature classification evaluation result based on the video information by adopting a second neural network model, wherein the audio feature classification evaluation result and the video feature classification evaluation result both comprise at least one evaluation result of emotional features;
and determining a customer service quality evaluation result corresponding to the target object according to the audio characteristic classification evaluation result and the video characteristic classification evaluation result.
2. The method of claim 1, wherein calculating an audio feature classification evaluation result based on the audio information by using the first neural network model comprises:
acquiring a voice sequence in the audio information;
performing framing processing on the voice sequence, and extracting voice frame characteristics of the voice sequence;
carrying out segmentation processing on the voice frame characteristics to obtain voice segment characteristics of the voice sequence;
fitting the voice segment characteristics through a target function to obtain emotion cognition window characteristics of the voice sequence;
and calculating the audio feature classification evaluation result through the first neural network model based on the voice frame features, the voice section features and the emotion cognition window features.
3. The method of claim 1, wherein the second neural network model comprises a third neural network model and a fourth neural network model, and the calculating a video feature classification evaluation result based on the video information using the second neural network model comprises:
performing frame division processing on the video information to obtain a plurality of frames of video images;
extracting a facial image feature set of the target object from each frame of the video image;
and respectively inputting the facial image feature set to the third neural network model and the fourth neural network model to obtain the video feature classification evaluation result.
4. The method of claim 3, wherein extracting a set of facial image features of the target object from each frame of the video image comprises:
adopting a target classifier to perform segmentation processing on each frame of the video image to obtain a segmented video image;
extracting the segmented video image by adopting a fifth neural network model to obtain a facial image feature map of the target object;
carrying out target monitoring and accurate positioning processing on the facial image feature map through the third neural network model to obtain a candidate frame region corresponding to the facial image feature map;
and performing maximum pooling processing on the candidate frame region through the fourth neural network model to obtain the facial image feature set.
5. The method according to claim 1, wherein the determining a customer service quality evaluation result corresponding to the target object according to the audio feature classification evaluation result and the video feature classification evaluation result comprises:
calculating a first evaluation score corresponding to the evaluation result of each emotional feature;
summing the plurality of first evaluation scores to obtain a second evaluation score;
and determining the customer service quality evaluation result according to the second evaluation score.
6. The method of claim 1, further comprising:
acquiring a first violation frequency corresponding to a first violation of the target object in the audio information;
acquiring a second violation frequency corresponding to a second violation of the target object in the video information;
and summing the first violation times and the second violation times to obtain the total violation times of the target object.
7. The method of claim 6, further comprising:
judging whether the evaluation index of the customer service quality of the target object reaches an alarm threshold value, wherein the evaluation index comprises at least one of the following: the audio feature classification evaluation result, the video feature classification evaluation result, the customer service quality evaluation result and the total violation frequency;
and if any one evaluation index reaches the alarm threshold value, sending an alarm indication.
8. The method according to claim 6, wherein the obtaining a second violation number corresponding to a second violation of the target object in the video information comprises:
performing frame division processing on the video information to obtain a plurality of frames of video images;
judging whether the second violation behaviors existing in the two adjacent frames of the video images are the same violation behavior or not;
and if so, recording the same violation as the second violation for 1 time.
9. The method of claim 1, further comprising:
training the first neural network model based on the audio information to obtain a trained first neural network model;
training the second neural network model based on the video information to obtain a trained second neural network model;
and respectively updating the first neural network model and the second neural network model according to the trained first neural network model and the trained second neural network model.
10. The method of claim 2, wherein prior to said obtaining the sequence of speech in the audio information, the method further comprises:
carrying out segmentation processing on the audio information according to a preset time length to obtain a plurality of sections of audio samples;
judging the matching degree of the voiceprint characteristics of the first section of the audio sample and the pre-stored voiceprint characteristics;
if the matching degree is greater than or equal to a threshold value of the matching degree, acquiring the voice sequence;
and if the matching degree is smaller than the threshold value of the matching degree, sending an alarm indication.
11. The method of claim 3, further comprising:
detecting the number of human faces in each frame of the video image through a human face recognition model;
judging whether the number of the human faces is 1 or not;
if the judgment result is yes, extracting the facial image feature set from each frame of the video image;
if the judgment result is negative, an alarm instruction is sent out.
12. An evaluation device for customer service quality, comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring customer service monitoring information, and the customer service monitoring information is audio information and video information generated by a target object in a service execution process;
the first calculation module is used for calculating to obtain an audio characteristic classification evaluation result based on the audio information by adopting a first neural network model;
the second calculation module is used for calculating to obtain video feature classification evaluation results based on the video information by adopting a second neural network model and a third neural network model, wherein the audio feature classification evaluation results and the video feature classification evaluation results both comprise evaluation results of at least one emotional feature;
and the determining module is used for determining a customer service quality evaluation result corresponding to the target object according to the audio characteristic classification evaluation result and the video characteristic classification evaluation result.
13. A system for evaluating customer service quality, comprising:
the system comprises a video customer service client, a service server and a service server, wherein the video customer service client is used for acquiring customer service monitoring information, and the customer service monitoring information is audio information and video information generated by a target object in a service execution process;
the edge computing platform server is connected with the video customer service client and used for computing to obtain an audio characteristic classification evaluation result based on the audio information by adopting a first neural network model;
and the cloud platform server is connected with the edge computing platform server and is used for calculating to obtain a video characteristic classification evaluation result based on the video information by adopting a second neural network model and a third neural network model, and determining a customer service quality evaluation result corresponding to the target object according to the audio characteristic classification evaluation result and the video characteristic classification evaluation result, wherein the audio characteristic classification evaluation result and the video characteristic classification evaluation result both comprise at least one evaluation result of emotional characteristics.
14. A non-volatile storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to execute the method of evaluating quality of customer service according to any one of claims 1 to 11.
15. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the method of evaluating quality of customer service according to any one of claims 1 to 11.
CN202210018451.2A 2022-01-07 2022-01-07 Method and device for evaluating customer service quality, storage medium and equipment Pending CN114372701A (en)

Priority Applications (1)

CN202210018451.2A (priority date 2022-01-07, filing date 2022-01-07): Method and device for evaluating customer service quality, storage medium and equipment

Publications (1)

CN114372701A, published 2022-04-19

Family ID: 81143834

Country Status (1)

CN: CN114372701A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination