CN112287675A - Intelligent customer service intention understanding method based on text and voice information fusion - Google Patents

Intelligent customer service intention understanding method based on text and voice information fusion

Info

Publication number
CN112287675A
CN112287675A (application CN202011589715.7A)
Authority
CN
China
Prior art keywords
text
customer service
voice
intelligent customer
coding
Prior art date
Legal status
Granted
Application number
CN202011589715.7A
Other languages
Chinese (zh)
Other versions
CN112287675B (en
Inventor
张学强
董晓飞
张丹
曹峰
石霖
孙明俊
Current Assignee
Nanjing New Generation Artificial Intelligence Research Institute Co ltd
Original Assignee
Nanjing New Generation Artificial Intelligence Research Institute Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing New Generation Artificial Intelligence Research Institute Co ltd filed Critical Nanjing New Generation Artificial Intelligence Research Institute Co ltd
Priority to CN202011589715.7A priority Critical patent/CN112287675B/en
Publication of CN112287675A publication Critical patent/CN112287675A/en
Application granted granted Critical
Publication of CN112287675B publication Critical patent/CN112287675B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/01Customer relationship services
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Business, Economics & Management (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an intelligent customer service intention understanding method based on the fusion of text and voice information, relating to intelligent customer service products applied in vertical industries such as finance, education, and medical care. In an intelligent customer service application scenario, the processing pipeline of the invention is divided into six parts: user input, text encoding, voice encoding, feature fusion, intention understanding, and execution feedback. On the basis of performing intention understanding on text with a bidirectional long short-term memory (BiLSTM) deep neural network, voice features are introduced, and the intention understanding effect is improved through multi-modal information fusion. At the same time, by using text and voice information together, the cascading impact of speech recognition errors can be minimized.

Description

Intelligent customer service intention understanding method based on text and voice information fusion
Technical Field
The invention relates to intelligent customer service products applied in vertical industries such as finance, education, and medical care, and mainly optimizes the intention understanding algorithm in such products by means of natural language processing and speech processing methods.
Background
Intent understanding refers to accurately understanding a user's intention at the semantic level based on user preferences, spatio-temporal characteristics, context, interaction, and content such as multimodal information including text, gestures, images, and video. In recent years, the internet has generated a large amount of expression and comment information about people, events, products, and other subjects of interest; this information expresses various speaking intentions, such as asking questions and seeking consultation, requesting assistance, or expressing dissatisfaction and complaints. The real world is multimodal and interactive, so the information a user queries is generally multimodal as well. Therefore, in addition to text, the most common modality, multimodal data such as pictures, video, and audio can be used to assist the understanding of user intention and thereby improve the accuracy of information services. Intent understanding is one of the four dimensions for measuring the intelligence of an intelligent customer service product (intent understanding, service provision, smooth interaction, and personality traits); accurate intent understanding can greatly improve the problem-resolution and task-completion rates of intelligent customer service and effectively increase user satisfaction.
The source or form of information may be referred to as a modality: for example, the senses of touch, hearing, vision, and smell; information media such as voice, video, and text; or sensors such as radar, infrared, and accelerometers. A multimodal fusion task generally needs to fuse the features of two or more modalities; feature fusion takes the feature vectors of two modalities as input and outputs a fused vector.
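As a minimal illustration (a sketch, not part of the disclosure), the simplest feature fusion takes one feature vector per modality and returns a single fused vector by concatenation; the method described below instead uses attention-weighted fusion:

```python
import numpy as np

# Simplest possible feature fusion: the feature vectors of two modalities go
# in, one fused vector comes out. Concatenation is shown only for illustration.
def fuse_concat(text_vec, audio_vec):
    return np.concatenate([text_vec, audio_vec])

fused = fuse_concat(np.array([0.2, 0.5]), np.array([0.9, 0.1, 0.4]))
print(fused.shape)  # (5,)
```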
Traditional methods have the problem that only text is used as the input to the intelligent customer service system: if the user input is speech, it is simply converted to text by speech recognition, so important features in the user's voice such as tone, speaking rate, and stress cannot be effectively analyzed.
Disclosure of Invention
Aiming at these problems, the invention uses multi-modal fusion technology to fully extract modal features such as voice and text, on the basis of performing intention recognition on text with a bidirectional long short-term memory (BiLSTM) deep neural network, and finally improves the effect of intention recognition in scenarios such as intelligent customer service through the fusion of multi-modal information such as text and voice.
To achieve this purpose, the invention provides an intelligent customer service intention understanding method based on the fusion of text and voice information. It introduces voice features on the basis of performing intention understanding on text with a bidirectional long short-term memory (BiLSTM) deep neural network, and improves the intention understanding effect through multi-modal information fusion. In an intelligent customer service application scenario, the processing pipeline of this proposal consists of six steps: user input, text encoding, voice encoding, feature fusion, intention understanding, and execution feedback.
Step 1: user input:
(1) The user accesses the intelligent customer service system through channels such as web pages, WeChat, mini-programs, and official accounts, and initiates a question-and-answer session or conversation by voice call or in text. If the user input is speech, it is converted to text by speech recognition for further processing and analysis.
Step 2: text encoding:
The text is encoded with a BiLSTM neural network, which can encode the input text from the forward and backward directions simultaneously, ensuring that the context of each word is captured:
(1) An LSTM deep neural network scans the text forward to obtain the forward feature vector $\overrightarrow{h}_t$;
(2) An LSTM deep neural network scans the text backward to obtain the backward feature vector $\overleftarrow{h}_t$;
(3) The two feature vectors are concatenated to obtain the feature vector $h_t$:

$\overrightarrow{h}_t = \mathrm{LSTM}(x_t, \overrightarrow{h}_{t-1})$

$\overleftarrow{h}_t = \mathrm{LSTM}(x_t, \overleftarrow{h}_{t+1})$

$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$

where $\overrightarrow{h}_t$ is the vector obtained by encoding the text forward at time t, $\overleftarrow{h}_t$ is the vector obtained by encoding the text backward at time t, $x_t$ is the t-th word of the text from left to right, $\overrightarrow{h}_{t-1}$ is the hidden state of the forward encoding at time t-1, $\overleftarrow{h}_{t+1}$ is the hidden state of the backward encoding at time t+1, $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation of the two vectors, and $h_t$ is the bidirectional encoding vector of the text at time t.
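The forward scan, backward scan, and concatenation of step 2 can be sketched as follows. This is an illustration under assumptions (a single-layer LSTM cell with small random weights, toy dimensions), not the patented implementation:

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: gates computed from input x and the previous hidden state."""
    z = W @ x + U @ h_prev + b                     # stacked gate pre-activations, shape (4H,)
    H = h_prev.shape[0]
    i, f, o, g = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    c = sig(f) * c_prev + sig(i) * np.tanh(g)      # cell state
    h = sig(o) * np.tanh(c)                        # hidden state
    return h, c

def bilstm_encode(xs, params_fwd, params_bwd, H):
    """Scan the sequence in both directions and concatenate: h_t = [h_fwd ; h_bwd]."""
    fwd, bwd = [], []
    h = c = np.zeros(H)
    for x in xs:                                   # left-to-right pass
        h, c = lstm_step(x, h, c, *params_fwd)
        fwd.append(h)
    h = c = np.zeros(H)
    for x in reversed(xs):                         # right-to-left pass
        h, c = lstm_step(x, h, c, *params_bwd)
        bwd.append(h)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

# toy example: 3 "word vectors" of dimension 4, hidden size 5
rng = np.random.default_rng(0)
D, H = 4, 5
params = lambda: (rng.normal(size=(4*H, D)) * 0.1,
                  rng.normal(size=(4*H, H)) * 0.1,
                  np.zeros(4*H))
xs = [rng.normal(size=D) for _ in range(3)]
states = bilstm_encode(xs, params(), params(), H)
print(len(states), states[0].shape)                # one 2H-dimensional vector per word
```

In practice a library implementation (e.g. a framework's bidirectional LSTM layer) would replace this hand-rolled cell; the sketch only makes the two scans and the concatenation explicit.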
Step 3: voice coding:
The voice audio is encoded with a BiLSTM neural network, which can encode the input speech from the forward and backward directions simultaneously, ensuring that the context of each audio segment is accurately captured, as follows:
(1) An LSTM deep neural network scans the audio forward to obtain the forward feature vector $\overrightarrow{g}_t$;
(2) An LSTM deep neural network scans the audio backward to obtain the backward feature vector $\overleftarrow{g}_t$;
(3) The two feature vectors are concatenated:

$\overrightarrow{g}_t = \mathrm{LSTM}(a_t, \overrightarrow{g}_{t-1})$

$\overleftarrow{g}_t = \mathrm{LSTM}(a_t, \overleftarrow{g}_{t+1})$

$g_t = [\overrightarrow{g}_t ; \overleftarrow{g}_t]$

where $\overrightarrow{g}_t$ is the vector obtained by encoding the audio forward at time t, $\overleftarrow{g}_t$ is the vector obtained by encoding the audio backward at time t, $a_t$ is the t-th segment of the audio from left to right, $\overrightarrow{g}_{t-1}$ is the hidden state of the forward encoding at time t-1, $\overleftarrow{g}_{t+1}$ is the hidden state of the backward encoding at time t+1, $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation of the two vectors, and $g_t$ is the bidirectional encoding vector of the audio at time t.
Step 4: feature fusion:
The two independent feature vectors obtained in step 2 and step 3 are fused by weighted combination through the following computation:

$s_i = f(s_{i-1}, y_{i-1}, c_i)$

$c_i = \sum_{j=1}^{T} \alpha_{ij} h_j$

$\alpha_{ij} = \dfrac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})}$

where

$e_{ij} = a(s_{i-1}, h_j)$

and $s_0$ is the state of the decoder at the initial time, $s_{i-1}$ is the hidden state of the decoder at the previous step, $y_{i-1}$ is the word decoded at the previous step, $c_i$ is the attention vector, $\alpha_{ij}$ is the attention weight, $h_j$ is the encoder state of the j-th word of the source sentence, $h_k$ is the encoder state of the k-th word of the source sentence, and $h_T$ is the hidden state of the encoder at time T.
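A minimal sketch of the attention-weighted fusion over the text and audio encoder states. The score function $a(\cdot,\cdot)$ is not specified above, so a dot product stands in for it here; treating both modalities' states as one pool of $h_j$ is likewise an illustrative assumption:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())          # shift for numerical stability
    return e / e.sum()

def attention_fuse(text_states, audio_states, s_prev):
    """Weighted fusion: e_ij = a(s_{i-1}, h_j) (dot product here),
    alpha_ij = softmax(e_ij), c_i = sum_j alpha_ij * h_j."""
    H = np.vstack(text_states + audio_states)   # all encoder states h_j, one per row
    e = H @ s_prev                              # scores against the previous decoder state
    alpha = softmax(e)                          # attention weights, sum to 1
    c = alpha @ H                               # attention (context) vector c_i
    return c, alpha

rng = np.random.default_rng(1)
d = 6
text_states = [rng.normal(size=d) for _ in range(3)]
audio_states = [rng.normal(size=d) for _ in range(2)]
s0 = rng.normal(size=d)                         # initial decoder state
c, alpha = attention_fuse(text_states, audio_states, s0)
print(c.shape, float(alpha.sum()))              # fused vector; weights sum to 1
```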
Step 5: intention understanding:
The fused feature vector is input into a softmax function to identify the user intention in the intelligent customer service system:

$P(y_i = w_k) = \dfrac{\exp(g_k(s_i))}{\sum_{k'=1}^{|V|} \exp(g_{k'}(s_i))}$

where $s_i$ is the hidden state of the decoder at time i, $y_i$ is the word decoded at time i, $w_k$ is the k-th word in the vocabulary V, $g_k(s_i)$ is the confidence score of the hidden state $s_i$ for word $w_k$, exp is the exponential function with the natural constant e as its base, and $P(y_i)$ is the probability of the currently generated target word $y_i$.
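For illustration, a softmax over linear confidence scores can map the fused feature vector to an intent label. The intent names, dimensions, and weights below are hypothetical stand-ins, not part of the disclosure:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# hypothetical intent labels for a customer service bot
INTENTS = ["question_consult", "request_assist", "complaint"]

def classify_intent(fused_vec, W, b):
    """Scores g_k = W_k . s + b_k, normalized with softmax to P(y = k)."""
    probs = softmax(W @ fused_vec + b)
    return INTENTS[int(np.argmax(probs))], probs

rng = np.random.default_rng(2)
d = 12                                   # dimension of the fused feature vector
W = rng.normal(size=(len(INTENTS), d))   # untrained toy weights
b = np.zeros(len(INTENTS))
fused = rng.normal(size=d)
label, probs = classify_intent(fused, W, b)
print(label, float(probs.sum()))         # one intent label; probabilities sum to 1
```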
Step 6: execution feedback:
After the intelligent customer service system correctly understands the user's question intention, it matches that intention against a knowledge base maintained in the backend and recommends a relevant solution to the user.
Compared with the prior art, the main advantages of the invention are:
(1) by adopting a text-and-voice multi-modal encoding technique, the invention makes full use of both text and voice features, improving the effect of intention understanding in intelligent customer service;
(2) in scenarios such as intelligent customer service, the invention ensures that the complementary information of voice and text is fully combined without introducing information from other modalities;
(3) intelligent customer service products are used mainly through voice interaction; by using text and voice information together, the invention minimizes the cascading impact of speech recognition errors.
Drawings
FIG. 1 is a flow chart of an intelligent customer service intent understanding method of the present invention.
Detailed Description
The invention is further explained below with reference to the figures and a specific embodiment.
As shown in Fig. 1, this embodiment provides an intelligent customer service intention understanding method based on the fusion of text and voice information. It introduces speech features on the basis of performing intention understanding on text with a bidirectional long short-term memory deep neural network (BiLSTM), improving the intention understanding effect through multi-modal information fusion. In an intelligent customer service application scenario, the method consists of six parts: user input, text encoding, voice encoding, feature fusion, intention understanding, and execution feedback.
Step 1: user input:
The user accesses the intelligent customer service system through channels such as web pages, WeChat, mini-programs, and official accounts, and initiates a question-and-answer session or conversation by voice call or in text.
Traditional methods use only text as the input of the intelligent customer service system and therefore cannot analyze important features in the user's voice such as tone, speaking rate, and stress. The greatest advantage of this method is that, in the intelligent customer service application scenario, deep neural networks are fully used to extract features from both the voice and the text input by the user, effectively improving the intention recognition effect.
Step 2: text encoding:
Text encoding is a common strategy in traditional intelligent customer service systems: text features are extracted for intention analysis and understanding.
The text is encoded with a BiLSTM neural network, which can encode the input text from the forward and backward directions simultaneously, ensuring that the context of each word is captured:
An LSTM deep neural network scans the text forward to obtain the forward feature vector $\overrightarrow{h}_t$;
An LSTM deep neural network scans the text backward to obtain the backward feature vector $\overleftarrow{h}_t$;
The two feature vectors are concatenated to obtain the feature vector $h_t$:

$\overrightarrow{h}_t = \mathrm{LSTM}(x_t, \overrightarrow{h}_{t-1})$

$\overleftarrow{h}_t = \mathrm{LSTM}(x_t, \overleftarrow{h}_{t+1})$

$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$

where $\overrightarrow{h}_t$ is the vector obtained by encoding the text forward at time t, $\overleftarrow{h}_t$ is the vector obtained by encoding the text backward at time t, $x_t$ is the t-th word of the text from left to right, $\overrightarrow{h}_{t-1}$ is the hidden state of the forward encoding at time t-1, $\overleftarrow{h}_{t+1}$ is the hidden state of the backward encoding at time t+1, $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation of the two vectors, and $h_t$ is the bidirectional encoding vector of the text at time t.
Step 3: voice coding:
Voice coding is an optimization strategy of this proposal: voice features are extracted for intention analysis and understanding.
The speech is encoded with a BiLSTM neural network, which can encode the input speech from the forward and backward directions simultaneously, so that the context of each audio segment is accurately captured:
An LSTM deep neural network scans the speech forward to obtain the forward feature vector $\overrightarrow{g}_t$;
An LSTM deep neural network scans the speech backward to obtain the backward feature vector $\overleftarrow{g}_t$;
The two feature vectors are concatenated:

$\overrightarrow{g}_t = \mathrm{LSTM}(a_t, \overrightarrow{g}_{t-1})$

$\overleftarrow{g}_t = \mathrm{LSTM}(a_t, \overleftarrow{g}_{t+1})$

$g_t = [\overrightarrow{g}_t ; \overleftarrow{g}_t]$

where $\overrightarrow{g}_t$ is the vector obtained by encoding the audio forward at time t, $\overleftarrow{g}_t$ is the vector obtained by encoding the audio backward at time t, $a_t$ is the t-th segment of the audio from left to right, $\overrightarrow{g}_{t-1}$ is the hidden state of the forward encoding at time t-1, $\overleftarrow{g}_{t+1}$ is the hidden state of the backward encoding at time t+1, $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation of the two vectors, and $g_t$ is the bidirectional encoding vector of the audio at time t.
Step 4: feature fusion:
Feature fusion is the core strategy of this proposal: text and voice features are extracted for intention analysis and understanding, ensuring that both the semantics of the text expressed by the user and important features in the voice such as tone, speaking rate, and stress are fully used.
Decoding is performed with the feature vectors obtained in the preceding steps:

$s_i = f(s_{i-1}, y_{i-1}, c_i)$

$c_i = \sum_{j=1}^{T} \alpha_{ij} h_j$

$\alpha_{ij} = \dfrac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})}$

where

$e_{ij} = a(s_{i-1}, h_j)$

and $s_0$ is the state of the decoder at the initial time, $s_{i-1}$ is the hidden state of the decoder at the previous step, $y_{i-1}$ is the word decoded at the previous step, $c_i$ is the attention vector, $\alpha_{ij}$ is the attention weight, $h_j$ is the encoder state of the j-th word of the source sentence, $h_k$ is the encoder state of the k-th word of the source sentence, and $h_T$ is the hidden state of the encoder at time T.
Step 5: intention understanding:
Through steps 1-4, the fused feature vector of the text and the voice is obtained. Inputting this vector into the softmax function allows the intelligent customer service system to identify the user intention, accurately understand what the user wants, provide high-quality service, and create a better user experience:

$P(y_i = w_k) = \dfrac{\exp(g_k(s_i))}{\sum_{k'=1}^{|V|} \exp(g_{k'}(s_i))}$

where $s_i$ is the hidden state of the decoder at time i, $y_i$ is the word decoded at time i, $w_k$ is the k-th word in the vocabulary V, $g_k(s_i)$ is the confidence score of the hidden state $s_i$ for word $w_k$, exp is the exponential function with the natural constant e as its base, and $P(y_i)$ is the probability of the currently generated target word $y_i$.
Step 6: execution feedback:
After the intelligent customer service system correctly understands the user's question intention, it matches that intention against a knowledge base maintained in the backend and recommends a relevant solution to the user.
By fusing text information and voice information in an intelligent customer service application product and inferring the user's question intention jointly from both kinds of features, the method effectively improves the effect of intelligent customer service.
The above describes only the technical idea of the present invention and should not limit its scope of protection; any modification made to the technical solution on the basis of this technical idea falls within the scope of protection of the present invention. Techniques not described in the invention can be implemented with the prior art.
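The six-part pipeline of the embodiment can be sketched as follows. Every component is a stub standing in for the BiLSTM encoders, attention fusion, and softmax classifier described above; all names, the toy knowledge base, and the keyword-based classifier are illustrative assumptions, not the patented implementation:

```python
def understand(audio_segments, recognized_text, knowledge_base,
               encode_text, encode_audio, fuse, classify):
    """Steps 2-6 of the pipeline, with step 1 (user input / speech
    recognition) assumed to have produced recognized_text already."""
    text_states = encode_text(recognized_text)      # step 2: encode text
    audio_states = encode_audio(audio_segments)     # step 3: encode audio
    fused = fuse(text_states, audio_states)         # step 4: feature fusion
    intent = classify(fused)                        # step 5: intent understanding
    # step 6: match the intent against the backend knowledge base
    return knowledge_base.get(intent, "escalate_to_human")

# toy stand-ins for the real components
kb = {"complaint": "apologize_and_open_ticket"}
answer = understand(
    audio_segments=[b"..."],
    recognized_text="I want to complain",
    knowledge_base=kb,
    encode_text=lambda t: t.split(),
    encode_audio=lambda a: a,
    fuse=lambda ts, gs: ts + gs,
    classify=lambda f: "complaint" if "complain" in " ".join(map(str, f)) else "other",
)
print(answer)  # -> apologize_and_open_ticket
```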

Claims (5)

1. An intelligent customer service intention understanding method based on the fusion of text and voice information, characterized by comprising the following steps:
Step 1: user input: a user accesses the intelligent customer service system through a channel such as a web page, WeChat, a mini-program, or an official account, and initiates a question-and-answer session or conversation by voice call or in text;
Step 2: text encoding: encoding the text with a BiLSTM neural network, encoding the input text from the forward and backward directions simultaneously, and accurately capturing the context of each word to obtain a feature vector;
Step 3: voice coding: encoding the voice audio with a BiLSTM neural network, encoding the input speech from the forward and backward directions simultaneously, and accurately capturing the context of each audio segment to obtain a feature vector;
Step 4: feature fusion: fusing, by weighted combination, the two independent feature vectors obtained in step 2 and step 3;
Step 5: intention understanding: inputting the fused feature vector into a softmax function, and identifying the user intention in the intelligent customer service system;
Step 6: execution feedback: after the intelligent customer service system correctly understands the user's question intention, matching that intention against a knowledge base maintained in the backend, and recommending a solution to the user.
2. The intelligent customer service intention understanding method of claim 1, wherein step 2 specifically comprises:
Step 2.1: scanning the text forward with an LSTM deep neural network to obtain the forward feature vector $\overrightarrow{h}_t$;
Step 2.2: scanning the text backward with an LSTM deep neural network to obtain the backward feature vector $\overleftarrow{h}_t$;
Step 2.3: concatenating the two feature vectors to obtain the feature vector $h_t$:

$\overrightarrow{h}_t = \mathrm{LSTM}(x_t, \overrightarrow{h}_{t-1})$

$\overleftarrow{h}_t = \mathrm{LSTM}(x_t, \overleftarrow{h}_{t+1})$

$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$

where $\overrightarrow{h}_t$ is the vector obtained by encoding the text forward at time t, $\overleftarrow{h}_t$ is the vector obtained by encoding the text backward at time t, $x_t$ is the t-th word of the text from left to right, $\overrightarrow{h}_{t-1}$ is the hidden state of the forward encoding at time t-1, $\overleftarrow{h}_{t+1}$ is the hidden state of the backward encoding at time t+1, $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation of the two vectors, and $h_t$ is the bidirectional encoding vector of the text at time t.
3. The intelligent customer service intention understanding method of claim 2, wherein step 3 specifically comprises:
Step 3.1: scanning the voice audio forward with an LSTM deep neural network to obtain the forward feature vector $\overrightarrow{g}_t$;
Step 3.2: scanning the voice audio backward with an LSTM deep neural network to obtain the backward feature vector $\overleftarrow{g}_t$;
Step 3.3: concatenating the two feature vectors:

$\overrightarrow{g}_t = \mathrm{LSTM}(a_t, \overrightarrow{g}_{t-1})$

$\overleftarrow{g}_t = \mathrm{LSTM}(a_t, \overleftarrow{g}_{t+1})$

$g_t = [\overrightarrow{g}_t ; \overleftarrow{g}_t]$

where $\overrightarrow{g}_t$ is the vector obtained by encoding the audio forward at time t, $\overleftarrow{g}_t$ is the vector obtained by encoding the audio backward at time t, $a_t$ is the t-th segment of the audio from left to right, $\overrightarrow{g}_{t-1}$ is the hidden state of the forward encoding at time t-1, $\overleftarrow{g}_{t+1}$ is the hidden state of the backward encoding at time t+1, $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation of the two vectors, and $g_t$ is the bidirectional encoding vector of the audio at time t.
4. The intelligent customer service intention understanding method of claim 3, wherein the specific decoding process in step 4 is:

$s_i = f(s_{i-1}, y_{i-1}, c_i)$

$c_i = \sum_{j=1}^{T} \alpha_{ij} h_j$

$\alpha_{ij} = \dfrac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})}$

where

$e_{ij} = a(s_{i-1}, h_j)$

and $s_0$ is the state of the decoder at the initial time, $s_{i-1}$ is the hidden state of the decoder at the previous step, $y_{i-1}$ is the word decoded at the previous step, $c_i$ is the attention vector, $\alpha_{ij}$ is the attention weight, $h_j$ is the encoder state of the j-th word of the source sentence, $h_k$ is the encoder state of the k-th word of the source sentence, and $h_T$ is the hidden state of the encoder at time T.
5. The intelligent customer service intention understanding method of claim 1, wherein the softmax function in step 5 identifies the fused feature vector as follows:

$P(y_i = w_k) = \dfrac{\exp(g_k(s_i))}{\sum_{k'=1}^{|V|} \exp(g_{k'}(s_i))}$

where $s_i$ is the hidden state of the decoder at time i, $y_i$ is the word decoded at time i, $w_k$ is the k-th word in the vocabulary V, $g_k(s_i)$ is the confidence score of the hidden state $s_i$ for word $w_k$, exp is the exponential function with the natural constant e as its base, and $P(y_i)$ is the probability of the currently generated target word $y_i$.
CN202011589715.7A 2020-12-29 2020-12-29 Intelligent customer service intention understanding method based on text and voice information fusion Active CN112287675B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011589715.7A CN112287675B (en) 2020-12-29 2020-12-29 Intelligent customer service intention understanding method based on text and voice information fusion


Publications (2)

Publication Number Publication Date
CN112287675A true CN112287675A (en) 2021-01-29
CN112287675B CN112287675B (en) 2021-04-30

Family

ID=74426212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011589715.7A Active CN112287675B (en) 2020-12-29 2020-12-29 Intelligent customer service intention understanding method based on text and voice information fusion

Country Status (1)

Country Link
CN (1) CN112287675B (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI823195B (en) * 2021-11-25 2023-11-21 荷蘭商荷蘭移動驅動器公司 Intelligent recommendation method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188361A (en) * 2019-06-10 2019-08-30 北京智合大方科技有限公司 Speech intention recognition methods and device in conjunction with text, voice and emotional characteristics
CN111145786A (en) * 2019-12-17 2020-05-12 深圳追一科技有限公司 Speech emotion recognition method and device, server and computer readable storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Ji Xuewu et al.: "Driving Intention Recognition and Vehicle Trajectory Prediction Based on LSTM Network", China Journal of Highway and Transport *
Ning Yishuang: "Research on User Intention Understanding and Feedback Generation in Intelligent Voice Interaction", China Doctoral Dissertations Full-text Database, Information Science and Technology *
Zheng Binbin et al.: "Speech Intention Understanding Method Based on Multimodal Information Fusion", Sciencepaper Online (China) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113053366A (en) * 2021-03-12 2021-06-29 中国电子科技集团公司第二十八研究所 Controlled voice repeat consistency checking method based on multi-mode fusion
CN113053366B (en) * 2021-03-12 2023-11-21 中国电子科技集团公司第二十八研究所 Multi-mode fusion-based control voice duplicate consistency verification method
WO2023131207A1 (en) * 2022-01-07 2023-07-13 Huawei Technologies Co., Ltd. Methods and systems for streamable multimodal language understanding
CN114373448A (en) * 2022-03-22 2022-04-19 北京沃丰时代数据科技有限公司 Topic detection method and device, electronic equipment and storage medium
CN115187345A (en) * 2022-09-13 2022-10-14 深圳装速配科技有限公司 Intelligent household building material recommendation method, device, equipment and storage medium
CN115760022A (en) * 2023-01-10 2023-03-07 广州佰锐网络科技有限公司 Intelligent financial business handling method, system and medium

Also Published As

Publication number Publication date
CN112287675B (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN112287675B (en) Intelligent customer service intention understanding method based on text and voice information fusion
Ma et al. Visual speech recognition for multiple languages in the wild
Cheng et al. Fully convolutional networks for continuous sign language recognition
CN110751208B (en) Criminal emotion recognition method for multi-mode feature fusion based on self-weight differential encoder
WO2021072875A1 (en) Intelligent dialogue generation method, device, computer apparatus and computer storage medium
US20200082928A1 (en) Assisting psychological cure in automated chatting
CN114401438B (en) Video generation method and device for virtual digital person, storage medium and terminal
CN112181127A (en) Method and device for man-machine interaction
CN112101045B (en) Multi-mode semantic integrity recognition method and device and electronic equipment
US12008336B2 (en) Multimodal translation method, apparatus, electronic device and computer-readable storage medium
De Coster et al. Machine translation from signed to spoken languages: State of the art and challenges
CN115577161A (en) Multi-mode emotion analysis model fusing emotion resources
WO2023226239A1 (en) Object emotion analysis method and apparatus and electronic device
CN113392265A (en) Multimedia processing method, device and equipment
CN115563290A (en) Intelligent emotion recognition method based on context modeling
CN115599894A (en) Emotion recognition method and device, electronic equipment and storage medium
CN116561265A (en) Personalized dialogue generation method, model training method and device
Chakraborty et al. Analyzing emotion in spontaneous speech
CN115827854A (en) Voice abstract generation model training method, voice abstract generation method and device
CN111354362A (en) Method and device for assisting hearing-impaired communication
CN114463688A (en) Cross-modal context coding dialogue emotion recognition method and system
Xue et al. Lcsnet: End-to-end lipreading with channel-aware feature selection
Pu et al. Review on research progress of machine lip reading
CN114281948A (en) Summary determination method and related equipment thereof
WO2021169825A1 (en) Speech synthesis method and apparatus, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant