CN112287675A - Intelligent customer service intention understanding method based on text and voice information fusion - Google Patents
- Publication number: CN112287675A
- Application number: CN202011589715.7A
- Authority: CN (China)
- Prior art keywords: text, customer service, voice, intelligent customer, coding
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/279 — Handling natural language data; natural language analysis; recognition of textual entities
- G06F40/216 — Handling natural language data; parsing using statistical methods
- G06F40/35 — Handling natural language data; semantic analysis; discourse or dialogue representation
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/049 — Neural networks; temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
- G06Q30/01 — Commerce; customer relationship services
- G10L15/16 — Speech recognition; speech classification or search using artificial neural networks
- G10L15/22 — Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223 — Speech recognition; execution procedure of a spoken command
Abstract
The invention provides an intelligent customer service intention understanding method based on the fusion of text and voice information, applicable to intelligent customer service products in vertical industries such as finance, education, and medical care. In an intelligent customer service application scenario, the processing is divided into six parts: user input, text encoding, voice encoding, feature fusion, intention understanding, and execution feedback. On top of intention understanding over text with a bidirectional long short-term memory deep neural network (BiLSTM), voice features are introduced, and the intention understanding effect is improved through multimodal information fusion. Moreover, by using text and voice information together, the cascading impact of speech recognition errors can be avoided to the greatest extent.
Description
Technical Field
The invention relates to intelligent customer service products for vertical industries such as finance, education, and medical care, and chiefly optimizes the intention understanding algorithm in such products by means of natural language processing and speech processing methods.
Background
Intent understanding means accurately grasping a user's intent at the semantic level from user preferences, spatiotemporal characteristics, context, interaction history, and multimodal content such as text, gestures, images, and video. In recent years, the internet has generated a large volume of expressions and comments about people, events, products, and the like, conveying various speaking intentions such as asking for advice, requesting help, or voicing dissatisfaction and complaints. The real world is multimodal and interactive, so the information a user queries about is generally multimodal as well. Therefore, beyond the most common plain text, multimodal data such as pictures, video, and audio can assist in understanding user intent and thereby improve the accuracy of information services. Intent understanding is one of the four dimensions for measuring the intelligence of an intelligent customer service product (intent understanding, service provision, smooth interaction, and personality traits); accurate intent understanding can greatly raise the problem-resolution and task-completion rates of intelligent customer service and effectively improve user satisfaction.
The source or form of information is called a modality: senses such as touch, hearing, vision, and smell; information media such as voice, video, and text; or sensors such as radar, infrared, and accelerometers. A multimodal fusion task generally needs to fuse features of two or more modalities; feature fusion takes the feature vectors of two modalities as input and outputs a fused vector.
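The input/output contract just described, two modality feature vectors in, one fused vector out, can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the vector values and dimensions are made up, and plain concatenation stands in for the attention-weighted fusion detailed in step 4.

```python
def fuse(text_vec, audio_vec):
    """Concatenate per-modality feature vectors into one fused vector."""
    return list(text_vec) + list(audio_vec)

# Hypothetical encoder outputs: 4-dim text features, 3-dim audio features.
text_features = [0.2, -0.1, 0.7, 0.05]
audio_features = [0.9, 0.3, -0.4]

fused = fuse(text_features, audio_features)
print(len(fused))  # 7: the fused dimensionality is the sum of both modalities
```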
The traditional approach takes only text as the input of the intelligent customer service system; if the user input is speech, it is simply converted to text by speech recognition. As a result, important characteristics of the user's voice such as tone, speaking rate, and stress cannot be analyzed effectively.
Disclosure of Invention
To address these problems, the invention aims to fully extract modal features such as voice and text with multimodal fusion technology, on top of intention recognition over text using a bidirectional long short-term memory deep neural network (BiLSTM), and ultimately to improve the intention recognition effect in scenarios such as intelligent customer service through the fusion of multimodal information including text and voice.
To this end, the invention provides an intelligent customer service intention understanding method based on the fusion of text and voice information. It introduces voice features on top of intention understanding over text with a bidirectional long short-term memory deep neural network (BiLSTM), and improves the intention understanding effect through multimodal information fusion. In an intelligent customer service application scenario, the proposed processing comprises six steps: user input, text encoding, voice encoding, feature fusion, intention understanding, and execution feedback.
Step 1: and (3) user input:
(1) The user accesses the intelligent customer service system through channels such as web pages, WeChat, mini programs, and official accounts, and initiates a Q&A or conversation by voice call or text. If the user input is speech, it is converted to text by speech recognition for further processing and analysis.
Step 2: text encoding:
The text is encoded with a BiLSTM neural network, which can encode the input text in the forward and reverse directions simultaneously, ensuring that the context of every word is captured;
(1) Scan the text forward with an LSTM deep neural network to obtain the forward feature vector;
(2) Scan the text backward with an LSTM deep neural network to obtain the backward feature vector;
$\overrightarrow{h_t} = \mathrm{LSTM}(x_t, \overrightarrow{h_{t-1}})$, $\overleftarrow{h_t} = \mathrm{LSTM}(x_t, \overleftarrow{h_{t+1}})$, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$

where $\overrightarrow{h_t}$ is the vector obtained by forward-encoding the text at time $t$, $\overleftarrow{h_t}$ is the vector obtained by reverse-encoding the text at time $t$, $x_t$ is the $t$-th word from left to right in the text, $\overrightarrow{h_{t-1}}$ is the forward hidden state at time $t-1$, $\overleftarrow{h_{t+1}}$ is the reverse hidden state at time $t+1$, $[\cdot\,;\cdot]$ denotes the concatenation of the two vectors, and $h_t$ is the bidirectional code of the text at time $t$.
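The forward scan, backward scan, and per-step concatenation above can be sketched as follows. This is an assumption-laden toy, not the patent's network: a single tanh update with fixed weights stands in for a full LSTM cell, and states are scalars rather than vectors, but the control flow (scan left to right, scan right to left, then pair the states at each time step) mirrors the bidirectional encoding described.

```python
import math

def cell(x, h, w_x=0.5, w_h=0.3):
    # Simplified recurrent update standing in for a full LSTM cell.
    return math.tanh(w_x * x + w_h * h)

def bi_encode(xs):
    """Scan the sequence forward and backward, then concatenate
    (here: pair) the two hidden states at each time step."""
    fwd, h = [], 0.0
    for x in xs:               # forward pass: left to right
        h = cell(x, h)
        fwd.append(h)
    bwd, h = [], 0.0
    for x in reversed(xs):     # backward pass: right to left
        h = cell(x, h)
        bwd.append(h)
    bwd.reverse()
    # h_t = [forward_t ; backward_t] — one bidirectional code per step
    return list(zip(fwd, bwd))

codes = bi_encode([0.1, 0.4, -0.2])
print(len(codes))  # 3: one (forward, backward) pair per input step
```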
And step 3: and (3) voice coding:
The voice audio is also encoded with a BiLSTM neural network; the advantage is that the input speech can be encoded in the forward and reverse directions simultaneously, ensuring that the context of each audio segment is accurately captured, as follows;
(1) Scan the voice audio forward with an LSTM deep neural network to obtain the forward feature vector;
(2) Scan the voice audio backward with an LSTM deep neural network to obtain the backward feature vector;
$\overrightarrow{h_t} = \mathrm{LSTM}(a_t, \overrightarrow{h_{t-1}})$, $\overleftarrow{h_t} = \mathrm{LSTM}(a_t, \overleftarrow{h_{t+1}})$, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$

where $\overrightarrow{h_t}$ is the vector obtained by forward-encoding the audio at time $t$, $\overleftarrow{h_t}$ is the vector obtained by reverse-encoding the audio at time $t$, $a_t$ is the $t$-th segment from left to right in the audio, $\overrightarrow{h_{t-1}}$ is the forward hidden state at time $t-1$, $\overleftarrow{h_{t+1}}$ is the reverse hidden state at time $t+1$, $[\cdot\,;\cdot]$ denotes the concatenation of the two vectors, and $h_t$ is the bidirectional code of the audio at time $t$.
And 4, step 4: feature fusion:
The two independent feature vectors obtained in step 2 and step 3 are weighted and fused by function computation:
$s_0 = h_T,\qquad s_i = f(s_{i-1}, y_{i-1}, c_i),\qquad c_i = \sum_j \alpha_{ij} h_j,\qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_k \exp(e_{ik})},\qquad e_{ij} = a(s_{i-1}, h_j)$

where $s_0$ is the state of the decoder at the initial time, $s_{i-1}$ is the hidden state of the decoder at the previous moment, $y_{i-1}$ is the word decoded at the previous moment, $c_i$ is the attention vector, $\alpha_{ij}$ is the attention weight, $h_j$ is the encoding of the $j$-th word in the source sentence, $h_k$ that of the $k$-th word, and $h_T$ is the hidden state of the encoder at time $T$.
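The softmax normalisation and weighted sum used in the fusion step can be sketched as below. Scalar encoder states stand in for vectors, and the alignment scores are supplied directly rather than computed from decoder and encoder states, so this only illustrates how weights are normalised and applied.

```python
import math

def attention_weights(scores):
    """Softmax-normalise alignment scores e_ij into weights alpha_ij."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(weights, states):
    """Attention vector c_i: weighted sum of encoder states (scalars here)."""
    return sum(w * h for w, h in zip(weights, states))

weights = attention_weights([2.0, 1.0, 0.1])  # made-up alignment scores
print(round(sum(weights), 6))  # 1.0: the weights form a distribution
```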
And 5: it is intended to understand that:
The fused feature vector is fed into a softmax function, and the user intention is identified in the intelligent customer service system;
$P(y_i) = \frac{\exp\big(g(s_i, y_i)\big)}{\sum_{y_k \in V} \exp\big(g(s_i, y_k)\big)}$

where $s_i$ is the hidden state of the decoder at time $i$, $y_i$ is the word decoded at time $i$, $y_k$ is the $k$-th word in the vocabulary $V$, $g(s_i, y_k)$ is the confidence score of hidden state $s_i$ for word $y_k$, $\exp$ is the exponential function with the natural constant $e$ as base, and $P(y_i)$ is the probability of the currently generated target word $y_i$.
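Selecting an intent from the fused representation reduces to a softmax over class scores followed by an argmax. A toy sketch with hypothetical intent labels and made-up logits (the patent does not specify the label set or score values):

```python
import math

INTENTS = ["consult", "complaint", "request_help"]  # hypothetical labels

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_intent(logits):
    """Return the highest-probability intent and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return INTENTS[best], probs[best]

label, prob = predict_intent([1.2, 0.3, 2.5])  # made-up fused-model scores
print(label)  # request_help
```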
Step 6: and (3) performing feedback:
After the intelligent customer service system correctly understands the user's question intention, it matches the intention against a knowledge base maintained in the background and recommends relevant solutions to the user.
Compared with the prior art, the main advantages of the invention are:
(1) By adopting a text-and-voice multimodal encoding technique, the invention makes full use of both text and voice features, improving the intention understanding effect in intelligent customer service;
(2) In scenarios such as intelligent customer service, the invention ensures that the complementary information of voice and text is fully combined without introducing information from any other modality;
(3) Intelligent customer service products rely mainly on voice interaction; by using text and voice information jointly, the invention can avoid the cascading effect of speech recognition errors to the greatest extent.
Drawings
FIG. 1 is a flow chart of an intelligent customer service intent understanding method of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and specific embodiments.
As shown in Fig. 1, this embodiment provides an intelligent customer service intention understanding method based on the fusion of text and voice information. It introduces voice features on top of intention understanding over text with a bidirectional long short-term memory deep neural network (BiLSTM), improving the intention understanding effect through multimodal information fusion. In an intelligent customer service application scenario, the method mainly comprises six parts: user input, text encoding, voice encoding, feature fusion, intention understanding, and execution feedback.
Step 1: and (3) user input:
The user accesses the intelligent customer service system through channels such as web pages, WeChat, mini programs, and official accounts, and initiates a Q&A or conversation by voice call or text.
The traditional method cannot analyze important characteristics of the user's voice such as tone, speaking rate, and stress, because only text serves as the system's input. The greatest advantage of this method is that, in the intelligent customer service application scenario, a deep neural network is fully exploited to extract features from both kinds of user input, voice and text, effectively improving the intention recognition effect.
Step 2: text encoding:
Text encoding is a common strategy in traditional intelligent customer service systems: text features are extracted for intention analysis and understanding.
The text is encoded with a BiLSTM neural network; the advantage is that the input text can be encoded in the forward and reverse directions simultaneously, ensuring that the context of every word is captured;
The text is scanned forward and backward with an LSTM deep neural network to obtain the forward and backward feature vectors;
$\overrightarrow{h_t} = \mathrm{LSTM}(x_t, \overrightarrow{h_{t-1}})$, $\overleftarrow{h_t} = \mathrm{LSTM}(x_t, \overleftarrow{h_{t+1}})$, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$

where $\overrightarrow{h_t}$ is the vector obtained by forward-encoding the text at time $t$, $\overleftarrow{h_t}$ is the vector obtained by reverse-encoding the text at time $t$, $x_t$ is the $t$-th word from left to right in the text, $\overrightarrow{h_{t-1}}$ is the forward hidden state at time $t-1$, $\overleftarrow{h_{t+1}}$ is the reverse hidden state at time $t+1$, $[\cdot\,;\cdot]$ denotes the concatenation of the two vectors, and $h_t$ is the bidirectional code of the text at time $t$.
And step 3: and (3) voice coding:
Voice encoding is an optimization strategy of this proposal: voice features are extracted for intention analysis and understanding.
The speech is encoded with a BiLSTM neural network; the advantage is that the input speech can be encoded in the forward and reverse directions simultaneously, and the context of each audio segment can be accurately captured;
$\overrightarrow{h_t} = \mathrm{LSTM}(a_t, \overrightarrow{h_{t-1}})$, $\overleftarrow{h_t} = \mathrm{LSTM}(a_t, \overleftarrow{h_{t+1}})$, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$

where $\overrightarrow{h_t}$ is the vector obtained by forward-encoding the audio at time $t$, $\overleftarrow{h_t}$ is the vector obtained by reverse-encoding the audio at time $t$, $a_t$ is the $t$-th segment from left to right in the audio, $\overrightarrow{h_{t-1}}$ is the forward hidden state at time $t-1$, $\overleftarrow{h_{t+1}}$ is the reverse hidden state at time $t+1$, $[\cdot\,;\cdot]$ denotes the concatenation of the two vectors, and $h_t$ is the bidirectional code of the audio at time $t$.
And 4, step 4: feature fusion:
Feature fusion is the core strategy of this proposal: text and voice features are extracted for intention analysis and understanding, so that both the semantics of the text the user expresses and important vocal characteristics such as tone, speaking rate, and stress are fully utilized.
Decoding is performed with the feature vectors obtained in steps 2 and 3:
$s_0 = h_T,\qquad s_i = f(s_{i-1}, y_{i-1}, c_i),\qquad c_i = \sum_j \alpha_{ij} h_j,\qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_k \exp(e_{ik})},\qquad e_{ij} = a(s_{i-1}, h_j)$

where $s_0$ is the state of the decoder at the initial time, $s_{i-1}$ is the hidden state of the decoder at the previous moment, $y_{i-1}$ is the word decoded at the previous moment, $c_i$ is the attention vector, $\alpha_{ij}$ is the attention weight, $h_j$ is the encoding of the $j$-th word in the source sentence, $h_k$ that of the $k$-th word, and $h_T$ is the hidden state of the encoder at time $T$.
And 5: it is intended to understand that:
Through steps 1-4, the fused feature vector of text and voice is obtained. Feeding this vector into the softmax function lets the intelligent customer service system identify the user's intention, accurately grasp what the user wants, provide high-quality service, and create a better user experience.
$P(y_i) = \frac{\exp\big(g(s_i, y_i)\big)}{\sum_{y_k \in V} \exp\big(g(s_i, y_k)\big)}$

where $s_i$ is the hidden state of the decoder at time $i$, $y_i$ is the word decoded at time $i$, $y_k$ is the $k$-th word in the vocabulary $V$, $g(s_i, y_k)$ is the confidence score of hidden state $s_i$ for word $y_k$, $\exp$ is the exponential function with the natural constant $e$ as base, and $P(y_i)$ is the probability of the currently generated target word $y_i$.
Step 6: and (3) performing feedback:
After the intelligent customer service system correctly understands the user's question intention, it matches the intention against a knowledge base maintained in the background and recommends relevant solutions to the user.
By fusing text and voice information in an intelligent customer service product as described, the user's question intention is inferred jointly from both kinds of features, effectively improving the system's performance.
The above only describes the technical idea of the invention and should not be taken to limit its scope; any modification made on the basis of the technical solutions according to this technical idea falls within the scope of the invention. Techniques not covered by the invention can be realized with the prior art.
Claims (5)
1. An intelligent customer service intention understanding method based on the fusion of text and voice information, characterized by comprising the following steps:
Step 1: user input: a user accesses the intelligent customer service system through a web page, WeChat, a mini program, or an official account channel, and initiates a Q&A or conversation by voice call or text;
Step 2: text encoding: encoding the text with a BiLSTM neural network, encoding the input text in the forward and reverse directions simultaneously, and accurately capturing the context of every word to obtain a feature vector;
Step 3: voice encoding: encoding the voice audio with a BiLSTM neural network, encoding the input speech in the forward and reverse directions simultaneously, and accurately capturing the context of each audio segment to obtain a feature vector;
Step 4: feature fusion: weighting and fusing the two independent feature vectors obtained in step 2 and step 3 by function computation;
Step 5: intention understanding: feeding the fused feature vector into a softmax function, and identifying the user intention in the intelligent customer service system;
Step 6: execution feedback: after the intelligent customer service system correctly understands the user's question intention, matching the intention against a knowledge base maintained in the background and recommending solutions to the user.
2. The intelligent customer service intention understanding method of claim 1, wherein step 2 specifically comprises:
Step 2.1: scanning the text forward with an LSTM deep neural network to obtain the forward feature vector;
Step 2.2: scanning the text backward with an LSTM deep neural network to obtain the backward feature vector;
$\overrightarrow{h_t} = \mathrm{LSTM}(x_t, \overrightarrow{h_{t-1}})$, $\overleftarrow{h_t} = \mathrm{LSTM}(x_t, \overleftarrow{h_{t+1}})$, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$

where $\overrightarrow{h_t}$ is the vector obtained by forward-encoding the text at time $t$, $\overleftarrow{h_t}$ is the vector obtained by reverse-encoding the text at time $t$, $x_t$ is the $t$-th word from left to right in the text, $\overrightarrow{h_{t-1}}$ is the forward hidden state at time $t-1$, $\overleftarrow{h_{t+1}}$ is the reverse hidden state at time $t+1$, $[\cdot\,;\cdot]$ denotes the concatenation of the two vectors, and $h_t$ is the bidirectional code of the text at time $t$.
3. The intelligent customer service intention understanding method of claim 2, wherein step 3 specifically comprises:
Step 3.1: scanning the voice audio forward with an LSTM deep neural network to obtain the forward feature vector;
Step 3.2: scanning the voice audio backward with an LSTM deep neural network to obtain the backward feature vector;
$\overrightarrow{h_t} = \mathrm{LSTM}(a_t, \overrightarrow{h_{t-1}})$, $\overleftarrow{h_t} = \mathrm{LSTM}(a_t, \overleftarrow{h_{t+1}})$, $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$

where $\overrightarrow{h_t}$ is the vector obtained by forward-encoding the audio at time $t$, $\overleftarrow{h_t}$ is the vector obtained by reverse-encoding the audio at time $t$, $a_t$ is the $t$-th segment from left to right in the audio, $\overrightarrow{h_{t-1}}$ is the forward hidden state at time $t-1$, $\overleftarrow{h_{t+1}}$ is the reverse hidden state at time $t+1$, $[\cdot\,;\cdot]$ denotes the concatenation of the two vectors, and $h_t$ is the bidirectional code of the audio at time $t$.
4. The intelligent customer service intention understanding method of claim 3, wherein the decoding in step 4 proceeds as follows:
$s_0 = h_T,\qquad s_i = f(s_{i-1}, y_{i-1}, c_i),\qquad c_i = \sum_j \alpha_{ij} h_j,\qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_k \exp(e_{ik})},\qquad e_{ij} = a(s_{i-1}, h_j)$

where $s_0$ is the state of the decoder at the initial time, $s_{i-1}$ is the hidden state of the decoder at the previous moment, $y_{i-1}$ is the word decoded at the previous moment, $c_i$ is the attention vector, $\alpha_{ij}$ is the attention weight, $h_j$ is the encoding of the $j$-th word in the source sentence, $h_k$ that of the $k$-th word, and $h_T$ is the hidden state of the encoder at time $T$.
5. The intelligent customer service intention understanding method of claim 1, wherein the softmax function in step 5 identifies the fused feature vector as follows:
$P(y_i) = \frac{\exp\big(g(s_i, y_i)\big)}{\sum_{y_k \in V} \exp\big(g(s_i, y_k)\big)}$

where $s_i$ is the hidden state of the decoder at time $i$, $y_i$ is the word decoded at time $i$, $y_k$ is the $k$-th word in the vocabulary $V$, $g(s_i, y_k)$ is the confidence score of hidden state $s_i$ for word $y_k$, $\exp$ is the exponential function with the natural constant $e$ as base, and $P(y_i)$ is the probability of the currently generated target word $y_i$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011589715.7A CN112287675B (en) | 2020-12-29 | 2020-12-29 | Intelligent customer service intention understanding method based on text and voice information fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011589715.7A CN112287675B (en) | 2020-12-29 | 2020-12-29 | Intelligent customer service intention understanding method based on text and voice information fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112287675A true CN112287675A (en) | 2021-01-29 |
CN112287675B CN112287675B (en) | 2021-04-30 |
Family
ID=74426212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011589715.7A Active CN112287675B (en) | 2020-12-29 | 2020-12-29 | Intelligent customer service intention understanding method based on text and voice information fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112287675B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113053366A (en) * | 2021-03-12 | 2021-06-29 | 中国电子科技集团公司第二十八研究所 | Controlled voice repeat consistency checking method based on multi-mode fusion |
CN114373448A (en) * | 2022-03-22 | 2022-04-19 | 北京沃丰时代数据科技有限公司 | Topic detection method and device, electronic equipment and storage medium |
CN115187345A (en) * | 2022-09-13 | 2022-10-14 | 深圳装速配科技有限公司 | Intelligent household building material recommendation method, device, equipment and storage medium |
CN115760022A (en) * | 2023-01-10 | 2023-03-07 | 广州佰锐网络科技有限公司 | Intelligent financial business handling method, system and medium |
WO2023131207A1 (en) * | 2022-01-07 | 2023-07-13 | Huawei Technologies Co., Ltd. | Methods and systems for streamable multimodal language understanding |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI823195B (en) * | 2021-11-25 | 2023-11-21 | 荷蘭商荷蘭移動驅動器公司 | Intelligent recommendation method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188361A (en) * | 2019-06-10 | 2019-08-30 | 北京智合大方科技有限公司 | Speech intention recognition methods and device in conjunction with text, voice and emotional characteristics |
CN111145786A (en) * | 2019-12-17 | 2020-05-12 | 深圳追一科技有限公司 | Speech emotion recognition method and device, server and computer readable storage medium |
- 2020-12-29: CN application CN202011589715.7A — patent CN112287675B/en, active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188361A (en) * | 2019-06-10 | 2019-08-30 | 北京智合大方科技有限公司 | Speech intention recognition methods and device in conjunction with text, voice and emotional characteristics |
CN111145786A (en) * | 2019-12-17 | 2020-05-12 | 深圳追一科技有限公司 | Speech emotion recognition method and device, server and computer readable storage medium |
Non-Patent Citations (3)
Title |
---|
Ji Xuewu et al., "Driving intention recognition and vehicle trajectory prediction based on LSTM networks", China Journal of Highway and Transport *
Ning Yishuang, "Research on user intention understanding and feedback generation in intelligent voice interaction", China Doctoral Dissertations Full-text Database, Information Science and Technology *
Zheng Binbin et al., "A speech intention understanding method based on multimodal information fusion", Sciencepaper Online *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113053366A (en) * | 2021-03-12 | 2021-06-29 | 中国电子科技集团公司第二十八研究所 | Controlled voice repeat consistency checking method based on multi-mode fusion |
CN113053366B (en) * | 2021-03-12 | 2023-11-21 | 中国电子科技集团公司第二十八研究所 | Multi-mode fusion-based control voice duplicate consistency verification method |
WO2023131207A1 (en) * | 2022-01-07 | 2023-07-13 | Huawei Technologies Co., Ltd. | Methods and systems for streamable multimodal language understanding |
CN114373448A (en) * | 2022-03-22 | 2022-04-19 | 北京沃丰时代数据科技有限公司 | Topic detection method and device, electronic equipment and storage medium |
CN115187345A (en) * | 2022-09-13 | 2022-10-14 | 深圳装速配科技有限公司 | Intelligent household building material recommendation method, device, equipment and storage medium |
CN115760022A (en) * | 2023-01-10 | 2023-03-07 | 广州佰锐网络科技有限公司 | Intelligent financial business handling method, system and medium |
Also Published As
Publication number | Publication date |
---|---|
CN112287675B (en) | 2021-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112287675B (en) | Intelligent customer service intention understanding method based on text and voice information fusion | |
Ma et al. | Visual speech recognition for multiple languages in the wild | |
Cheng et al. | Fully convolutional networks for continuous sign language recognition | |
CN110751208B (en) | Criminal emotion recognition method for multi-mode feature fusion based on self-weight differential encoder | |
WO2021072875A1 (en) | Intelligent dialogue generation method, device, computer apparatus and computer storage medium | |
US20200082928A1 (en) | Assisting psychological cure in automated chatting | |
CN114401438B (en) | Video generation method and device for virtual digital person, storage medium and terminal | |
CN112181127A (en) | Method and device for man-machine interaction | |
CN112101045B (en) | Multi-mode semantic integrity recognition method and device and electronic equipment | |
US12008336B2 (en) | Multimodal translation method, apparatus, electronic device and computer-readable storage medium | |
De Coster et al. | Machine translation from signed to spoken languages: State of the art and challenges | |
CN115577161A (en) | Multi-mode emotion analysis model fusing emotion resources | |
WO2023226239A1 (en) | Object emotion analysis method and apparatus and electronic device | |
CN113392265A (en) | Multimedia processing method, device and equipment | |
CN115563290A (en) | Intelligent emotion recognition method based on context modeling | |
CN115599894A (en) | Emotion recognition method and device, electronic equipment and storage medium | |
CN116561265A (en) | Personalized dialogue generation method, model training method and device | |
Chakraborty et al. | Analyzing emotion in spontaneous speech | |
CN115827854A (en) | Voice abstract generation model training method, voice abstract generation method and device | |
CN111354362A (en) | Method and device for assisting hearing-impaired communication | |
CN114463688A (en) | Cross-modal context coding dialogue emotion recognition method and system | |
Xue et al. | Lcsnet: End-to-end lipreading with channel-aware feature selection | |
Pu et al. | Review on research progress of machine lip reading | |
CN114281948A (en) | Summary determination method and related equipment thereof | |
WO2021169825A1 (en) | Speech synthesis method and apparatus, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||