WO2023097745A1 - Intelligent interaction method, system and terminal based on deep learning - Google Patents

Intelligent interaction method, system and terminal based on deep learning

Info

Publication number
WO2023097745A1
WO2023097745A1 (PCT/CN2021/136927; CN2021136927W)
Authority
WO
WIPO (PCT)
Prior art keywords
user
voice
deep learning
language
intonation
Prior art date
Application number
PCT/CN2021/136927
Other languages
English (en)
French (fr)
Inventor
张庆茂
刘培刚
Original Assignee
山东远联信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 山东远联信息科技有限公司
Publication of WO2023097745A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/01Customer relationship services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/632Query formulation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/005Language recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/26Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Definitions

  • This application relates to the field of artificial intelligence interaction technology, in particular to an intelligent interaction method, system and terminal based on deep learning.
  • Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in ways similar to human intelligence. Research in this field includes robotics, language recognition, image recognition, natural language processing and expert systems. Since the birth of artificial intelligence, its theory and technology have grown increasingly mature, and its fields of application have kept expanding. It is conceivable that the technological products brought by artificial intelligence in the future will be "containers" of human wisdom. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is not human intelligence, but it can think like a human and may even surpass human intelligence.
  • Speech recognition and natural language processing in particular are widely used in smart terminals and online customer service in the service industry, for example by operators such as China Mobile, China Unicom and China Telecom, and in government service hotlines.
  • Artificial intelligence dialogue in traditional technology generally sets up fixed dialogue templates: after a user connects, the intelligent customer service uses guiding prompts to steer the user into stating their request in templated language. Once the user's request is recognized, the corresponding response is given according to the request.
  • Although traditional intelligent customer service can perform the basic voice recognition function, if the user asks in a dialect, or does not use the templated language when making an inquiry, the intelligent customer service falls into an endless loop, repeatedly asking about the user's needs, which reduces user satisfaction.
  • In a first aspect, an embodiment of the present application provides an intelligent interaction method based on deep learning, including: acquiring the voice feature information of an accessing user; inputting the voice feature information into a trained deep learning neural network to determine a response strategy; and answering the user according to the response strategy.
  • With this implementation, the intelligent customer service abandons the traditional rigid template-style scripts when conversing with the user and lets the user state their appeal first. The stated appeal is then analyzed to obtain a response strategy, ensuring that the reply addresses the user's appeal, so the system does not need to ask repeatedly about the user's needs, which improves user satisfaction. (An end-to-end sketch of this three-step flow is given at the end of this section.)
  • In a first possible implementation of the first aspect, acquiring the voice feature information of the accessing user includes: matching the language of the user's voice to determine language information; and determining the semantics and intonation meaning of the voice according to the language information and the corresponding language library.
  • In a second possible implementation of the first aspect, determining the semantics and intonation meaning of the voice according to the language information and the corresponding language library includes: determining each word in the user's sentence according to the language library and the voice's voiceprint information; combining the determined words and then dividing them by part of speech to determine the semantics of the user's voice; and determining the user's intonation meaning by combining the voice intonation with the intonation feature information of the current language.
  • In a third possible implementation of the first aspect, inputting the speech feature information into the trained deep learning neural network to obtain the response strategy includes: the deep learning neural network determines the user's emotional characteristics according to the intonation meaning; if the emotional characteristics indicate that the user is emotionally stable, a corresponding response utterance is selected from the response database according to the semantics of the user's voice; or, if the emotional characteristics indicate that the user is emotionally anxious, the call is transferred to manual service.
  • In a fourth possible implementation of the first aspect, if all agents are busy when transferring to a manual agent, a relay intelligent customer service is temporarily established; the relay intelligent customer service imitates the state of manual agent access, and when a manual agent becomes idle, the call is switched directly to the manual agent.
  • In a second aspect, an embodiment of the present application provides an intelligent interaction system based on deep learning, including: an acquisition module, used to acquire the voice feature information of an accessing user; a determination module, used to input the voice feature information into the trained deep learning neural network to determine the response strategy; and a response module, used to answer the user according to the response strategy.
  • In a first possible implementation of the second aspect, the acquisition module includes: a first determining unit, configured to match the language of the user's voice and determine the language information; and a second determining unit, configured to determine the semantics and intonation meaning of the voice according to the language information and the corresponding language library.
  • In a second possible implementation of the second aspect, the second determining unit includes: a first determining subunit, configured to determine each word in the user's sentence according to the language library and the voice's voiceprint information; a second determining subunit, used to combine the determined words and then perform part-of-speech division to determine the semantics of the user's voice; and a third determining subunit, used to determine the user's intonation meaning by combining the voice intonation with the intonation feature information of the current language.
  • In a third possible implementation of the second aspect, the determination module includes: a third determining unit, used for the deep learning neural network to determine the user's emotional characteristics according to the intonation meaning; and a processing unit, used to select the corresponding response utterance from the response database according to the semantics of the user's voice if the emotional characteristics indicate that the user is emotionally stable, or to transfer the call to manual service if the emotional characteristics indicate that the user is emotionally anxious.
  • In a third aspect, an embodiment of the present application provides a terminal, including: a processor; and a memory for storing computer-executable instructions; when the processor executes the computer-executable instructions, the processor performs the method of the first aspect or of any possible implementation of the first aspect, realizing intelligent voice interaction.
  • FIG. 1 is a schematic flow diagram of a deep learning-based intelligent interaction method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of an intelligent interactive system based on deep learning provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a terminal provided by an embodiment of the present application.
  • Fig. 1 is a schematic flowchart of a deep learning-based intelligent interaction method provided by the embodiment of the present application.
  • Referring to FIG. 1, the deep learning-based intelligent interaction method provided by the embodiment of the present application includes: S101, acquiring the voice feature information of an accessing user; S102, inputting the voice feature information into the trained deep learning neural network to determine the response strategy; and S103, answering the user according to the response strategy.
  • Intelligent voice interaction in traditional technology generally can only communicate in fixed languages, for example with mobile operators or convenience service hotlines.
  • Accessing users are generally required to state their appeals in Mandarin, and the intelligent customer service determines the response content based on analysis of the user's voice.
  • However, if the user's speech is non-Mandarin or outside the fixed set of languages, the intelligent customer service cannot respond.
  • For this reason, after the user's voice is received in this embodiment, the language of the user's voice is first matched to determine the language information; realizing this function requires access to databases of multiple languages and of regional dialect pronunciations.
  • After the corresponding language information is matched, the semantics and intonation meaning of the user's voice are determined in combination with the corresponding language library.
  • Obviously, the semantics of the voice capture what the user means, while the intonation meaning captures the user's tone and mood when speaking.
  • To obtain these, each word in the user's sentence is first determined according to the language library and the voice's voiceprint information; the determined words are combined and then divided by part of speech to determine the semantics of the user's voice, after which the user's intonation meaning is determined by combining the voice intonation with the intonation feature information of the current language.
  • Determining the intonation meaning of the user's voice is particularly important, because the intonation meaning can reveal the user's current emotional characteristics. Taking Mandarin as an example, if the user is emotionally excited or anxious, their speech will show intonation features such as fast speech or a loud voice. In some languages, however, fast speech and a loud voice are normal, characteristic intonation features, and emotion must be determined from other cues.
  • After the semantics and intonation meaning of the user's speech are determined in S101, they are input into the trained deep learning neural network, which first determines the user's emotional characteristics according to the intonation meaning. If the emotional characteristics indicate that the user is emotionally stable, a corresponding response utterance is selected from the response database according to the semantics of the user's voice. If the emotional characteristics indicate that the user is emotionally anxious, the call is transferred to manual service; in that case, interacting with the user through the intelligent customer service may fail to resolve the user's appeal and may even cause user dissatisfaction.
  • According to the response strategy, if the intelligent customer service answers, the corresponding response sentence is retrieved from the corresponding database based on the semantics of the user's voice; if the call needs to be transferred, the service is performed manually.
  • It should be noted that if all agents are busy when transferring to a manual agent, a relay intelligent customer service is temporarily set up; the relay intelligent customer service imitates the state of manual agent access, and when a manual agent becomes idle, the call is switched directly to the manual agent.
  • the present application also provides an embodiment of a deep learning-based intelligent interaction system.
  • the deep learning-based intelligent interaction system 20 includes: an acquisition module 201, a determination module 202 and a response module 203.
  • the acquiring module 201 is configured to acquire voice feature information of an access user.
  • the determining module 202 is configured to input the speech feature information into the trained deep learning neural network to determine the response strategy.
  • The answering module 203 is configured to answer the user according to the response strategy.
  • the acquiring module 201 includes: a first determining unit and a second determining unit.
  • the first determination unit is configured to match the language of the user's voice and determine the language information;
  • the second determination unit is configured to determine the semantics and intonation meaning of the voice according to the language information and the corresponding language library.
  • the second determination unit includes: a first determination subunit, a second determination subunit and a third determination subunit.
  • the first determining subunit is configured to determine each word in the user sentence according to the language library and voiceprint information.
  • the second determining subunit is used to combine the determined words and then perform part-of-speech division to determine the semantics of the user's voice.
  • the third determining subunit is used to determine the user's intonation meaning in combination with the voice intonation and the intonation feature information of the current language.
  • the determining module 202 includes: a third determining unit and a processing unit.
  • the third determination unit is used for the deep learning neural network to determine the user's emotional characteristics according to the meaning of the intonation.
  • the processing unit is used to select the corresponding response utterance from the response database according to the semantics of the user's voice if the emotional feature indicates that the user is emotionally stable; or, if the emotional feature indicates that the user is emotionally anxious, to transfer the call to manual service.
  • a terminal 30 includes: a processor 301 , a memory 302 and a communication interface 303 .
  • the processor 301, the memory 302 and the communication interface 303 can be connected to each other through a bus; the bus can be divided into an address bus, a data bus, a control bus, and the like.
  • for ease of representation, only one thick line is used in FIG. 3, but this does not mean that there is only one bus or one type of bus.
  • the processor 301 usually controls the overall functions of the terminal 30, such as starting the terminal 30 and, after the terminal 30 starts, acquiring the voice feature information of the accessing user, inputting the voice feature information into the trained deep learning neural network to determine the response strategy, and answering the user according to the response strategy.
  • the processor 301 may be a general processor, for example, a central processing unit (English: central processing unit, abbreviated: CPU), a network processor (English: network processor, abbreviated: NP) or a combination of CPU and NP.
  • the processor may also be a microcontroller unit (MCU).
  • Processors may also include hardware chips.
  • the aforementioned hardware chip may be an Application Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD) or a combination thereof.
  • the above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), or the like.
  • the memory 302 is configured to store computer-executable instructions to support the operation of the terminal 30.
  • the memory 302 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.
  • after the terminal 30 is started, the processor 301 and the memory 302 are powered on, and the processor 301 reads and executes the computer-executable instructions stored in the memory 302 to complete all or part of the steps of the above-mentioned embodiments of the deep learning-based intelligent interaction method.
  • the communication interface 303 is used for the terminal 30 to transmit data, such as realizing communication with network devices and servers.
  • the communication interface 303 includes a wired communication interface, and may also include a wireless communication interface.
  • the wired communication interface includes a USB interface, a Micro USB interface, and may also include an Ethernet interface.
  • the wireless communication interface may be a WLAN interface, a cellular network communication interface or a combination thereof.
  • the terminal 30 provided in the embodiment of the present application further includes a power supply component, which provides power for various components of the terminal 30 .
  • Power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to terminal 30 .
  • the terminal 30 may further include a communication component configured to facilitate wired or wireless communication between the terminal 30 and other devices.
  • the terminal 30 can access a wireless network based on communication standards, such as WiFi, 4G or 5G, or a combination thereof.
  • the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component also includes a Near Field Communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • terminal 30 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs) or other electronic components.
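To make the claimed flow concrete for implementers, here is a minimal end-to-end sketch of the three steps (acquire voice features, decide with the trained network, answer). This is an illustration only, not code from the application: the feature fields, the anxiety score and the 0.5 threshold are assumptions introduced for the example.

```python
from dataclasses import dataclass

@dataclass
class VoiceFeatures:
    language: str         # matched language or dialect (S101)
    semantics: str        # recognized meaning of the utterance (S101)
    anxiety_score: float  # emotion estimate derived from the intonation meaning

def determine_strategy(features: VoiceFeatures, threshold: float = 0.5) -> str:
    """S102: map the user's emotional characteristics to a response strategy."""
    return "manual_service" if features.anxiety_score > threshold else "auto_reply"

def answer(features: VoiceFeatures, response_db: dict) -> str:
    """S103: answer the user according to the response strategy."""
    if determine_strategy(features) == "auto_reply":
        return response_db.get(features.semantics, "Could you tell me more?")
    return "Transferring you to a human agent..."

# Toy usage:
db = {"billing_query": "Your current balance is shown in the app."}
print(answer(VoiceFeatures("zh-mandarin", "billing_query", 0.2), db))  # auto reply
print(answer(VoiceFeatures("zh-sichuan", "complaint", 0.9), db))       # transfer
```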

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An intelligent interaction method, system and terminal based on deep learning, comprising: acquiring voice feature information of an accessing user (S101); inputting the voice feature information into a trained deep learning neural network to determine a response strategy (S102); and answering the user according to the response strategy (S103). When conversing with the user, the intelligent customer service abandons the traditional rigid template-style scripts and lets the user state their appeal first. The statement of the appeal is then analyzed to obtain a response strategy, ensuring that the reply addresses the user's appeal, so there is no need to ask repeatedly about the user's needs, thereby improving user satisfaction.

Description

Intelligent interaction method, system and terminal based on deep learning

Technical Field
The present application relates to the field of artificial intelligence interaction technology, and in particular to an intelligent interaction method, system and terminal based on deep learning.
Background Art
Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in ways similar to human intelligence. Research in this field includes robotics, language recognition, image recognition, natural language processing and expert systems. Since the birth of artificial intelligence, its theory and technology have grown increasingly mature, and its fields of application have kept expanding. It is conceivable that the technological products brought by artificial intelligence in the future will be "containers" of human wisdom. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is not human intelligence, but it can think like a human and may even surpass human intelligence.
Speech recognition and natural language processing in particular are widely used in smart terminals and online customer service in the service industry, for example by operators such as China Mobile, China Unicom and China Telecom, and in government service hotlines. Artificial intelligence dialogue in traditional technology generally sets up fixed dialogue templates: after a user connects, the intelligent customer service uses guiding prompts to steer the user into stating their request in templated language. Once the user's request is recognized, the corresponding response is given according to the request.
Although traditional intelligent customer service can perform the basic voice recognition function, if the user asks in a dialect, or does not use the templated language when making an inquiry, the intelligent customer service falls into an endless loop, repeatedly asking about the user's needs, which reduces user satisfaction.
Summary of the Invention
To solve the above technical problems, the present application proposes the following technical solutions:
In a first aspect, an embodiment of the present application provides an intelligent interaction method based on deep learning, comprising: acquiring voice feature information of an accessing user; inputting the voice feature information into a trained deep learning neural network to determine a response strategy; and answering the user according to the response strategy.

With the above implementation, the intelligent customer service abandons the traditional rigid template-style scripts when conversing with the user and lets the user state their appeal first. The statement of the appeal is then analyzed to obtain a response strategy, which ensures that the reply addresses the user's appeal; the system therefore does not need to ask repeatedly about the user's needs, which improves user satisfaction.

With reference to the first aspect, in a first possible implementation of the first aspect, acquiring the voice feature information of the accessing user comprises: matching the language of the user's voice to determine language information; and determining the semantics and intonation meaning of the voice according to the language information and the corresponding language library.

With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, determining the semantics and intonation meaning of the voice according to the language information and the corresponding language library comprises: determining each word in the user's sentence according to the language library and the voice's voiceprint information; combining the determined words and then dividing them by part of speech to determine the semantics of the user's voice; and determining the user's intonation meaning by combining the voice intonation with the intonation feature information of the current language.

With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, inputting the voice feature information into the trained deep learning neural network to obtain the response strategy comprises: the deep learning neural network determines the user's emotional characteristics according to the intonation meaning; if the emotional characteristics indicate that the user is emotionally stable, a corresponding response utterance is selected from the response database according to the semantics of the user's voice; or, if the emotional characteristics indicate that the user is emotionally anxious, the call is transferred to manual service.

With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, if all agents are busy when transferring to a manual agent, a relay intelligent customer service is temporarily established; the relay intelligent customer service imitates the state of manual agent access, and when a manual agent becomes idle, the call is switched directly to the manual agent.

In a second aspect, an embodiment of the present application provides an intelligent interaction system based on deep learning, comprising: an acquisition module for acquiring voice feature information of an accessing user; a determination module for inputting the voice feature information into a trained deep learning neural network to determine a response strategy; and a response module for answering the user according to the response strategy.

With reference to the second aspect, in a first possible implementation of the second aspect, the acquisition module comprises: a first determining unit for matching the language of the user's voice to determine language information; and a second determining unit for determining the semantics and intonation meaning of the voice according to the language information and the corresponding language library.

With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the second determining unit comprises: a first determining subunit for determining each word in the user's sentence according to the language library and the voice's voiceprint information; a second determining subunit for combining the determined words and then dividing them by part of speech to determine the semantics of the user's voice; and a third determining subunit for determining the user's intonation meaning by combining the voice intonation with the intonation feature information of the current language.

With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the determination module comprises: a third determining unit for the deep learning neural network to determine the user's emotional characteristics according to the intonation meaning; and a processing unit for selecting a corresponding response utterance from the response database according to the semantics of the user's voice if the emotional characteristics indicate that the user is emotionally stable, or for transferring the call to manual service if the emotional characteristics indicate that the user is emotionally anxious.

In a third aspect, an embodiment of the present application provides a terminal, comprising: a processor; and a memory for storing computer-executable instructions; when the processor executes the computer-executable instructions, the processor performs the method of the first aspect or of any possible implementation of the first aspect, realizing intelligent voice interaction.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of an intelligent interaction method based on deep learning provided by an embodiment of the present application;

Fig. 2 is a schematic diagram of an intelligent interaction system based on deep learning provided by an embodiment of the present application;

Fig. 3 is a schematic diagram of a terminal provided by an embodiment of the present application.
Detailed Description
The solution is described below with reference to the accompanying drawings and specific implementations.
Fig. 1 is a schematic flowchart of an intelligent interaction method based on deep learning provided by an embodiment of the present application. Referring to Fig. 1, the intelligent interaction method based on deep learning provided by the embodiment of the present application includes:
S101: Acquire the voice feature information of an accessing user.
Intelligent voice interaction in traditional technology can generally only communicate in fixed languages, for example with mobile operators or convenience service hotlines. Accessing users are generally required to state their appeals in Mandarin, and the intelligent customer service determines the response content based on analysis of the user's voice. However, if the user's speech is non-Mandarin or outside the fixed set of languages, the intelligent customer service cannot respond.
For the above reasons, in the embodiment of the present application, after the user's voice is received, the language of the user's voice is first matched to determine the language information. To realize this function, databases of multiple languages and of regional dialect pronunciations need to be accessed. After the corresponding language information is matched, the semantics and intonation meaning of the user's voice are determined in combination with the corresponding language library. Obviously, the semantics of the voice capture what the user means, while the intonation meaning captures the user's tone and mood when speaking.
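As an illustration of this language-matching step, the sketch below scores an utterance's phoneme sequence against per-language pronunciation databases and picks the best match. It is a simplification under stated assumptions: a production system would use an acoustic language-identification model over the full multi-language and regional-dialect databases described above, and the phoneme inventories here are invented toy data.

```python
# Hypothetical toy pronunciation databases standing in for the multi-language
# and regional-dialect databases described in the text.
LANG_DB = {
    "zh-mandarin":  {"ni", "hao", "qing", "wen"},
    "zh-cantonese": {"nei", "hou", "cheng", "man"},
}

def match_language(phonemes: list[str]) -> str:
    """Return the language whose pronunciation database best overlaps the utterance."""
    scores = {lang: len(set(phonemes) & inventory)
              for lang, inventory in LANG_DB.items()}
    return max(scores, key=scores.get)

print(match_language(["nei", "hou", "cheng"]))  # -> zh-cantonese
```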
In this embodiment, to determine the semantics and intonation meaning of the user's voice, each word in the user's sentence is first determined according to the language library and the voice's voiceprint information; the determined words are combined and then divided by part of speech to determine the semantics of the user's voice. When determining the semantics, the words must be divided accurately according to the characteristics of the corresponding language, so that the semantics fit the user's intended meaning. After the semantics of the user's voice are determined, the user's intonation meaning is determined by combining the voice intonation with the intonation feature information of the current language. In this embodiment, determining the meaning of the user's intonation is particularly important, because the intonation meaning can reveal the user's current emotional characteristics. Taking Mandarin as an example, if the user is emotionally excited or anxious, their speech will show intonation features such as fast speech or a loud voice. In some languages, however, fast speech and a loud voice are normal, characteristic intonation features, and emotion must be determined from other cues.
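The two intonation cues named above, speaking rate and loudness, can be made concrete as in the following sketch, which normalizes both against per-language baselines so that a language whose normal speech is fast and loud is not misread as anxious. The baseline numbers and the choice of features are assumptions for illustration, not values taken from the application.

```python
import numpy as np

BASELINES = {  # assumed (syllables per second, typical RMS loudness) per language
    "zh-mandarin":  (4.5, 0.08),
    "zh-cantonese": (6.0, 0.12),  # faster/louder speech assumed normal here
}

def intonation_features(samples: np.ndarray, sample_rate: int,
                        n_syllables: int, language: str) -> dict:
    duration = len(samples) / sample_rate
    rate = n_syllables / duration                   # speaking rate
    loudness = float(np.sqrt(np.mean(samples**2)))  # RMS energy
    base_rate, base_loud = BASELINES[language]
    return {
        "rate_ratio": rate / base_rate,          # >1: faster than normal
        "loudness_ratio": loudness / base_loud,  # >1: louder than normal
    }

# Toy usage: two seconds of noise standing in for speech, 12 syllables.
audio = np.random.uniform(-0.1, 0.1, 32000)
print(intonation_features(audio, 16000, 12, "zh-mandarin"))
```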
S102: Input the voice feature information into the trained deep learning neural network to determine a response strategy.
After the semantics and intonation meaning of the user's voice are determined in S101, they are input into the trained deep learning neural network. The deep learning neural network first determines the user's emotional characteristics according to the intonation meaning. If the emotional characteristics indicate that the user is emotionally stable, a corresponding response utterance is selected from the response database according to the semantics of the user's voice. If the emotional characteristics indicate that the user is emotionally anxious, the call is transferred to manual service; in that case, interacting with the user through the intelligent customer service may fail to resolve the user's appeal and may even cause user dissatisfaction.
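A minimal sketch of this decision step follows, with a tiny feed-forward network standing in for the trained deep learning neural network. The weights below are random placeholders, whereas in the method they would come from training; the two-feature input reuses the normalized intonation ratios from the previous sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)  # hidden layer
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)  # [stable, anxious] logits

def emotion_probs(x: np.ndarray) -> np.ndarray:
    """Forward pass: intonation features -> P(stable), P(anxious)."""
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

def response_strategy(x: np.ndarray) -> str:
    p_stable, p_anxious = emotion_probs(x)
    return "manual_service" if p_anxious > p_stable else "auto_reply"

print(response_strategy(np.array([1.8, 1.6])))  # much faster/louder than baseline
```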
For example, a user raising an appeal may be anxious at that moment, as in a complaint scenario. If the system, like today's artificial intelligence customer service, repeatedly asks "What aspect of the content are you complaining about?", it will provoke user dissatisfaction. If such users are transferred directly to a manual agent, the agent can provide targeted, humanized service and resolve the user's appeal to the greatest extent.
S103: Answer the user according to the response strategy.
According to the response strategy determined in S102, if the intelligent customer service is used, the corresponding response sentence is retrieved from the corresponding database based on the semantics of the user's voice to answer the user. If the call needs to be transferred to a manual agent, the service is performed manually.
It should be pointed out that if all agents are busy when transferring to a manual agent, a relay intelligent customer service is temporarily established. The relay intelligent customer service imitates the state of manual agent access, and when a manual agent becomes idle, the call is switched directly to the manual agent.
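This relay behavior amounts to a small holding queue: a relay bot accepts the call in an agent-like state and hands it to the first agent that becomes idle. The sketch below is one interpretation of the paragraph above, with invented class and method names.

```python
from collections import deque

class AgentPool:
    def __init__(self, n_agents: int):
        self.idle = list(range(n_agents))
        self.relay_queue: deque = deque()  # calls held by the relay bot

    def transfer(self, call_id: str) -> str:
        if self.idle:
            return f"{call_id} -> agent {self.idle.pop()}"
        self.relay_queue.append(call_id)   # relay bot imitates manual agent access
        return f"{call_id} held by relay intelligent customer service"

    def agent_freed(self, agent: int):
        if self.relay_queue:               # switch the held call directly to the agent
            return f"{self.relay_queue.popleft()} -> agent {agent}"
        self.idle.append(agent)
        return None

pool = AgentPool(n_agents=1)
print(pool.transfer("call-1"))  # goes to agent 0
print(pool.transfer("call-2"))  # held by the relay bot
print(pool.agent_freed(0))      # call-2 switched directly to agent 0
```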
Corresponding to the intelligent interaction method based on deep learning provided by the above embodiment, the present application also provides an embodiment of an intelligent interaction system based on deep learning. Referring to Fig. 2, the intelligent interaction system 20 based on deep learning includes: an acquisition module 201, a determination module 202 and a response module 203.

The acquisition module 201 is used to acquire the voice feature information of an accessing user. The determination module 202 is used to input the voice feature information into the trained deep learning neural network to determine a response strategy. The response module 203 is used to answer the user according to the response strategy.

In this embodiment, the acquisition module 201 includes a first determining unit and a second determining unit. The first determining unit is used to match the language of the user's voice to determine language information; the second determining unit is used to determine the semantics and intonation meaning of the voice according to the language information and the corresponding language library.

Further, the second determining unit includes a first determining subunit, a second determining subunit and a third determining subunit. The first determining subunit is used to determine each word in the user's sentence according to the language library and the voice's voiceprint information. The second determining subunit is used to combine the determined words and then divide them by part of speech to determine the semantics of the user's voice. The third determining subunit is used to determine the user's intonation meaning by combining the voice intonation with the intonation feature information of the current language.

The determination module 202 includes a third determining unit and a processing unit. The third determining unit is used for the deep learning neural network to determine the user's emotional characteristics according to the intonation meaning. The processing unit is used to select a corresponding response utterance from the response database according to the semantics of the user's voice if the emotional characteristics indicate that the user is emotionally stable; or, if the emotional characteristics indicate that the user is emotionally anxious, to transfer the call to manual service.
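The module decomposition of system 20 maps naturally onto code. The sketch below is illustrative only: the module responsibilities follow the text, but the simple threshold standing in for the trained network and the tiny response database are assumptions.

```python
class AcquisitionModule:  # module 201 in Fig. 2
    def acquire(self, transcript: str, rate_ratio: float, loudness_ratio: float) -> dict:
        # First and second determining units: language matching and semantic/
        # intonation extraction are reduced to pre-computed inputs for this sketch.
        return {"semantics": transcript.strip().lower(),
                "rate_ratio": rate_ratio, "loudness_ratio": loudness_ratio}

class DeterminationModule:  # module 202
    def determine(self, features: dict) -> str:
        # Third determining unit plus processing unit: emotion -> strategy.
        anxious = features["rate_ratio"] > 1.3 and features["loudness_ratio"] > 1.3
        return "manual_service" if anxious else "auto_reply"

class ResponseModule:  # module 203
    DB = {"billing query": "Your current balance is shown in the app."}

    def respond(self, strategy: str, features: dict) -> str:
        if strategy == "auto_reply":
            return self.DB.get(features["semantics"], "Could you tell me more?")
        return "Transferring to manual service..."

# Usage:
f = AcquisitionModule().acquire("Billing query", 1.0, 0.9)
print(ResponseModule().respond(DeterminationModule().determine(f), f))
```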
The present application also provides an embodiment of a terminal. Referring to Fig. 3, the terminal 30 includes: a processor 301, a memory 302 and a communication interface 303.

In Fig. 3, the processor 301, the memory 302 and the communication interface 303 may be connected to each other through a bus; the bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Fig. 3, but this does not mean that there is only one bus or one type of bus.

The processor 301 generally controls the overall functions of the terminal 30, such as starting the terminal 30 and, after the terminal 30 is started, acquiring the voice feature information of an accessing user, inputting the voice feature information into the trained deep learning neural network to determine a response strategy, and answering the user according to the response strategy.

The processor 301 may be a general-purpose processor, for example a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor may also be a microcontroller unit (MCU). The processor may further include a hardware chip. The above hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof. The above PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), or the like.

The memory 302 is configured to store computer-executable instructions to support the operation of the terminal 30. The memory 302 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.

After the terminal 30 is started, the processor 301 and the memory 302 are powered on, and the processor 301 reads and executes the computer-executable instructions stored in the memory 302 to complete all or part of the steps of the above embodiments of the intelligent interaction method based on deep learning.

The communication interface 303 is used by the terminal 30 to transmit data, for example to communicate with network devices and servers. The communication interface 303 includes a wired communication interface and may also include a wireless communication interface. The wired communication interface includes a USB interface and a Micro USB interface, and may also include an Ethernet interface. The wireless communication interface may be a WLAN interface, a cellular network communication interface, a combination thereof, or the like.

In an exemplary embodiment, the terminal 30 provided by the embodiment of the present application further includes a power supply component, which provides power for the various components of the terminal 30. The power supply component may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the terminal 30.

The terminal 30 may further include a communication component configured to facilitate wired or wireless communication between the terminal 30 and other devices. The terminal 30 may access a wireless network based on a communication standard, such as WiFi, 4G or 5G, or a combination thereof. The communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. The communication component may further include a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.

In an exemplary embodiment, the terminal 30 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs) or other electronic components.
It should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.

Claims (10)

  1. An intelligent interaction method based on deep learning, characterized by comprising:
    acquiring voice feature information of an accessing user;
    inputting the voice feature information into a trained deep learning neural network to determine a response strategy; and
    answering the user according to the response strategy.
  2. The intelligent interaction method based on deep learning according to claim 1, characterized in that acquiring the voice feature information of the accessing user comprises:
    matching the language of the user's voice to determine language information; and
    determining the semantics and intonation meaning of the voice according to the language information and the corresponding language library.
  3. The intelligent interaction method based on deep learning according to claim 2, characterized in that determining the semantics and intonation meaning of the voice according to the language information and the corresponding language library comprises:
    determining each word in the user's sentence according to the language library and the voice's voiceprint information;
    combining the determined words and then dividing them by part of speech to determine the semantics of the user's voice; and
    determining the user's intonation meaning by combining the voice intonation with the intonation feature information of the current language.
  4. The intelligent interaction method based on deep learning according to claim 3, characterized in that inputting the voice feature information into the trained deep learning neural network to obtain the response strategy comprises:
    determining, by the deep learning neural network, the user's emotional characteristics according to the intonation meaning;
    if the emotional characteristics indicate that the user is emotionally stable, selecting a corresponding response utterance from a response database according to the semantics of the user's voice;
    or, if the emotional characteristics indicate that the user is emotionally anxious, transferring the call to manual service.
  5. The intelligent interaction method based on deep learning according to claim 4, characterized in that, if all agents are busy when transferring to a manual agent, a relay intelligent customer service is temporarily established; the relay intelligent customer service imitates the state of manual agent access, and when a manual agent becomes idle, the call is switched directly to the manual agent.
  6. An intelligent interaction system based on deep learning, characterized by comprising:
    an acquisition module for acquiring voice feature information of an accessing user;
    a determination module for inputting the voice feature information into a trained deep learning neural network to determine a response strategy; and
    a response module for answering the user according to the response strategy.
  7. The intelligent interaction system based on deep learning according to claim 6, characterized in that the acquisition module comprises:
    a first determining unit for matching the language of the user's voice to determine language information; and
    a second determining unit for determining the semantics and intonation meaning of the voice according to the language information and the corresponding language library.
  8. The intelligent interaction system based on deep learning according to claim 7, characterized in that the second determining unit comprises:
    a first determining subunit for determining each word in the user's sentence according to the language library and the voice's voiceprint information;
    a second determining subunit for combining the determined words and then dividing them by part of speech to determine the semantics of the user's voice; and
    a third determining subunit for determining the user's intonation meaning by combining the voice intonation with the intonation feature information of the current language.
  9. The intelligent interaction system based on deep learning according to claim 8, characterized in that the determination module comprises:
    a third determining unit for the deep learning neural network to determine the user's emotional characteristics according to the intonation meaning; and
    a processing unit for selecting a corresponding response utterance from a response database according to the semantics of the user's voice if the emotional characteristics indicate that the user is emotionally stable;
    or, if the emotional characteristics indicate that the user is emotionally anxious, transferring the call to manual service.
  10. A terminal, characterized by comprising:
    a processor; and
    a memory for storing computer-executable instructions;
    wherein, when the processor executes the computer-executable instructions, the processor performs the method according to any one of claims 1 to 5, realizing intelligent voice interaction.
PCT/CN2021/136927 2021-12-03 2021-12-10 Intelligent interaction method, system and terminal based on deep learning WO2023097745A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111464680.9 2021-12-03
CN202111464680.9A CN114240454A (zh) 2021-12-03 Intelligent interaction method, system and terminal based on deep learning

Publications (1)

Publication Number Publication Date
WO2023097745A1 (zh)

Family

ID=80752869

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/136927 WO2023097745A1 (zh) 2021-12-03 2021-12-10 一种基于深度学习的智能交互方法、***及终端

Country Status (2)

Country Link
CN (1) CN114240454A (zh)
WO (1) WO2023097745A1 (zh)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202301A (zh) * 2016-07-01 2016-12-07 武汉泰迪智慧科技有限公司 Intelligent response system based on deep learning
US20180308487A1 (en) * 2017-04-21 2018-10-25 Go-Vivace Inc. Dialogue System Incorporating Unique Speech to Text Conversion Method for Meaningful Dialogue Response
CN108090218A (zh) * 2017-12-29 2018-05-29 北京百度网讯科技有限公司 Dialogue system generation method and device based on deep reinforcement learning
CN110149450A (zh) * 2019-05-22 2019-08-20 欧冶云商股份有限公司 Intelligent customer service response method and system
JP2021022928A (ja) 2019-07-24 2021-02-18 NAVER Corporation Artificial-intelligence-based automatic response method and system
CN110427472A (zh) * 2019-08-02 2019-11-08 深圳追一科技有限公司 Intelligent customer service matching method, apparatus, terminal device and storage medium
CN111739516A (zh) * 2020-06-19 2020-10-02 中国—东盟信息港股份有限公司 Speech recognition system for intelligent customer service calls
CN112148849A (zh) * 2020-09-08 2020-12-29 北京百度网讯科技有限公司 Dynamic interaction method, server, electronic device and storage medium
CN112148850A (zh) * 2020-09-08 2020-12-29 北京百度网讯科技有限公司 Dynamic interaction method, server, electronic device and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117411970A (zh) * 2023-10-17 2024-01-16 广州易风健康科技股份有限公司 Human-machine coupled customer service control method and system based on sound processing
CN117411970B (zh) 2023-10-17 2024-06-07 广州易风健康科技股份有限公司 Human-machine coupled customer service control method and system based on sound processing

Also Published As

Publication number Publication date
CN114240454A (zh) 2022-03-25

Similar Documents

Publication Publication Date Title
US11335339B2 (en) Voice interaction method and apparatus, terminal, server and readable storage medium
US20230206940A1 (en) Method of and system for real time feedback in an incremental speech input interface
  • KR102178738B1 (ko) Automated assistant invocation of an appropriate agent
  • KR102458805B1 (ko) Multi-user authentication on a device
US20220335930A1 (en) Utilizing pre-event and post-event input streams to engage an automated assistant
TWI276046B (en) Distributed language processing system and method of transmitting medium information therefore
US20030167167A1 (en) Intelligent personal assistants
US20030163311A1 (en) Intelligent social agents
US11758047B2 (en) Systems and methods for smart dialogue communication
  • CN111090728A (zh) Dialogue state tracking method, apparatus and computing device
US20180308481A1 (en) Automated assistant data flow
  • JP2021022928A (ja) Artificial-intelligence-based automatic response method and system
  • JP2019133127A (ja) Speech recognition method, device and server
  • WO2023097745A1 (zh) Intelligent interaction method, system and terminal based on deep learning
  • WO2022267405A1 (zh) Voice interaction method and system, electronic device, and storage medium
  • CN114860910A (zh) Intelligent dialogue method and system
  • CN111556096B (zh) Information push method, device, medium and electronic device
  • CN113885825A (zh) Method and device for intelligently creating application forms
  • CN109147786A (zh) Information processing method and electronic device
  • CN111596833A (zh) Skill-script winding processing method and device
  • CN114168706A (zh) Intelligent dialogue capability testing method, medium and testing device
  • CN115691492A (zh) Vehicle-mounted voice control system and method
  • CN117672192A (zh) Voice-based intent recognition method and apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21966170

Country of ref document: EP

Kind code of ref document: A1