CN110659361B - Conversation method, device, equipment and medium - Google Patents


Info

Publication number
CN110659361B
Authority
CN
China
Prior art keywords
service provider
target
natural language
language understanding
understanding result
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910961811.0A
Other languages
Chinese (zh)
Other versions
CN110659361A (en)
Inventor
黄涛
姜伟
杨令铎
李来林
伍绪青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Luka Beijing Intelligent Technology Co ltd
Original Assignee
Luka Beijing Intelligent Technology Co ltd
Application filed by Luka Beijing Intelligent Technology Co ltd
Priority to CN201910961811.0A
Publication of CN110659361A
Application granted
Publication of CN110659361B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of this specification disclose a dialog method, apparatus, device, and medium. The dialog method comprises: receiving target dialog information and sending it to each associated service provider; receiving each service provider's natural language understanding result for the target dialog information; and determining a target service provider according to the natural language understanding results, so that a terminal outputs feedback information corresponding to the natural language understanding result of the target service provider.

Description

Conversation method, device, equipment and medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a dialog method, apparatus, device, and medium.
Background
With the arrival of the artificial intelligence era, smart devices are entering households everywhere, and devices with intelligent dialog systems, such as smart speakers, intelligent robots, and mobile phone voice assistants, are becoming increasingly popular. However, each intelligent dialog device has its own standards and limitations: it generally supports only the content of the company or platform it belongs to and cannot use content from other companies or platforms. For example, among smart speaker products, the Tencent Dingdang smart screen can use Tencent Music content but not Ximalaya audio content, and Tmall Genie can use Xiami Music but not Tencent Music content. This degrades the user experience.
In view of the above, there is a need for a more efficient and effective human-machine dialog scheme.
Disclosure of Invention
Embodiments of this specification provide a dialog method, apparatus, device, and medium to solve the technical problem of how to conduct human-computer dialog more effectively and efficiently.
In order to solve the above technical problem, the embodiments of the present specification are implemented as follows:
An embodiment of this specification provides a dialog method, including:
receiving target dialog information, and sending the target dialog information to each associated service provider;
receiving each service provider's natural language understanding result for the target dialog information; and
determining a target service provider according to the natural language understanding results, so that a terminal outputs feedback information corresponding to the natural language understanding result of the target service provider.
An embodiment of this specification provides a dialog apparatus, including:
an information transceiving module, configured to receive target dialog information and send the target dialog information to each associated service provider;
a result receiving module, configured to receive each service provider's natural language understanding result for the target dialog information; and
a feedback module, configured to determine a target service provider according to the natural language understanding results, so that a terminal outputs feedback information corresponding to the natural language understanding result of the target service provider.
An embodiment of this specification provides a dialog device, including:
at least one processor;
and
a memory communicatively coupled to the at least one processor;
wherein
the memory stores instructions executable by the at least one processor, which enable the at least one processor to:
receive target dialog information, and send the target dialog information to each associated service provider;
receive each service provider's natural language understanding result for the target dialog information; and
determine a target service provider according to the natural language understanding results, so that a terminal outputs feedback information corresponding to the natural language understanding result of the target service provider.
An embodiment of this specification provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the following steps:
receiving target dialog information, and sending the target dialog information to each associated service provider;
receiving each service provider's natural language understanding result for the target dialog information; and
determining a target service provider according to the natural language understanding results, so that a terminal outputs feedback information corresponding to the natural language understanding result of the target service provider.
At least one technical solution adopted in the embodiments of this specification can achieve the following beneficial effects:
the resources of the service providers can be integrated, the sources of dialog resources expanded, and dialog accuracy improved.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings used in the embodiments of the present specification or the technical solutions in the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present specification, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart illustrating a dialog method according to a first embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a dialog process provided in the first embodiment of the present specification.
Fig. 3 is a schematic diagram of an application of the dialogue method in the first embodiment of the present specification.
Fig. 4 is a schematic structural diagram of a dialogue device provided in a third embodiment of the present specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification. It should be apparent that the embodiments described below are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any inventive step based on the embodiments of the present disclosure, shall fall within the scope of protection of the present application.
As shown in Fig. 1, the first embodiment of this specification provides a dialog method. The executing entity of this embodiment may be a computer, a server, or a corresponding intelligent system; that is, the executing entity can vary and may be set or changed according to actual conditions. A third-party application may also assist the executing entity in carrying out this embodiment. For example, as shown in Fig. 3, a server may execute the dialog method of this embodiment, and a corresponding application may be installed on the terminal held by the user. The server corresponds to the application, data can be transmitted between the server and the user's terminal, and the application can be used to display pages and to present information to, or exchange input and output with, the user.
Fig. 2 is a schematic diagram of a dialog process provided in a first embodiment of this specification, and with reference to fig. 1 and fig. 2, the dialog method in this embodiment includes:
S101: Receive target dialog information, and send the target dialog information to each associated service provider.
In this embodiment, the target dialog information may be text or information of another type or form, and it may come from various sources; this embodiment does not limit either. For example, a user may provide text input, voice input, or touch input through the terminal. If the user provides voice input, the terminal may convert the speech into text using speech recognition technology (e.g., Automatic Speech Recognition, ASR); the terminal may likewise convert touch input into text. The terminal includes, but is not limited to, various smart terminal devices such as mobile phones and smart speakers. Taking a smart speaker as an example, the speaker can determine whether the user is speaking, confirm through a wake word or the like whether the user is addressing speech or instructions to it, and then receive the user's speech. If the terminal has been woken but no speech from the user is detected, it can prompt the user, for example by asking the user to speak again; the number of prompts can be set as required. Because users' voices differ (for example, children may articulate unclearly and speak in a different tone from adults), speech recognition accuracy can suffer, so targeted speech training corpora can be added to improve the terminal's speech recognition accuracy.
The terminal may send the text entered by the user, or text converted or processed from the user's input, to the executing entity of this embodiment (hereinafter the "executing entity"); the information received by the executing entity is the target dialog information. In this embodiment, the terminal may send the information to the executing entity, and the executing entity may receive it, by means of a web service call. In addition, the terminal may also send one or more of a device type, a device ID (identity), a system version, and token authentication information to the executing entity.
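As a minimal sketch of the terminal side of this web service call, the function below packages the target dialog information together with whichever optional device fields are available. The JSON field names (text, device_type, device_id, system_version, token) are hypothetical; the specification does not define a wire format.

```python
import json
from typing import Optional


def build_dialog_request(
    text: str,
    device_type: Optional[str] = None,
    device_id: Optional[str] = None,
    system_version: Optional[str] = None,
    token: Optional[str] = None,
) -> str:
    """Serialize the target dialog information plus any available device metadata."""
    payload = {"text": text}
    optional_fields = {
        "device_type": device_type,
        "device_id": device_id,
        "system_version": system_version,
        "token": token,  # token authentication information
    }
    # Only send the metadata the terminal actually has (hypothetical field names).
    payload.update({k: v for k, v in optional_fields.items() if v is not None})
    return json.dumps(payload, ensure_ascii=False)
```

For example, a smart speaker that knows its type and ID but has no token would send only those two fields alongside the text.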
Thus, the "dialog method" of this embodiment may be executed by a computer, a server, or a corresponding intelligent system, while the user's "direct, face-to-face interlocutor" may be the terminal, that is, the terminal that receives the user's input and feeds back or outputs dialog results to the user.
In this embodiment, after receiving the target dialog information, the executing entity may send it to each service provider. A service provider here is one associated with the executing entity that provides related services, such as weather, audio, and video services; audio services may further include songs, stories, radio stations, and so on. Different service providers may offer different, identical, or similar services (e.g., several service providers may all provide weather services), and each service provider may offer one or more services. A service provider may be a computer, a server, a service, or an application system, and the executing entity can exchange or share data with each service provider. Which service providers to connect can be chosen as needed; for example, existing weather, music, and video service providers can be connected. In particular, one or more self-built service providers can also be set up to provide the required services, such as weather, audio, and video services.
In this embodiment, after the target dialog information is received, it may further be determined whether it meets a preset condition (e.g., whether it is non-empty). If the target dialog information meets the preset condition, it is sent to each service provider; and/or, if it does not meet the preset condition (e.g., it is empty), it is not sent to the service providers, and the terminal may instead be made to issue a prompt. This constitutes the fault-tolerance mechanism of this embodiment. The preset condition may be determined according to actual needs, and this embodiment does not limit it.
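The fault-tolerance check can be sketched as follows, assuming the preset condition is simply "the text is non-empty after trimming whitespace". The provider callables and the prompt_terminal callback are hypothetical stand-ins for the real service calls and terminal instruction; for clarity the providers are called one after another here.

```python
from typing import Callable, Dict, Optional


def dispatch_if_valid(
    target_text: Optional[str],
    providers: Dict[str, Callable[[str], dict]],
    prompt_terminal: Callable[[str], None],
) -> Optional[Dict[str, dict]]:
    """Send the target dialog information to every provider only if it passes the preset condition."""
    # Assumed preset condition: the text exists and is not blank.
    if target_text is None or not target_text.strip():
        # Fault tolerance: call no provider; ask the terminal to prompt the user instead.
        prompt_terminal("Sorry, I didn't catch that. Please say it again.")
        return None
    return {name: call(target_text) for name, call in providers.items()}
```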
S103: Receive each service provider's natural language understanding result for the target dialog information.
In this embodiment, each service provider has a Natural Language Understanding (NLU) function, so it can identify the semantics of the target dialog information and determine a natural language understanding result for it (hereinafter the "understanding result"). NLU covers domain, intent, and slot: it is responsible for domain/intent classification and slot filling of the target dialog information. Domain and intent classification aims to understand the domain and the intent of the user's utterance, so that a corresponding model can subsequently be called for recognition; in addition, the slots must be determined, i.e., slot filling. For example, for "check tomorrow's weather in Beijing", the domain is weather, the intent is search_weather, and there are two slot values, date = tomorrow and city = Beijing. Similarly, for a request to listen to a song, domain/intent classification determines that the user intends to listen to a song; the song name must also be identified and can serve as a slot. Domain and intent recognition can use deep-learning classification techniques, and slot extraction can use LSTM+CRF. For example, suppose a service provider returns the understanding result "vendor: NLU A; intent: music; music url: http://www.a.com/XXX.mp3; response: please listen to the music". This means that in the NLU result returned by service provider A, the intent is music, that is, the skill returned by that service provider is a song-type skill, and the song resource to be played is located at http://www.a.com/XXX.mp3.
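One way to hold an understanding result in code is a small record with the domain, intent, slots, and reply fields described above. The class and field names below are illustrative, not part of the specification, and the domain value for the music example is an assumption because the example only states the intent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NLUResult:
    """One service provider's understanding result (field names are illustrative)."""
    vendor: str
    domain: str
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)  # filled slots, e.g. city/date
    reply: Optional[str] = None          # the provider's own response text
    resource_url: Optional[str] = None   # e.g. the audio resource to play


# "Check tomorrow's weather in Beijing"
weather = NLUResult(vendor="vendorA", domain="weather", intent="search_weather",
                    slots={"date": "tomorrow", "city": "Beijing"})

# The music example above; domain="music" is an assumption.
music = NLUResult(vendor="NLU A", domain="music", intent="music",
                  reply="please listen to the music",
                  resource_url="http://www.a.com/XXX.mp3")
```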
In this embodiment, the executing entity may use a configuration file to access the NLU of each service provider. An example configuration file is as follows:
(1) Version description information:
One NLU version can be configured with multiple skills (i.e., functions or services). For example, NLU version 3.0 supports skill configuration versions 1.0, 1.1, or more, and whether a version is enabled by default is controlled by a default flag (default: true/false):
[Figure: example version description configuration]
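Because the original configuration figure is not reproduced here, the sketch below uses a hypothetical JSON version file to illustrate the idea: each NLU version lists the skill configuration versions it supports, and a default flag decides which version serves callers that do not request one explicitly. All key names are assumptions.

```python
import json
from typing import Optional

# Hypothetical version file; key names are illustrative only.
VERSION_CONFIG = json.loads("""
{
  "nlu_versions": [
    {"version": "3.0", "skill_config_versions": ["1.0", "1.1"], "default": true},
    {"version": "2.0", "skill_config_versions": ["1.0"], "default": false}
  ]
}
""")


def select_nlu_version(config: dict, requested: Optional[str] = None) -> dict:
    """Return the requested NLU version, or the one flagged default=true."""
    versions = config["nlu_versions"]
    if requested is not None:
        for v in versions:
            if v["version"] == requested:
                return v
        raise KeyError(f"unknown NLU version: {requested}")
    for v in versions:
        if v["default"]:
            return v
    raise LookupError("no NLU version is enabled by default")
```

Keeping several versions in one file lets older callers pin a version while new callers get the default.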
(2) Skill configuration information:
support_device: specifies the supported hardware devices.
device_skills_selector: specifies the service providers.
render_format: the output result format; YAML and JSON are supported.
skills_std_mapping: groups similar skills of multiple service providers together.
In the following example there are skills from three parties: service provider A (vendorA), service provider B (vendorB), and service provider C (vendorC). Different service providers may provide identical or similar skills; for example, both vendorA and vendorB have an idiom-chain skill:
[Figures: example skill configuration for vendorA, vendorB, and vendorC]
The version configuration file can keep multiple versions in service at the same time, with the caller (i.e., the executing entity) choosing the functions; the skill configuration file configures the skill configuration information of a given version, including service provider priorities, skill switches, and so on. Through the configuration files, the skills of each service provider take effect immediately, which helps reduce the amount of code and the complexity of the logic. Note that skill priorities can be configured as required, including the priority of each service provider and the priority of each skill; for example, the skills of service provider A can be configured with a higher priority than those of service provider B, and audio skills can be configured with a higher priority than chat skills.
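The sketch below illustrates how such a skill configuration might be loaded and used to rank results. Only support_device, device_skills_selector, render_format, and general_vendor_priority are named in this specification; skill_class_priority, skill_switches, the vendor list, and all values are hypothetical. The ranking key compares skill-class priority first and vendor priority second, mirroring the example of audio skills outranking chat skills and service provider A outranking service provider B.

```python
import json

# Hypothetical skill configuration (JSON is one of the two supported render formats).
SKILL_CONFIG = json.loads("""
{
  "support_device": ["smart_speaker", "mobile_phone"],
  "device_skills_selector": ["vendorA", "vendorB", "vendorC"],
  "render_format": "json",
  "general_vendor_priority": {"vendorA": 3, "vendorB": 2, "vendorC": 1},
  "skill_class_priority": {"std_audio": 2, "std_chat": 1},
  "skill_switches": {"std_audio": true, "std_chat": false}
}
""")


def check_config(cfg: dict) -> None:
    """Fail fast on obviously inconsistent configuration."""
    if cfg["render_format"] not in ("yaml", "json"):
        raise ValueError("render_format must be 'yaml' or 'json'")
    missing = [v for v in cfg["device_skills_selector"]
               if v not in cfg["general_vendor_priority"]]
    if missing:
        raise ValueError(f"vendors without a priority: {missing}")


def supports_device(cfg: dict, device_type: str) -> bool:
    return device_type in cfg["support_device"]


def rank(cfg: dict, vendor: str, std_domain: str) -> tuple:
    """Sort key: skill-class priority first, then vendor priority; higher is better."""
    return (cfg["skill_class_priority"].get(std_domain, 0),
            cfg["general_vendor_priority"].get(vendor, 0))
```

Because the policy lives in data, changing which vendor wins only requires editing the file, which is the code-size benefit the paragraph above describes.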
In this embodiment, the NLU functions of the service providers may be called in parallel to receive each service provider's understanding result, which includes, but is not limited to, the domain, intent, slots, and the service provider's reply. Specifically, the configuration file may be read, each service provider's web service may be called via GET or POST, and the service provider's understanding result received. Java, Python, and other common languages provide packages that implement such calls. Specifically, a service provider may offer a web service together with the related interface information, which may include the interface address, interface parameters, technical documentation, and keys. Using this information, the executing entity can initiate service calls and obtain each service provider's understanding result.
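The parallel call can be sketched with Python's standard concurrent.futures thread pool. Each value in vendors is a hypothetical callable that would wrap one service provider's GET or POST request (for instance using urllib.request); here they are plain functions so the example runs offline. A provider whose call raises is simply left out of the returned results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def call_all_vendors(
    text: str,
    vendors: Dict[str, Callable[[str], dict]],
    max_workers: int = 8,
) -> Dict[str, dict]:
    """Call every provider's NLU concurrently and collect the successful results."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(call, text) for name, call in vendors.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # A failed provider is left out; retry policy is handled separately.
                continue
    return results
```

Threads suit this case because each call spends most of its time waiting on the network rather than computing.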
For any service provider, if receiving its understanding result fails and a termination condition is met (e.g., the call has still not succeeded after a predetermined number of attempts and/or after a predetermined amount of time has been spent), receiving that service provider's understanding result is stopped;
and/or,
for any service provider, if receiving its understanding result fails but the termination condition is not met (e.g., the predetermined number of attempts and/or the predetermined time has not yet been reached), receiving its understanding result is requested again. Consecutive requests may be separated by a preset time length.
After receiving from a service provider has stopped, any understanding result that the service provider returns afterwards may be discarded. In addition, the "predetermined time length" above may be the same or different for different service providers.
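A minimal sketch of the retry and termination logic, assuming the termination condition combines a maximum number of attempts with an overall time budget and that consecutive requests are spaced by a fixed interval. The parameter names and default values are illustrative only; clock and sleep are injectable so the function can be tested without real delays.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    call: Callable[[], dict],
    max_attempts: int = 3,
    max_elapsed: float = 2.0,
    interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Request one provider's understanding result until success or termination."""
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        try:
            return call()
        except Exception:
            pass  # unsuccessful receipt; decide below whether to try again
        # Termination condition: attempt limit reached and/or time budget spent.
        if attempts >= max_attempts or clock() - start >= max_elapsed:
            return None  # stop receiving; anything this provider sends later is discarded
        sleep(interval)  # preset interval between consecutive requests
```

Giving each provider its own max_attempts, max_elapsed, and interval matches the note that these limits may differ between service providers.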
S105: Determine a target service provider according to the natural language understanding results, so that the terminal outputs feedback information corresponding to the natural language understanding result of the target service provider.
After the understanding results of the service providers are received, the target service provider can be determined from them. In this embodiment, determining the target service provider according to the understanding results includes:
determining whether any understanding result conforms to a direct selection rule;
if an understanding result conforms to the direct selection rule, taking the service provider corresponding to that understanding result as the target service provider;
if no understanding result conforms to the direct selection rule, performing a skill election on the understanding results, determining candidate service providers from the election result, and determining the target service provider according to the priorities of the candidate service providers.
The following is a detailed description of determining a target service provider according to the understanding result:
Determining whether any understanding result conforms to the direct selection rule: in this embodiment, the direct selection rule may be called a special route, through which the target service provider can be selected. The special route may be configured in the special_di_route node of the configuration file.
If no understanding result conforms to the direct selection rule (i.e., no understanding result matches a special route), a skill election is performed on the understanding results, candidate service providers are determined from the election result, and the target service provider is determined according to the priorities of the candidates. Specifically, the service providers are counted by domain and intent (hereinafter D/I). Because each D/I corresponds to a skill, the D/Is that correspond to the same or similar skills are counted together, yielding one or more D/I groups, each corresponding to one skill. Following the minority-obeys-majority election principle, the group containing the most D/Is is taken as the target group; there may be one or more such target groups.
If there is only one target group, the priorities of the service providers corresponding to the D/Is in that group are compared, and the service provider with the highest priority is the target vendor. If none of the returned D/Is matches a special route and there are multiple target groups, the general vendor priorities of all service providers in the target groups are compared, that is, the general priorities of all service providers across all target groups, and the one with the highest general vendor priority is the target vendor. The general vendor priority can be set in the "general_vendor_priority" node of the configuration file.
If an understanding result conforms to the direct selection rule (i.e., matches a special route), the service provider corresponding to that understanding result is taken as the target service provider. For the same target dialog information and the same service providers, the target service provider selected with a special route may differ from the one that would be selected without it. Because the service provider selected without a special route may not be the intended one, a special route can be used to force the selection of the target service provider. In other words, the service provider whose understanding result conforms to the direct selection rule takes priority over those of all other understanding results.
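The selection procedure above (direct selection rule first, then the majority skill election, then priorities) can be sketched as follows. It assumes the D/Is have already been standardized, models a special route as a hypothetical (vendor, std_domain, std_intent) tuple, and represents per-skill priority as an ordered vendor list; these data shapes are assumptions, not the configuration format of the specification. When a single target group has no per-skill priority configured, the sketch falls back to the general vendor priority, which is also an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

DI = Tuple[str, str]  # (std_domain, std_intent)


def select_target_vendor(
    results: Dict[str, DI],
    special_routes: Set[Tuple[str, str, str]],
    skill_priority: Dict[DI, List[str]],
    general_vendor_priority: Dict[str, int],
) -> Optional[str]:
    """Pick the target vendor from standardized understanding results."""
    if not results:
        return None

    # 1. Direct selection rule: a result matching a special route wins outright.
    for vendor, di in results.items():
        if (vendor, *di) in special_routes:
            return vendor

    # 2. Skill election: group vendors by D/I and keep the largest group(s).
    groups: Dict[DI, List[str]] = defaultdict(list)
    for vendor, di in results.items():
        groups[di].append(vendor)
    largest = max(len(vendors) for vendors in groups.values())
    target_groups = [di for di, vendors in groups.items() if len(vendors) == largest]

    def by_general_priority(candidates: List[str]) -> str:
        return max(candidates, key=lambda v: general_vendor_priority.get(v, 0))

    # 3a. One target group: use that skill's vendor priority order if configured.
    if len(target_groups) == 1:
        candidates = groups[target_groups[0]]
        ordered = [v for v in skill_priority.get(target_groups[0], []) if v in candidates]
        return ordered[0] if ordered else by_general_priority(candidates)

    # 3b. Several target groups tie: compare general vendor priority across all of them.
    return by_general_priority([v for di in target_groups for v in groups[di]])
```

With two vendors returning an audio skill and one returning chat, the audio group wins the election; a special route on the chat vendor would override that outcome.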
In this embodiment, after the understanding results of the service providers are received, they may be standardized. Specifically, each service provider's received D/I may be mapped to values in the configuration file (for example, to the std_domain and std_intent values in the skills_std_mapping node), which completes D/I standardization. For the same or similar skills, different service providers may return different domain and intent values; for song-type skills, for example, some service providers return music and others return song. After standardization, both music and song can be standardized as std_audio, which makes the skills easier to classify or group. In this case, the standardized D/I, i.e., std_domain and std_intent, can be used when determining the target service provider according to the understanding results.
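D/I standardization amounts to a lookup table. The sketch below models skills_std_mapping as a Python dict keyed by (vendor, domain, intent); the entries, including the raw intents play and play_song, are hypothetical. Results with no mapping keep their raw D/I so they can still form their own group in the election.

```python
from typing import Dict, Tuple

Mapping = Dict[Tuple[str, str, str], Tuple[str, str]]

# Hypothetical skills_std_mapping: (vendor, domain, intent) -> (std_domain, std_intent)
SKILLS_STD_MAPPING: Mapping = {
    ("vendorA", "music", "play"): ("std_audio", "play"),
    ("vendorB", "song", "play_song"): ("std_audio", "play"),
    ("vendorC", "chat", "chat"): ("std_chat", "chat"),
}


def standardize(vendor: str, domain: str, intent: str,
                mapping: Mapping = SKILLS_STD_MAPPING) -> Tuple[str, str]:
    """Map a provider's raw D/I to the standardized (std_domain, std_intent)."""
    # Unmapped results keep their raw D/I so they can still form their own group.
    return mapping.get((vendor, domain, intent), (domain, intent))
```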
After the target service provider is determined, its received understanding result can be taken as the target understanding result (hereinafter the "target result"), and an instruction can then be sent to the terminal so that it outputs feedback information corresponding to the natural language understanding result of the target service provider.
In this embodiment, after the target service provider is determined, it may be determined whether the skill corresponding to the target service provider's natural language understanding result is enabled;
if the skill is enabled, the terminal is made to output feedback information corresponding to the natural language understanding result of the target service provider;
if the skill is not enabled, an instruction may be sent to the terminal so that it prompts the user, or the terminal may be made to engage in chit-chat.
The on/off state of each service provider's skills can be configured flexibly, and different dialog scenarios or systems can enable different skills as needed: for example, audio skills on and chat skills off in some scenarios, and audio skills off and chat skills on in others, thereby achieving differentiation.
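A minimal sketch of the skill-switch check, assuming switches are keyed by standardized domain and that chat is used as the fallback only when the chat skill is itself switched on. The prompt text and the std_chat key are hypothetical.

```python
from typing import Dict, Optional

FALLBACK_PROMPT = "Sorry, this feature is not available on this device."


def respond(std_domain: str, feedback: str, skill_switches: Dict[str, bool],
            chat_reply: Optional[str] = None) -> str:
    """Return what the terminal should output for the target result."""
    if skill_switches.get(std_domain, False):
        return feedback  # skill enabled: output the target provider's feedback
    if chat_reply is not None and skill_switches.get("std_chat", False):
        return chat_reply  # skill disabled but chat allowed: fall back to chit-chat
    return FALLBACK_PROMPT  # otherwise instruct the terminal to prompt the user
```

The same result can therefore produce different outputs on a speaker configured for audio and on a companion device configured for chat.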
In this embodiment, the terminal may perform Natural Language Generation (NLG) on the target result to convert it into text and output the text to the user, or may convert the target result into speech using TTS (Text To Speech) and output the speech to the user.
In this embodiment, the dialog may proceed in rounds: everything from the user's input to the terminal through to the terminal's output to the user can be regarded as one round of dialog. Skills include single-round skills and multi-round skills; for example, the idiom chain is a multi-round skill, while playing a song is a single-round skill. In this embodiment, whether the current dialog (i.e., the current round) starts a multi-round dialog flow can be determined according to the target result;
if no multi-round dialog flow is started, a single-round dialog flow is executed, i.e., the flow described above;
if a multi-round dialog flow is started, the current round (hereinafter the "starting round") and subsequent rounds are processed as a multi-round dialog until the multi-round dialog flow ends in some round. Specifically, after the target dialog information is received in a round, if the target result of the target service provider (denoted service provider X) meets the multi-round dialog start condition, the multi-round dialog flow is started, and this round and all subsequent rounds use the target service provider determined in the starting round, i.e., service provider X, as the target service provider until the multi-round dialog flow ends in some round. Because the multi-round dialog was started on the basis of service provider X's understanding result and its rounds are continuous, the subsequent understanding results from service provider X are more accurate.
Conditions can be set for starting or ending the multi-round dialog flow, such as keywords: the flow starts when one or more specified keywords appear in the target result and ends when one or more other specified keywords appear in the target result.
For example, in some round the user says "start the idiom chain"; after S101, S103, and S105 above, service provider Y is determined to be the target service provider, and its understanding result is the target result. If the target result contains the keyword "idiom chain", which meets the multi-round dialog start condition, the multi-round dialog flow is started, and service provider Y is used as the target service provider in this round and subsequent rounds. If, in a subsequent round (possibly the one immediately after the starting round), the user says "finish the idiom chain" (service provider Y is still the target service provider in that round), the understanding result returned by service provider Y contains the keyword "finish", which meets the multi-round dialog end condition, so the multi-round dialog flow ends and subsequent rounds are again processed according to S101, S103, and S105 above.
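The sticky-provider behaviour of multi-round dialog can be sketched as a small session object. select_vendor stands in for S101 to S105 and understand for a call to one provider's NLU; both are hypothetical callables, and the keyword tuples mirror the idiom-chain example but are assumptions. For simplicity the understanding result is treated as plain text and keywords are matched as substrings.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class DialogSession:
    start_keywords: Tuple[str, ...] = ("idiom chain",)
    end_keywords: Tuple[str, ...] = ("finish",)
    locked_vendor: Optional[str] = None  # provider X while a multi-round flow is active

    def handle_turn(
        self,
        text: str,
        select_vendor: Callable[[str], str],
        understand: Callable[[str, str], str],
    ) -> Tuple[str, str]:
        """Process one round and return (target vendor, target result)."""
        if self.locked_vendor is not None:
            # Inside a multi-round flow: reuse the provider chosen in the starting round.
            vendor = self.locked_vendor
            result = understand(vendor, text)
            if any(k in result for k in self.end_keywords):
                self.locked_vendor = None  # end condition met; later rounds use S101-S105 again
            return vendor, result

        # Single-round processing (S101, S103, S105).
        vendor = select_vendor(text)
        result = understand(vendor, text)
        if any(k in result for k in self.start_keywords):
            self.locked_vendor = vendor  # start condition met: this is the starting round
        return vendor, result
```

Bypassing the election while a flow is active keeps every round of the game on the same provider, even if another provider would win the election for an individual utterance.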
In this embodiment, after each conversation (or each turn) ends, the terminal automatically enters the listening state, so the user does not need to wake the terminal again before the next turn. This realizes wake-free multi-turn conversation: after a single wake-up the user can converse repeatedly, which is closer to real human interaction and improves the user experience.
In this embodiment, by connecting to each service provider, the resources of all service providers (including a self-built service provider) can be integrated, which expands the sources of conversation resources and improves conversation accuracy and efficiency. Determining the target service provider and the target result through priority and the direct selection rule means that, while resources are expanded, the best conversation feedback can still be output, further improving accuracy and efficiency. Integrating the NLU capabilities of the service providers and combining them with deep learning can further improve accuracy and efficiency. Recording each service provider's skills in configuration files is efficient and reduces code volume and logical complexity. Wake-free multi-turn conversation is supported, which improves conversation efficiency, and a fault-tolerance mechanism is provided, which improves it further.
The second embodiment of the present specification is an application example of the first embodiment,
in this embodiment, a user speaks to the terminal, for example saying "I want to hear Little Red Riding Hood"; the terminal receives the user's voice and converts it with ASR into the text "I want to hear Little Red Riding Hood";
the terminal sends the text to the server via a web service call;
the server receives the text, which is the target dialogue information; the terminal may also send information such as the device type, device ID, system version and token authentication information, and the server performs security authentication on the received web service call request;
the server determines whether the target dialogue information is empty; if so, it performs fault-tolerance processing: the target dialogue information is not sent to the service providers, and an instruction is sent to the terminal to prompt the user, for example to say it again; if not, the target dialogue information is sent to each service provider;
the NLU services of all service providers are called, and the understanding results they return are received. Assuming there are four service providers, vendorA, vendorB, vendorC and vendorD, the returned understanding results are as follows:
(1) vendor: NLU A; intent: play_music; music_url: http://www.a.com/little_red_hat.mp3; response: "Starting playback";
(2) vendor: NLU B; intent: song; song_url: http://www.b.com/little_red_hat.mp3; response: "Please listen to this classic story";
(3) vendor: NLU C; intent: chat; response: "It is a story about Little Red Riding Hood and the big bad wolf";
(4) vendor: me; intent: audio; url: http://www.d.com/little_red_hat.mp3; response: "Baby, please listen to Little Red Riding Hood";
the understanding results are normalized: the intents audio, play_music and song are normalized to std_audio, and chat is normalized to std_chat;
none of the understanding results satisfies the direct selection rule, so a skill election is held: of the four results, three are std_audio and one is std_chat, so std_audio wins and the requested skill is determined to be audio;
among the three service providers whose results have the std_audio skill, if the priority order is vendorA < vendorB < vendorD, vendorD is selected as the target service provider;
if vendorD's std_audio skill is enabled, the server sends an instruction to the terminal that triggers an action: the terminal first speaks the response "Baby, please listen to Little Red Riding Hood" and then plays the audio at http://www.d.com/little_red_hat.mp3. Both the spoken response and the audio can be regarded as feedback information.
If instead two of the four understanding results were std_audio and two were std_chat, the election would be tied, and the target service provider could be decided by comparing the general vendor priority.
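The election in this example can be sketched as a majority vote over normalized skills followed by a priority comparison. This is a minimal illustration, not the applicant's implementation: the function name `elect_target` and the numeric priority values are hypothetical, and one priority table is used both within the winning skill and as the general vendor priority for ties (the specification does not say whether these are separate tables).

```python
from collections import Counter
from typing import Dict, Optional


def elect_target(skills: Dict[str, Optional[str]], priority: Dict[str, int]) -> str:
    """Pick a target provider by skill election, then by provider priority.

    skills:   provider name -> normalized skill (None if unrecognized)
    priority: provider name -> rank, higher wins (hypothetical values)
    """
    counts = Counter(s for s in skills.values() if s is not None)
    if not counts:
        raise ValueError("no provider returned a recognized skill")
    top = max(counts.values())
    winning = {skill for skill, n in counts.items() if n == top}
    # One winning skill: candidates are the providers that returned it.
    # Tied skills: candidates span all tied skills, so priority decides.
    candidates = [p for p, s in skills.items() if s in winning]
    return max(candidates, key=lambda p: priority.get(p, 0))


priority = {"vendorA": 1, "vendorB": 2, "vendorC": 5, "vendorD": 4}
majority = {"vendorA": "std_audio", "vendorB": "std_audio",
            "vendorC": "std_chat", "vendorD": "std_audio"}
tied = {"vendorA": "std_audio", "vendorB": "std_audio",
        "vendorC": "std_chat", "vendorD": "std_chat"}
print(elect_target(majority, priority))  # vendorD
print(elect_target(tied, priority))      # vendorC
```

Note that vendorC has the highest priority here yet loses the majority case, which shows the election is applied before priority.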
As shown in fig. 4, a third embodiment of the present specification provides a dialogue apparatus including:
an information transceiver module 202, configured to receive target dialogue information and send it to each associated service provider;
a result receiving module 204, configured to receive a natural language understanding result of each service provider for the target dialog information;
and a feedback module 206, configured to determine a target service provider according to the natural language understanding results, so that the terminal outputs feedback information corresponding to the target service provider's natural language understanding result.
Optionally, the target dialogue information is obtained by the terminal performing speech recognition on the user's voice input.
Optionally, the information transceiver module 202 receives the target dialogue information via a web service call.
Optionally, the information transceiver module 202 is further configured to:
after receiving the target dialogue information, determining whether it is empty;
and if not, sending the target dialogue information to each associated service provider.
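The empty check and the fan-out to every associated provider can be sketched as follows, assuming each provider is wrapped in a hypothetical callable that takes the text and returns its understanding result as a dict. Calls run concurrently with the standard library's `ThreadPoolExecutor`; the function name `dispatch` and the `None` return convention for the fault-tolerance branch are assumptions of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# Hypothetical provider client: text in, understanding result out.
Provider = Callable[[str], dict]


def dispatch(text: Optional[str], providers: Dict[str, Provider]) -> Optional[Dict[str, dict]]:
    """Return {provider_name: result}, or None when the input is empty.

    None signals the fault-tolerance branch: nothing is sent to the
    providers, and the caller should ask the terminal to prompt the user.
    """
    if text is None or not text.strip():
        return None
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {name: pool.submit(call, text) for name, call in providers.items()}
        # A provider exception propagates from result(); a real server would
        # wrap each call with retry and termination handling.
        return {name: fut.result() for name, fut in futures.items()}
```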
Optionally, the result receiving module 204 is further configured to:
for any service provider, if receiving its natural language understanding result fails and a termination condition is met, stopping receiving that service provider's natural language understanding result;
and/or,
for any service provider, if receiving its natural language understanding result fails and the termination condition is not met, repeatedly requesting that service provider's natural language understanding result.
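The retry-until-termination behaviour can be sketched for a single provider as below. The specification does not define the termination condition; this sketch assumes an attempt cap combined with an overall deadline, and the function name, parameters and linear back-off are hypothetical.

```python
import time
from typing import Callable, Optional


def receive_with_retry(request: Callable[[], dict], max_attempts: int = 3,
                       deadline_s: float = 2.0, backoff_s: float = 0.1) -> Optional[dict]:
    """Request one provider's NLU result, retrying until a termination condition.

    Assumed termination condition: attempt cap reached or deadline passed.
    Returns None when the provider is given up on, so the results of the
    other providers can still be used.
    """
    start = time.monotonic()
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception:  # e.g. a timeout or connection error from the client
            out_of_time = time.monotonic() - start >= deadline_s
            if attempt == max_attempts or out_of_time:
                return None  # termination condition met: stop receiving
            time.sleep(backoff_s * attempt)  # back off, then request again
    return None
```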
Optionally, determining the target service provider according to the natural language understanding result includes:
determining whether any natural language understanding result satisfies a direct selection rule;
if a natural language understanding result satisfies the direct selection rule, taking the service provider corresponding to that result as the target service provider;
and if no natural language understanding result satisfies the direct selection rule, performing a skill election on the natural language understanding results, determining candidate service providers according to the election result, and determining the target service provider according to the priorities of the candidate service providers.
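The three-stage decision (direct selection, then skill election, then priority) can be sketched as one function. The direct selection rule itself is not defined in this part of the specification, so it is passed in as a hypothetical predicate; choosing the highest-priority provider when several results satisfy the rule is also an assumption. Results are assumed to be already normalized with a `skill` key.

```python
from collections import Counter
from typing import Callable, Dict, Optional

Result = Dict[str, Optional[str]]  # normalized result with a "skill" key


def determine_target(results: Dict[str, Result],
                     direct_rule: Callable[[str, Result], bool],
                     priority: Dict[str, int]) -> Optional[str]:
    """Return the target provider name, or None if nothing usable came back."""
    # Stage 1: direct selection; ties broken by priority (assumption).
    direct = [p for p, r in results.items() if direct_rule(p, r)]
    if direct:
        return max(direct, key=lambda p: priority.get(p, 0))
    # Stage 2: skill election over the normalized skills.
    counts = Counter(r.get("skill") for r in results.values() if r.get("skill"))
    if not counts:
        return None  # caller falls back to prompting the terminal
    top = max(counts.values())
    candidates = [p for p, r in results.items()
                  if r.get("skill") and counts[r["skill"]] == top]
    # Stage 3: highest-priority candidate wins.
    return max(candidates, key=lambda p: priority.get(p, 0))
```

For example, a hypothetical rule such as `lambda p, r: r.get("skill") == "std_device_control"` would let any provider that recognizes a device-control request win outright, regardless of how the other providers voted.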
Optionally, the feedback module 206 is further configured to:
normalizing the received natural language understanding results before the target service provider is determined, and determining the target service provider according to the normalized results.
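Normalization can be sketched as a lookup table from provider-specific intents to standard skills, plus a rule that gathers differently named URL fields into one key. The table below is taken from the second embodiment's example; keeping it in code rather than in a configuration file, the `url` consolidation, and mapping unknown intents to `None` are assumptions of this sketch.

```python
from typing import Any, Dict

# Intent-to-skill table from the worked example (audio/play_music/song -> std_audio).
INTENT_TO_SKILL: Dict[str, str] = {
    "audio": "std_audio",
    "play_music": "std_audio",
    "song": "std_audio",
    "chat": "std_chat",
}
# Provider-specific media fields seen in the example results.
URL_KEYS = ("url", "music_url", "song_url")


def normalize(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a provider result with standard 'skill' and 'url' fields."""
    normalized = dict(result)
    normalized["skill"] = INTENT_TO_SKILL.get(result.get("intent"))
    normalized["url"] = next((result[k] for k in URL_KEYS if k in result), None)
    return normalized
```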
Optionally, the feedback module 206 is further configured to:
determining whether the skill corresponding to the target service provider's natural language understanding result is enabled;
and if so, causing the terminal to output feedback information corresponding to that natural language understanding result.
Optionally, the feedback module 206 is further configured to:
and if the skill is not enabled, sending a prompt to the terminal.
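The skill-enabled check and the resulting terminal instruction can be sketched as follows. The enabled-skill set, the instruction format (`action`, `text`, `then_play`) and the prompt wording are all hypothetical; the sequence of speaking the response and then playing the media follows the second embodiment's example.

```python
from typing import Dict, Set, Tuple

# Hypothetical (provider, skill) pairs enabled for this deployment.
ENABLED_SKILLS: Set[Tuple[str, str]] = {("vendorD", "std_audio"), ("vendorC", "std_chat")}


def build_terminal_instruction(provider: str, result: Dict[str, str]) -> Dict[str, str]:
    """Turn the target provider's normalized result into a terminal instruction."""
    if (provider, result.get("skill")) in ENABLED_SKILLS:
        # Skill enabled: speak the response, then play any media URL.
        instruction = {"action": "speak", "text": result.get("response", "")}
        if result.get("url"):
            instruction["then_play"] = result["url"]
        return instruction
    # Skill not enabled: prompt via the terminal instead of executing.
    return {"action": "prompt", "text": "Sorry, that feature is not available yet."}
```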
Optionally, the feedback module 206 is further configured to:
determining whether to start or end a multi-turn conversation according to the target service provider's natural language understanding result;
the feedback module 206 uses the same service provider as the target service provider from the start of a multi-turn conversation until its end.
A fourth embodiment of the present specification provides a dialogue apparatus including:
at least one processor;
and,
a memory communicatively coupled to the at least one processor;
wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to:
receiving target dialogue information, and sending the target dialogue information to each associated service provider;
receiving natural language understanding results of each service provider on the target dialogue information;
and determining a target service provider according to the natural language understanding results, so that the terminal outputs feedback information corresponding to the target service provider's natural language understanding result.
A fifth embodiment of the present specification provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, perform the steps of:
receiving target conversation information, and sending the target conversation information to each associated service provider;
receiving natural language understanding results of each service provider on the target dialogue information;
and determining a target service provider according to the natural language understanding results, so that the terminal outputs feedback information corresponding to the target service provider's natural language understanding result.
The above embodiments may be used in combination.
While certain embodiments of the present description have been described above, other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily have to be in the particular order shown, or in sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in this specification are described progressively; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device and non-volatile computer-readable storage medium embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The apparatus, the device, the nonvolatile computer readable storage medium, and the method provided in the embodiments of the present specification correspond to each other, and therefore, the apparatus, the device, and the nonvolatile computer storage medium also have similar advantageous technical effects to the corresponding method.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (e.g., an improvement to a circuit structure such as a diode, transistor or switch) or a software improvement (an improvement to a method flow). However, as technology has developed, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with hardware entity modules. For example, a Programmable Logic Device (PLD) (e.g., a Field Programmable Gate Array (FPGA)) is an integrated circuit whose logic functions are determined by the user's programming of the device. Designers program a digital system to "integrate" it onto a PLD themselves, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the source code before compilation must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog.
It should also be clear to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320; a memory controller may also be implemented as part of a memory's control logic. Those skilled in the art will also appreciate that, besides implementing the controller purely as computer-readable program code, the same functions can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. The means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as divided into separate units by function. Of course, when implementing this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, the present specification embodiments may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises that element.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the system embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
The above description is only an example of the present specification, and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (13)

1. A conversation method comprising:
receiving target conversation information, and sending the target conversation information to each associated service provider;
receiving natural language understanding results of each service provider on the target dialogue information;
determining a target service provider from the service providers according to the natural language understanding results, so that the terminal outputs feedback information corresponding to the target service provider's natural language understanding result, thereby integrating the resources of the service providers and expanding the sources of conversation resources;
wherein determining a target service provider from the service providers according to the natural language understanding results comprises:
if a natural language understanding result satisfies a direct selection rule, taking the service provider corresponding to that natural language understanding result as the target service provider.
2. The method of claim 1, wherein the target dialogue information is obtained by a terminal performing speech recognition on speech input by a user.
3. The method of claim 1 or 2, wherein the target dialogue information is received via a web call.
4. The method of claim 1 or 2, further comprising:
after receiving the target dialogue information, judging whether the target dialogue information is empty;
and if not, sending the target dialogue information to each associated service provider.
5. The method of claim 1 or 2, further comprising:
for any service provider, if receiving its natural language understanding result fails and a termination condition is met, stopping receiving that service provider's natural language understanding result;
and/or,
for any service provider, if receiving its natural language understanding result fails and the termination condition is not met, repeatedly requesting that service provider's natural language understanding result.
6. The method of claim 1, wherein determining a target service provider according to the natural language understanding results comprises:
determining whether any natural language understanding result satisfies the direct selection rule;
and if no natural language understanding result satisfies the direct selection rule, performing a skill election on the natural language understanding results, determining candidate service providers according to the election result, and determining the target service provider according to the priorities of the candidate service providers.
7. The method of claim 6, wherein, before the target service provider is determined according to the natural language understanding results, the received natural language understanding results are normalized, and the target service provider is determined according to the normalized results.
8. The method of claim 1, further comprising:
determining whether the skill corresponding to the target service provider's natural language understanding result is enabled;
and if so, causing the terminal to output feedback information corresponding to that natural language understanding result.
9. The method of claim 8, further comprising:
and if the skill is not enabled, sending a prompt to the terminal.
10. The method of claim 1, further comprising:
determining whether to start or end multiple rounds of conversations according to a natural language understanding result corresponding to the target service provider;
the same service provider is used as the target service provider from the start of a multi-turn session to the end of the multi-turn session.
11. A conversation apparatus comprising:
an information transceiver module, configured to receive target conversation information and send it to each associated service provider;
a result receiving module, configured to receive a natural language understanding result of each service provider for the target dialog information;
a feedback module, configured to determine a target service provider from the service providers according to the natural language understanding results, so that the terminal outputs feedback information corresponding to the target service provider's natural language understanding result, thereby integrating the resources of the service providers and expanding the sources of conversation resources;
wherein determining a target service provider from the service providers according to the natural language understanding results comprises:
if a natural language understanding result satisfies a direct selection rule, taking the service provider corresponding to that natural language understanding result as the target service provider.
12. A dialogue device, comprising:
at least one processor;
and,
a memory communicatively coupled to the at least one processor;
wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to:
receiving target conversation information, and sending the target conversation information to each associated service provider;
receiving natural language understanding results of each service provider on the target dialogue information;
determining a target service provider from the service providers according to the natural language understanding results, so that the terminal outputs feedback information corresponding to the target service provider's natural language understanding result, thereby integrating the resources of the service providers and expanding the sources of conversation resources;
wherein determining a target service provider from the service providers according to the natural language understanding results comprises:
if a natural language understanding result satisfies a direct selection rule, taking the service provider corresponding to that natural language understanding result as the target service provider.
13. A computer-readable storage medium storing computer-executable instructions that, when executed by a processor, perform the steps of:
receiving target conversation information, and sending the target conversation information to each associated service provider;
receiving natural language understanding results of each service provider on the target dialogue information;
determining a target service provider from the service providers according to the natural language understanding results, so that the terminal outputs feedback information corresponding to the target service provider's natural language understanding result, thereby integrating the resources of the service providers and expanding the sources of conversation resources; wherein determining a target service provider from the service providers according to the natural language understanding results comprises:
if a natural language understanding result satisfies a direct selection rule, taking the service provider corresponding to that natural language understanding result as the target service provider.
CN201910961811.0A 2019-10-11 2019-10-11 Conversation method, device, equipment and medium Active CN110659361B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910961811.0A CN110659361B (en) 2019-10-11 2019-10-11 Conversation method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910961811.0A CN110659361B (en) 2019-10-11 2019-10-11 Conversation method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN110659361A CN110659361A (en) 2020-01-07
CN110659361B true CN110659361B (en) 2023-01-17

Family

ID=69040412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910961811.0A Active CN110659361B (en) 2019-10-11 2019-10-11 Conversation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110659361B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113552938A (en) * 2020-04-26 2021-10-26 京东方科技集团股份有限公司 Action association method based on Internet of things, electronic equipment and storage medium
CN112199498A (en) * 2020-09-27 2021-01-08 中国建设银行股份有限公司 Man-machine conversation method, device, medium and electronic equipment for endowment service
CN115086283B (en) * 2022-05-18 2024-02-06 阿里巴巴(中国)有限公司 Voice stream processing method and device

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106777135A (en) * 2016-05-27 2017-05-31 中科鼎富(北京)科技发展有限公司 Service scheduling method, device and robot service system
CN107919123A (en) * 2017-12-07 2018-04-17 北京小米移动软件有限公司 More voice assistant control method, device and computer-readable recording medium
CN107977238A (en) * 2016-10-19 2018-05-01 百度在线网络技术(北京)有限公司 Using startup method and device

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US9171541B2 (en) * 2009-11-10 2015-10-27 Voicebox Technologies Corporation System and method for hybrid processing in a natural language voice services environment
CN108564946B (en) * 2018-03-16 2019-09-20 Suzhou AISpeech Information Technology Co., Ltd. Method and system for creating skills and voice dialogue products on a voice dialogue platform
CN108829757B (en) * 2018-05-28 2022-01-28 Guangzhou Maiyou Network Technology Co., Ltd. Intelligent service method, server and storage medium for chat robot
CN109036396A (en) * 2018-06-29 2018-12-18 Baidu Online Network Technology (Beijing) Co., Ltd. Interaction method and system for third-party applications
CN109408800B (en) * 2018-08-23 2024-03-01 Alibaba (China) Co., Ltd. Dialogue robot system and related skill configuration method
CN110297955B (en) * 2019-06-20 2023-03-24 Advanced New Technologies Co., Ltd. Information query method, device, equipment and medium

Non-Patent Citations (2)

Title
A survey on dialogue systems: Recent advances and new frontiers; Chen H. et al.; ACM SIGKDD Explorations; 2017-12-31; Vol. 19, No. 2; pp. 25-35 *
A survey of deep-learning-based open-domain dialogue systems; Chen Chen et al.; Chinese Journal of Computers; 2019-03-28; Vol. 42, No. 7; pp. 1439-1466 *

Also Published As

Publication number Publication date
CN110659361A (en) 2020-01-07

Similar Documents

Publication Publication Date Title
US11430442B2 (en) Contextual hotwords
US11437041B1 (en) Speech interface device with caching component
JP7044415B2 (en) Methods and systems for controlling home assistant appliances
US10540970B2 (en) Architectures and topologies for vehicle-based, voice-controlled devices
US10089984B2 (en) System and method for an integrated, multi-modal, multi-device natural language voice services environment
US20170229122A1 (en) Hybridized client-server speech recognition
CN110659361B (en) Conversation method, device, equipment and medium
US11373645B1 (en) Updating personalized data on a speech interface device
JP2020525903A (en) Managing Privilege by Speaking for Voice Assistant System
US20090299745A1 (en) System and method for an integrated, multi-modal, multi-device natural language voice services environment
CN111261151B (en) Voice processing method and device, electronic equipment and storage medium
CN112735407B (en) Dialogue processing method and device
CN113362828A (en) Method and apparatus for recognizing speech
US10540973B2 (en) Electronic device for performing operation corresponding to voice input
US10629199B1 (en) Architectures and topologies for vehicle-based, voice-controlled devices
KR20220143683A (en) Electronic Personal Assistant Coordination
CN111833857A (en) Voice processing method and device and distributed system
WO2020114323A1 (en) Method and apparatus for customized speech synthesis
CN115019781A (en) Conversation service execution method, device, storage medium and electronic equipment
US11907676B1 (en) Processing orchestration for systems including distributed components
US11893996B1 (en) Supplemental content output
CN114724587A (en) Voice response method and device
CN117292705A (en) Audio processing method, device, electronic equipment and storage medium
CN117496941A (en) Voice data processing method, device and system
JP2020021040A (en) Information processing unit, sound output method, and sound output program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100000 Room D529, No. 501, Floor 5, Building 2, Fourth District, Wangjing Dongyuan, Chaoyang District, Beijing

Applicant after: Beijing Wuling Technology Co.,Ltd.

Address before: 100000 Room 06, 2163, 13/F, Building 523, Wangjing Dongyuan, Chaoyang District, Beijing

Applicant before: Beijing Wuling Technology Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20221222

Address after: 100000 Room 815, Floor 8, Building 6, Yard 33, Guangshun North Street, Chaoyang District, Beijing

Applicant after: Luka (Beijing) Intelligent Technology Co.,Ltd.

Address before: 100000 Room D529, No. 501, Floor 5, Building 2, Fourth District, Wangjing Dongyuan, Chaoyang District, Beijing

Applicant before: Beijing Wuling Technology Co.,Ltd.

GR01 Patent grant