CN108108340B - Dialogue interaction method and system for intelligent robot - Google Patents


Info

Publication number
CN108108340B
CN108108340B (application CN201711215862.6A)
Authority
CN
China
Prior art keywords
user
topic
interaction
dialogue
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711215862.6A
Other languages
Chinese (zh)
Other versions
CN108108340A (en)
Inventor
韦克礼
赵媛媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Guangnian Wuxian Technology Co Ltd
Original Assignee
Beijing Guangnian Wuxian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Guangnian Wuxian Technology Co Ltd
Priority: CN201711215862.6A
Publication of CN108108340A
Application granted
Publication of CN108108340B

Classifications

    • G06F 40/30: Physics; Computing; Electric digital data processing; Handling natural language data; Semantic analysis
    • G06F 16/3329: Physics; Computing; Electric digital data processing; Information retrieval; Querying; Query formulation; Natural language query formulation or dialogue systems
    • G06F 40/14: Physics; Computing; Electric digital data processing; Handling natural language data; Text processing; Use of codes for handling textual entities; Tree-structured documents

Abstract

The invention discloses a dialogue interaction processing method and system for an intelligent robot, wherein the method comprises the following steps: analyzing the context dialogue interaction information during the dialogue interaction between the intelligent robot and a user to generate a corresponding topic label, wherein the topic label is used to mark the topic to which each turn of dialogue interaction belongs; acquiring the dialogue data output by the user in the current turn and parsing it in combination with the topic labels of the context dialogue interaction information to obtain the user intention; and generating dialogue output data according to the user intention decision. After receiving the user's voice information, the invention can generate output under the same topic in combination with the current topic label, thereby ensuring the continuity of the conversation, improving the conversation quality, and enhancing the user's conversation experience.

Description

Dialogue interaction method and system for intelligent robot
Technical Field
The invention relates to the field of intelligent robots, in particular to a dialogue interaction method and system for an intelligent robot.
Background
With the continuous development of science and technology and the introduction of information technology, computer technology, and artificial intelligence technology, robot research has gradually moved beyond the industrial field and expanded into medical care, health care, the family, entertainment, the service industry, and other fields. People's requirements for robots have likewise risen from simple, repetitive mechanical actions to intelligent robots capable of anthropomorphic question answering, autonomy, and interaction with other robots, and human-computer interaction has become an important factor determining the development of intelligent robots. Therefore, improving the interaction capability of intelligent robots and enhancing their human-likeness and intelligence are important problems urgently to be solved.
Disclosure of Invention
One of the technical problems to be solved by the present invention is to provide a human-computer interaction method and system for an intelligent robot that can ensure the continuity of a conversation, enhance the interest of human-machine dialogue, and improve the user's interaction experience.
In order to solve the above technical problem, an embodiment of the present application first provides a dialogue interaction processing method for an intelligent robot, including the following steps: analyzing context dialog interaction information in the process of dialog interaction between the intelligent robot and a user to generate a corresponding topic label, wherein the topic label is used for marking a topic to which each turn of dialog interaction belongs; acquiring dialogue data output by a user in the current turn, and analyzing by combining a topic label of context dialogue interaction information to obtain a user intention; and generating dialogue interaction data according to the user intention decision.
Preferably, topic label determination is performed on each turn of dialogue using a topic label determination model formed by deep-learning training on data of multiple turns of dialogue on the same topic.
Preferably, in the step of generating dialogue interaction data according to the user intention decision, dialogue interaction content matching the topic label is selected from a dialogue database and combined with the user's dialogue intention of the current turn to generate dialogue interaction data output to the user, where the data in the dialogue database are labeled with different topic labels.
Preferably, in the dialogue database, corresponding reply modes under different topic labels are set for the same question; after the topic label of the current turn is determined, the dialogue interaction data are generated in combination with the corresponding reply mode.
Preferably, the method further comprises the following steps: identifying the user identity and judging whether the current user is a child user; and, if the user is a child user, carrying out the dialogue interaction based on a dialogue database and topic labels built for child users.
According to another aspect of the embodiments of the present invention, there is also provided a dialogue interaction processing system for an intelligent robot, the system comprising the following modules: a topic label determination module for analyzing the context dialogue interaction information during the dialogue interaction between the intelligent robot and the user and generating a corresponding topic label, where the topic label is used to mark the topic to which each turn of dialogue interaction belongs; a user intention parsing module for acquiring the dialogue data output by the user in the current turn and parsing it in combination with the topic labels of the context dialogue interaction information to obtain the user intention; and a dialogue data generation module for generating dialogue interaction data according to the user intention decision.
Preferably, the topic label determination module determines the topic label of each turn of dialogue using a topic label determination model formed by deep-learning training on data of multiple turns of dialogue on the same topic.
Preferably, the dialogue data generation module selects the dialogue interaction content matching the topic label from a dialogue database, combines it with the user's dialogue intention of the current turn, and generates and outputs the dialogue interaction data to the user, where the data in the dialogue database are labeled with different topic labels.
Preferably, in the dialogue database, corresponding reply modes under different topic labels are set for the same question; and the dialogue data generation module is used for generating dialogue interaction data by combining a corresponding reply mode after determining the topic label of the current turn.
Preferably, the system further comprises: a user identity recognition module for identifying the user identity and judging whether the current user is a child user; the dialogue data generation module then carries out the dialogue interaction based on a dialogue database and topic labels built for child users when the user is a child user.
According to another aspect of the embodiments of the present invention, there is also provided a dialogue interaction system for an intelligent robot, the system comprising: a cloud server provided with the above dialogue interaction processing system; and an intelligent robot that acquires multimodal interaction data from its interaction with a user, sends the data to the cloud server, and outputs the dialogue interaction statements from the cloud server to the user.
Preferably, the intelligent robot is a story machine or a chat robot.
Compared with the prior art, one or more embodiments of the above scheme can have the following advantages or beneficial effects:
in the embodiment of the invention, the context dialogue interaction information is analyzed during the dialogue interaction between the intelligent robot and the user to generate the corresponding topic label; the dialogue data output by the user in the current turn are then acquired and parsed in combination with the topic labels of the context dialogue interaction information to obtain the user intention, and the dialogue output data are generated according to the user intention decision. In the embodiment of the invention, a topic label generation model is trained by a deep learning method, so the corresponding topic label can be determined for any turn of dialogue; after the user's voice information is received, output under the same topic can be generated in combination with the current topic label, ensuring the continuity of the conversation, further improving the conversation quality, and improving the user's conversation experience.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure and/or process particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the technology or prior art of the present application and are incorporated in and constitute a part of this specification. The drawings expressing the embodiments of the present application are used for explaining the technical solutions of the present application, and should not be construed as limiting the technical solutions of the present application.
Fig. 1 is a schematic application scenario diagram of a story machine or a chat robot according to an embodiment of the present application.
Fig. 2 is a functional structure block diagram of a story machine or a chat robot according to an embodiment of the present disclosure.
Fig. 3 is a schematic hardware structure diagram of a story machine or a chat robot according to an embodiment of the present disclosure.
Fig. 4 is a functional structure diagram of example one of a dialogue interaction processing system of a cloud server (cloud brain) according to an embodiment of the present disclosure.
Fig. 5 is a functional structure diagram of a second example of a dialogue interaction processing system of a cloud server (cloud brain) according to an embodiment of the present disclosure.
Fig. 6 is a flowchart illustrating example one of a dialogue interaction processing method for an intelligent robot according to an embodiment of the present application.
Detailed Description
The following detailed description of the embodiments of the present invention will be provided with reference to the accompanying drawings and examples, so that how to apply the technical means to solve the technical problems and achieve the corresponding technical effects can be fully understood and implemented. The embodiments and the features of the embodiments can be combined without conflict, and the technical solutions formed are all within the scope of the present invention.
Additionally, the steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
In recent years, with the rapid development of artificial intelligence, chat robots have received extensive attention from both academia and industry. A chat robot is an intelligent robot that simulates human conversation through natural language. At present, chat robots mainly fall into five categories: online customer service, entertainment, education, personal assistants, and intelligent question answering. All of them can achieve a certain degree of interaction with the user. However, in current chat robot application scenarios, the dialogue between robot and user is usually implemented by a question-answering system based on a knowledge base, so the conversation tends to become dull and end after only one or two turns, and the extensibility of the chat is poor. In addition, some chat topics revolve around entity nodes while other chat content contains no entities; content without entities can only be supported by a database, and if the data stored in the database carry no topic labels, replies that do not match the question often occur. Consider the following example, where Q represents the user side and A represents the robot side:
q: is there a good-looking movie recommendation?
A, the people can listen to the Roman holiday.
And Q is seen.
A1 (error): what is that, how the doctor said?
A2 (correct): then I recommend you a bar, the wolf war.
Obviously, A1 is a reply better suited to a "seeing the doctor" topic (in the original Chinese, the user's reply "看过了", "I've seen it", is ambiguous between having watched a movie and having seen a doctor). After the robot replies with A1, the user gets a very poor experience, judges the robot to be of low intelligence, and loses enthusiasm for using it; correctly recommending the movie "Wolf Warrior" instead, as in A2, better satisfies the user's need.
The embodiment of the invention provides a dialogue interaction processing method and system for an intelligent robot.
For a better understanding of the present invention, before describing the embodiments, the terms "topic" and "topic label" used in this embodiment are briefly explained.
Topic: we adopt the narrow linguistic definition of a topic, namely that the central subject of a sentence is the topic of that sentence. Since we want topics to extend well, the attributes related to the central subject are all defined as sub-topics under the main topic. For example, in "Do you like Liu Dehua?", the central subject is Liu Dehua, so the topic of the sentence is "Liu Dehua". If we define the main topic under discussion at this moment as "Liu Dehua", then when the next sentence asks "Do you like listening to his 'Ice Rain'?", the central subject "Ice Rain", a song of Liu Dehua's, is a sub-topic.
Topics are thus divided into main topics and sub-topics, and both need to be identified. In the celebrity domain, the main topic mainly refers to the celebrity, and the sub-topics are direct attributes of the main topic, developed around that celebrity. Conversely, if the question is "Do you like listening to 'Ice Rain'?", the chat topic is music and the main topic is "Ice Rain"; if the next question is "That song is sung by Liu Dehua. Do you know Liu Dehua?", then "Liu Dehua" is a sub-topic. In summary, a sentence may carry multiple topic labels, being a main topic under one topic and a sub-topic under another; these definitions mainly serve the topic extension that follows.
The topic label is mainly used to mark the topic to which each turn of dialogue interaction belongs during the human-computer dialogue interaction. The question-and-answer sentences of each turn of dialogue are marked with the corresponding topic, a topic ID is set for the marked sentences, and they are stored in a memory. Since in this embodiment the decision on dialogue output data mainly depends on the dialogue database, and in order to screen response content more accurately, the topics of the question-answer data in the dialogue database of this embodiment are likewise explicitly labeled in advance with topic labels.
Examples
Fig. 1 is a schematic application scenario diagram of a story machine or a chat robot according to an embodiment of the present application. The application scenario includes an intelligent robot (also referred to as a "dialogue robot") 20 and a cloud brain (cloud server) 10; the intelligent robot 20 carries out voice dialogue interaction with a user U. The robot 20 may be the physical robot shown in Fig. 1, or a robot application installed on a smart device, where the smart device may be a conventional PC, a laptop computer, a holographic projection device, or the like, or a portable terminal that accesses the Internet wirelessly, for example via a wireless LAN or a mobile communication network. In the embodiments of the present application, wireless terminals include, but are not limited to, mobile phones, netbooks, and the like, and generally have functions such as multimodal information acquisition and data transmission. The cloud brain 10 serves as the brain end of the intelligent robot 20 and is configured with a dialogue interaction processing system 100. The system 100 processes the multimodal input data transmitted by the intelligent robot 20 (mainly the user's voice data during the dialogue interaction), for example parsing visual data, completing visual recognition and detection, and performing affective computing, cognitive computing, and semantic understanding, so as to decide the dialogue voice or other multimodal output data that the robot 20 should output.
It should be noted that the dialogue interaction method and system of the intelligent robot are also suitable for children's AI devices, for example the dialogue application scenario of a children's story machine (a children's AI device that satisfies children's audio and video needs such as music, stories, and traditional Chinese classics, and may take the cartoon IP image of an animal or human). In addition, the story machine can be controlled by an intelligent handheld device to complete its setup and instruction execution.
The following describes the composition and functions of the intelligent robot of the present invention, taking a chat robot in physical form as an example.
Fig. 2 is a functional structure block diagram of a story machine or a chat robot according to an embodiment of the present disclosure. As shown in Fig. 2, the robot mainly collects multimodal interaction data from its interaction with the user, sends the data to the cloud server 10, and outputs the dialogue interaction statements from the cloud server to the user. The robot control system mainly comprises an interactive information acquisition module 2110, a communication module 2120, a voice output unit 2130, a robot limb control unit 2210, and an attitude sensor 2220.
The interactive information acquisition module 2110 acquires external interactive input information and specifically includes a voice acquisition unit 2111 for collecting external voice information, a touch sensor 2112 for collecting external touch pressure data, and an image acquisition unit 2113 for collecting external image information. The communication module 2120 sends the multimodal information collected by the interactive information acquisition module 2110 to the cloud brain 10 through the networking interaction unit 2121 for processing, and receives the dialogue output data or other multimodal decision data decided by the cloud brain 10 in response to the user's interaction intention. The networking interaction unit 2121 realizes the data interaction between the communication module 2120 and the cloud brain 10. The voice output unit 2130 outputs the matched voice response according to the voice control information. The robot limb control unit 2210 outputs matched robot limb control signals according to the motion control information to drive the robot's limbs to perform the corresponding actions. The attitude sensor 2220 monitors the robot's current attitude, preventing the robot from forcibly executing actions regardless of its current attitude and thus avoiding incorrect action postures or falls due to imbalance.
In consideration of the differences in power supply requirements, data processing requirements, and functions among the functional modules, the electronic control system of the chat robot is configured as an upper computer system and a lower computer system. Each comprises an independent main control board, and the external circuit elements of each are connected to the respective main control board. In this way, on the premise of ensuring the overall integration of the system, modules with resource conflicts are separated, ensuring stable and efficient operation of the system.
In this example, the functional modules are distributed between the upper computer system and the lower computer system in the manner shown in Fig. 2. Specifically, the robot limb control unit 2210 and the attitude sensor 2220 are built into the lower computer system 220, and the other functional modules are built into the upper computer system 210.
In this embodiment, the system may further include an electric quantity display module that displays the robot's current electric quantity information. Considering that the electric quantity display does not demand much data processing but does require certain power supply driving support (driving light-emitting diodes), the electric quantity display module is arranged in the lower computer system. In operation, the main control board of the upper computer system collects and sends the robot's current electric quantity information, and the electric quantity display module outputs the corresponding electric quantity display according to that information.
Furthermore, to let the user know the robot's current interaction state, an interaction display module for displaying that state is also arranged in the lower computer system. In operation, the main control board of the upper computer system collects and sends the robot's current interaction state, which includes the recording state, the voice/action output state, and the semantic parsing state; the interaction display module outputs the corresponding interaction state display accordingly.
As shown in the hardware structure block diagram of Fig. 3, the main control board of the upper computer system is based on an Allwinner dual-core A20 processor and integrates a wireless networking (WiFi) module, a microphone noise-reduction module, and an audio amplification module. The A20 processor preprocesses and analyzes the external interactive input information and generates the motion control commands for robot movement; the WiFi networking module realizes data interaction with the cloud brain 10; the microphone noise-reduction module, together with a microphone connected to the main control board, collects external voice information; and the audio amplification module, together with a loudspeaker connected to the main control board, outputs the voice responses.
The interfaces provided by the upper computer main control board 210 are as follows: a capacitive touch interface 212, a three-wire interface with line order power (VCC), ground (GND), output (OUT), connected to the touch module 204; a serial communication interface 216, a three-wire interface with line order ground (GND), receive (RX), transmit (TX), connected to the serial communication interface 217 of the lower computer main control board 220; speaker interfaces 213, two-wire interfaces with line order audio signal positive (Speaker+), audio signal negative (Speaker-) (there are 2 speaker interfaces in this example), connected to the loudspeaker 205; a microphone interface 211, a two-wire interface with line order microphone signal positive (Mic+), microphone signal negative (Mic-), connected to the microphone 203; a charging port 214, a two-wire interface with line order power (VCC), ground (GND), connected to the robot charging port 201 and to the power management module 215; and the battery charging interface of the power management module 215, a two-wire interface with line order power input (DCIN), ground (GND), connected to the lithium battery 202.
The main control board of the lower computer system is based on an STMicroelectronics STM32 microcontroller and integrates a six-axis attitude sensor MPU6500 and a motor drive module. The microcontroller STM32 generates the robot limb control signals; the six-axis attitude sensor MPU6500 monitors the robot's current attitude; and the motor drive module drives the robot's limbs.
The interfaces provided by the lower computer main control board 220 are as follows: a power supply interface, a two-wire interface with line order VCC, GND (not shown), through which the voltage-stabilizing chip 223 of the power management module is connected to the lithium battery 202 and to the power management module 215 of the upper computer main control board 210; a serial communication interface 227, a three-wire interface with line order GND, RX, TX, through which data transmission between the upper computer main control board 210 and the lower computer main control board 220 is realized via serial communication; three motor interfaces, each a two-wire interface with line order motor positive (Motor+), motor negative (Motor-), through which the motor drivers (224, 225, 226) drive the robot's motors (231, 232, 233), namely two leg motors and one arm motor, to realize the robot's actions; a system electric quantity display interface, a four-wire interface with line order three outputs (IO) and ground (GND), connected to the electric quantity display lamp 206 (a multi-color light-emitting diode (LED) lamp; the three IO lines correspond to red R, green G, and blue B respectively); and an interactive display interface, a two-wire interface with line order PWM, GND, connected to the interactive display lamp 207 (a nose "breathing" lamp).
The upper and lower computer main control boards 210 and 220 of the system are controlled by a physical switch. The system startup process comprises:
the system is powered on, and the upper computer main control board 210 completes networking and initialization;
the interactive display lamp 207 of the lower computer main control board 220 stays in a breathing state, waiting for the upper computer main control board 210 to complete initialization.
The interaction process:
after the upper computer main control board 210 completes initialization, the upper and lower computer main control boards 210 and 220 transmit data normally through the serial ports (216 and 217);
the microphone 203 collects audio signals, which are denoised, amplified, and transmitted to the A20 processing chip of the upper computer main control board 210; the A20 sends the voice information to the cloud brain 10 through the networking module; the cloud brain 10 returns multimodal decision data to the A20 through the networking module; the A20 controls the loudspeaker 205 to play the voice response to the user, and at the same time sends the action control information (the actions to be executed), the electric quantity information, and the interaction state information to the lower computer main control board 220 through the serial port;
the lower computer main control board 220 receives the A20's control instructions through the serial port and completes multimodal interactive actions such as electric quantity display, interaction display, and leg and arm movements.
During electric quantity display, the system's electric quantity is shown by the RGB three-color lamp: R indicates insufficient power, B indicates normal power, and G indicates sufficient power. Meanwhile, the upper computer main control board 210 informs the user of the power situation through the loudspeaker. During interactive display, the PWM-controlled LED lamp prompts the user: steady on indicates the recording state, off indicates the voice output state, and flashing indicates that networked semantic parsing is in progress.
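As a concrete illustration of the upper-to-lower board link described above, the sketch below frames the three pieces of information (action, electric quantity, interaction state) into a single serial message using pyserial. The byte layout, frame header, port name, and baud rate are all hypothetical assumptions for illustration; the patent does not disclose the actual wire protocol of the A20-to-STM32 link.

```python
import serial  # pyserial

# Hypothetical one-byte codes for the three fields carried over the serial link.
ACTIONS = {"idle": 0, "wave_arm": 1, "step_legs": 2}
STATES = {"recording": 0, "speaking": 1, "parsing": 2}

def send_frame(port: serial.Serial, action: str, battery_pct: int, state: str) -> None:
    """Pack action / electric-quantity / interaction-state into one framed message."""
    payload = bytes([ACTIONS[action], battery_pct & 0xFF, STATES[state]])
    checksum = sum(payload) & 0xFF
    port.write(b"\xAA" + payload + bytes([checksum]))  # 0xAA: hypothetical frame header

# Usage (port name and baud rate are placeholders for the A20 <-> STM32 UART):
# with serial.Serial("/dev/ttyS1", 115200, timeout=1) as port:
#     send_frame(port, "wave_arm", battery_pct=80, state="speaking")
```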
The following describes the components and functions of the dialogue interaction system 100 of the cloud brain 10.
As shown in Fig. 4, the dialogue interaction system 100 includes a topic label determination module 110, a user intention parsing module 120, and a dialogue data generation module 130. The functions of these modules are described in detail below.
The topic label determination module 110 analyzes the context dialogue interaction information during the dialogue interaction between the intelligent robot and the user and generates the corresponding topic label; the topic label is used to mark the topic to which each turn of dialogue interaction belongs.
Specifically, after receiving the voice information forwarded by the communication module 2120, the topic label determination module 110 generates the corresponding text information from it. First, the voice information, after preprocessing such as denoising, undergoes comprehensive speech recognition analysis, generating the text information corresponding to the voice. Then, text analysis is performed on the text information, that is, the specific semantic content of the text is obtained. Concretely, after the recognition result is obtained, it is semantically parsed using natural language processing techniques. Semantic parsing refers to converting a given natural language into a formal representation reflecting its meaning, that is, converting natural language that humans can understand into a formal language that computers can understand. After the parsing result is obtained, the semantic similarity (question-to-question similarity) between it and the content of a preset knowledge base is calculated, so as to find the data in the knowledge base that match the parsing result. At this point, the parsing of the dialogue interaction information is complete.
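The patent does not fix a concrete measure for this question-to-question similarity step. The minimal sketch below uses TF-IDF over character n-grams with cosine similarity as a stand-in for the semantic similarity calculation; the toy knowledge base, threshold, and function names are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy knowledge base of stored questions (illustrative entries).
kb_questions = [
    "有没有好看的电影推荐",  # "Is there a good movie you can recommend?"
    "你知道刘德华吗",        # "Do you know Liu Dehua?"
    "我感冒了怎么办",        # "I have caught a cold, what should I do?"
]

# Character n-grams work tolerably for Chinese without word segmentation.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 2))
kb_matrix = vectorizer.fit_transform(kb_questions)

def match_knowledge_base(user_text: str, threshold: float = 0.3):
    """Return the best-matching stored question, or None if nothing is close enough."""
    sims = cosine_similarity(vectorizer.transform([user_text]), kb_matrix)[0]
    best = sims.argmax()
    return kb_questions[best] if sims[best] >= threshold else None

print(match_knowledge_base("有好看的电影推荐吗"))  # likely matches the first entry
```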
After semantic understanding, the topic of the text can be determined by judging whether the obtained speech text contains specific vocabulary related to a topic. "Specific vocabulary" means words or phrases set in advance and related to topics, for example the name of a celebrity or the title of a movie. Moreover, those skilled in the art can update or add specific vocabulary according to current network terminology or user needs, enriching the database content and improving the user experience. Each word in the "specific vocabulary" database can be traversed, the morphological (string-form) similarity and/or semantic similarity between the obtained speech text and each specific word is calculated, and it is judged whether the corresponding specific word occurs in the speech text. When the morphological similarity exceeds the threshold by a large margin, the specific word is judged to be present without calculating the semantic similarity; otherwise, a weighted sum of the semantic similarity and the morphological similarity is calculated to make the judgment. Whether a specific word occurs in the obtained speech text can also be determined by other techniques, which are not limited here.
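The threshold logic just described can be made concrete. In the sketch below, the morphological (string-shape) similarity is approximated with difflib's ratio, and the semantic similarity is left as a crude stub, since the patent specifies neither measure; the vocabulary entries, thresholds, and weights are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Pre-set topic-related specific vocabulary: word -> topic label (illustrative).
SPECIFIC_VOCAB = {"刘德华": "celebrity", "战狼2": "movie", "罗马假日": "movie"}

def shape_similarity(a: str, b: str) -> float:
    """Morphological (string-form) similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def semantic_similarity(word: str, text: str) -> float:
    """Stub for the semantic measure; a real system would use embeddings."""
    return 1.0 if word in text else 0.0

def find_specific_word(text: str, high=0.9, accept=0.7, w_shape=0.4, w_sem=0.6):
    for word, topic in SPECIFIC_VOCAB.items():
        # Best string-shape match of the word against any window of the text.
        shape = max(shape_similarity(word, text[i:i + len(word)])
                    for i in range(max(1, len(text) - len(word) + 1)))
        if shape >= high:  # extremely high string match: accept without semantics
            return word, topic
        if w_shape * shape + w_sem * semantic_similarity(word, text) >= accept:
            return word, topic
    return None

print(find_specific_word("最新的动作电影战狼2好看吗"))  # -> ('战狼2', 'movie')
```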
If no specific vocabulary is found, the user intention of the dialogue interaction information is analyzed according to the topics of the preceding turns of dialogue, and the topic is determined based on that user intention. The following example illustrates this:
q: is the latest action movie "warwolf 2" very nice looking, did you see?
A: i do not see.
When topic judgment is performed on utterance A in this turn of dialogue interaction information, no matching specific vocabulary is found, so the topic information of A cannot be determined from specific vocabulary alone. By combining the topic of the previous interactive sentence, the movie "Wolf Warrior 2", the user intention of A can be determined to be that the user has not seen the movie "Wolf Warrior 2", and the topic can accordingly be determined to be the movie "Wolf Warrior 2". The correspondence between the topic label and the dialogue interaction sentence from which it was extracted is then saved in a memory, for example:
q: is the latest action movie "warwolf 2" very nice looking, did you see? [ PROBLEM-MOTION "Zhan Lang 2 ]
A: i do not see. [ PROBLEM-MOTION "Zhan Lang 2 ]
When determining the topic content of the next turn of dialogue interaction information, the topic of the context dialogue interaction can be retrieved from the memory, and topic determination can thus be completed smoothly.
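A minimal sketch of such a memory is given below. The structure and names are assumptions; the patent only requires that the topic labels of past turns be stored and retrievable.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class TopicMemory:
    """Stores, per turn, the utterance and the topic label extracted from it."""
    turns: List[Tuple[int, str, Optional[str]]] = field(default_factory=list)

    def record(self, turn_id: int, utterance: str, topic_label: Optional[str]) -> None:
        self.turns.append((turn_id, utterance, topic_label))

    def current_topic(self) -> Optional[str]:
        """Most recent non-empty topic label, inherited by label-less turns."""
        for _, _, topic in reversed(self.turns):
            if topic:
                return topic
        return None

memory = TopicMemory()
memory.record(1, "最新的动作电影《战狼2》很好看，你看过吗？", "MOVIE:战狼2")
memory.record(2, "我没看过。", None)  # no specific vocabulary found in this turn
print(memory.current_topic())        # -> MOVIE:战狼2, inherited from the context
```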
In addition to determining topic labels by looking up specific vocabulary, in a preferred example the topic label determination module 110 determines the topic label of each turn of dialogue using a topic label determination model formed by deep-learning training on data of multiple turns of dialogue on the same topic.
The specific training method is as follows (a code sketch of these steps follows the list):
Step 1: obtain sample information for training a preset classifier. In this embodiment, multiple turns of dialogue data under multiple topics are selected, and the sample dialogue data under each topic are used to train the preset classifier. Preferably, utterances historically marked with different topics by manual classification can be collected as sample data, and the voice information is converted into text before training.
Step 2: preprocess the sample data, removing noise text such as the filler particles 哦, 嘛, and 吧, to obtain the training text.
Step 3: extract the text features of the training text.
Specifically, word segmentation may be performed on the training text according to a preset step size, and the text features obtained based on the segmentation result.
Step 4: input the text features of the training text into the classifier for training, obtaining the target classifier.
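A compact sketch of steps 1 to 4 follows. Character n-gram TF-IDF features stand in for the unspecified segmentation-based features, logistic regression stands in for the unspecified classifier, and the tiny labeled sample is purely illustrative.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1: sample dialogue data, manually labeled with topics (toy illustration).
samples = [
    ("有没有好看的电影推荐", "movie"), ("我没看过那部片子", "movie"),
    ("你喜欢听冰雨吗", "music"), ("这首歌是刘德华唱的", "music"),
    ("我感冒了", "health"), ("医生怎么说", "health"),
]

NOISE = re.compile("[啊嘛吧哦呢]")  # Step 2: strip filler particles (noise text)

def preprocess(text: str) -> str:
    return NOISE.sub("", text)

texts = [preprocess(text) for text, _ in samples]
labels = [topic for _, topic in samples]

# Steps 3 and 4: extract text features and train the classifier in one pipeline.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

print(model.predict([preprocess("再推荐一部电影吧")]))  # likely ['movie'] on this toy data
```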
After the topic label determination module 110 completes speech-to-text processing of each turn of input interaction information, the processed text is input into the target classifier, and the topic of that turn of interaction information is obtained. By this method, even when the text obtained by speech-to-text conversion contains no entity information, for example when the dialogue includes utterances without entities, the topic content can still be determined accurately, and the processing is faster than the specific-vocabulary lookup described above.
The user intention parsing module 120 acquires the dialogue data output by the user in the current turn and parses it in combination with the topic labels of the context dialogue interaction information to obtain the user intention.
For user dialogue data with rich information content, for example content including entity information, the user intention parsing module 120 can generate the corresponding text information from the voice information following the semantic understanding operations of the topic label determination module 110, and then perform semantic understanding to obtain the user intention. However, some user dialogue content is terse and generally contains nothing with concrete meaning, such as "no", "seen it", or "not yet", so the user's real intention cannot be recognized directly from the text corresponding to the speech. Therefore, when parsing the user intention, it is preferable to recognize it in combination with the topic labels of the context dialogue interaction.
Referring to the example above, the parsing result obtained by speech recognition is "I haven't seen it", and the topic information of the previous dialogue content is: movie, "Wolf Warrior 2". Combining the two, the true intention of this turn of dialogue interaction can be determined to be "has not seen the movie 'Wolf Warrior 2'". In the prior art, the user intentions obtained purely from the parsing result of the current dialogue content are many and varied, so the reply easily deviates from the actual intention and gives the user a bad experience; determining the intention in combination with the topic information of the context dialogue solves these problems well.
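The sketch below shows how such a terse reply can be grounded against the context topic label. The intent names and pattern table are invented for illustration; the patent only specifies that the current-turn parse is combined with the context topic label.

```python
from typing import Optional, Tuple

# Terse replies whose meaning depends entirely on the context topic (illustrative).
TERSE_PATTERNS = {
    "我没看过": "NOT_SEEN",  # "I haven't seen it"
    "看过了": "SEEN",        # "I've seen it"
    "没有": "NEGATIVE",      # "no"
}

def resolve_intent(user_text: str,
                   context_topic: Optional[str]) -> Tuple[str, Optional[str]]:
    """Combine the current turn's parse with the context topic label."""
    for pattern, intent in TERSE_PATTERNS.items():
        if pattern in user_text:
            # Terse utterance: attach the inherited topic, if any, to the intent.
            return intent, context_topic
    # Information-rich utterance: a full semantic parse would run here instead.
    return "FULL_PARSE_REQUIRED", context_topic

print(resolve_intent("我没看过。", "MOVIE:战狼2"))
# -> ('NOT_SEEN', 'MOVIE:战狼2'), i.e. has not seen the movie "Wolf Warrior 2"
```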
The dialogue data generation module 130 generates dialogue interaction data according to the user intention decision.
Specifically, the dialogue data generation module 130 selects the dialogue interaction content matching the topic label from the dialogue database 140, combines it with the user's dialogue intention of the current turn, and generates and outputs the dialogue interaction data to the user, where the data in the dialogue database 140 are labeled with different topic labels.
In the dialogue database 140, corresponding reply modes under different topic labels are set for the same question. Specifically, the database 140 stores a list of questions and reply contents, structured so that the reply modes that can (with high probability) correspond to the same question are sorted out, and questions with incomplete semantics are topic-identified according to their reply content, as in the list below:
[Table: sample entries pairing the same question Q with reply modes under different topic labels; in the original publication the table appears only as images.]
The topic label determination here can be realized with the topic label determination model: the reply content is input into the model, which outputs the corresponding label. By topic-labeling the data in the database, an appropriate reply can be selected when a question Q input by the user could occur under multiple topics.
After determining the topic label of the current turn, the dialogue data generation module 130 generates the dialogue interaction data in combination with the corresponding reply mode. Matching output templates are available for the reply modes set in the database, and the dialogue interaction data are generated based on these templates. Two examples follow, with a lookup sketch after them:
Topic 1:
q: is there a good-looking movie recommendation?
A, the people can listen to the Roman holiday.
And Q is seen.
A2 (correct): then I recommend you a bar, and the just-reflected "wolf war 2" is good at.
Topic 2:
q: i catch a cold.
A is the difference in flow, to see the doctor?
And Q is seen.
A2: what is that, how the doctor said?
In addition, in other examples, as shown in Fig. 5, the dialogue interaction processing system 100 of the present invention may further include a user identity recognition module 150, which recognizes the user's identity and judges whether the current user is a child user; if so, the dialogue interaction is carried out based on a dialogue database and topic labels built for child users.
The following child-user identification methods may be used. For example, the image acquisition unit 2113 captures the current user's face information and sends it through the communication module 2120 to the user identity recognition module 150 of the cloud brain 10, which first detects the presence of a face in the scene and determines its position. After a face is detected, face recognition is performed: the detected face to be recognized is compared and matched against different categories of faces in the database to obtain the related information. Face recognition can use either extraction of geometric facial features or template matching; in this example, template matching is preferred. In addition, whether the current user is a child can be identified through voice feature detection, that is, by recognizing the voice input by the user and judging whether it is a child's voice. In this embodiment, a voice recognition model is preset in the user identity recognition module 150, and the voice input by the user can be recognized by this model to determine the category of the voice. The voice recognition model may be a machine learning model that can classify voice categories after training and learning on a large amount of sample data. Before classifying live speech, the classifier must first be trained to obtain the target classifier. The steps are as follows (a code sketch follows the steps):
Step 1: obtain sample voices for training a preset classifier. In this embodiment, children's voices may be sampled as the sample voices, and the collected sample voices used to train the preset classifier. Preferably, speech historically marked as children's voices by manual classification may be collected as sample speech.
Step 2: perform voice activity detection on the sample voices to remove silence from the training data, obtaining the training speech.
Step 3: extract the acoustic features of the training speech.
Specifically, the training speech may be framed according to a preset step size, and the acoustic features then extracted from each frame; the acoustic features may be filter-bank features (e.g., Fbank40) or Mel-frequency cepstral coefficient (MFCC) features.
Step 4: input the acoustic features of the training speech into the classifier for training, obtaining the target classifier.
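A sketch of steps 1 to 4 under stated assumptions: librosa is used for feature extraction, librosa's energy-based trimming stands in for a full voice activity detector, an SVM stands in for the unspecified classifier, and the file names and labels are illustrative.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def extract_features(wav_path: str) -> np.ndarray:
    """Steps 2 and 3: trim silence (crude stand-in for VAD), then average MFCCs."""
    y, sr = librosa.load(wav_path, sr=16000)
    y, _ = librosa.effects.trim(y, top_db=30)           # drop leading/trailing silence
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # frame-wise MFCC features
    return mfcc.mean(axis=1)                            # one fixed-size vector per clip

# Step 1: manually labeled sample voices (paths and labels are illustrative).
samples = [("child_01.wav", 1), ("child_02.wav", 1),
           ("adult_01.wav", 0), ("adult_02.wav", 0)]

X = np.stack([extract_features(path) for path, _ in samples])
y = np.array([label for _, label in samples])

clf = SVC(kernel="rbf").fit(X, y)  # Step 4: train the target classifier

def is_child_voice(wav_path: str) -> bool:
    return bool(clf.predict(extract_features(wav_path)[None, :])[0])
```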
The user identity recognition module 150 is configured mainly for the following consideration: the dialogue databases in the cloud brain 10 are numerous and may contain sensitive content or harmful network information. After the user identity recognition module 150 identifies a child user, the cloud brain 10 instead selects the dialogue database and topic labels built for child users to carry out the dialogue interaction, so that sensitive content is not sent to the child user and adverse effects on children's physical and mental health are prevented.
The dialogue database built for child users is similar in structure to the database described above, but its question-and-answer content centers on education and entertainment for children, and sensitive or hard-to-understand information is filtered out.
Fig. 6 is a flowchart illustrating example one of a dialogue interaction method for an intelligent robot according to an embodiment of the present application. The interaction flow of the present interactive system is described below with reference to Fig. 6.
As shown in Fig. 6, in step S610, the topic label determination module 110 analyzes the context dialogue interaction information during the dialogue interaction between the intelligent robot and the user and generates the corresponding topic label, which marks the topic to which each turn of dialogue interaction belongs; in step S620, the user intention parsing module 120 acquires the dialogue data output by the user in the current turn and parses it in combination with the topic labels of the context dialogue interaction information to obtain the user intention; in step S630, the dialogue data generation module 130 generates the dialogue interaction data according to the user intention decision.
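Tying the three steps together, the sketch below chains the stages into the S610 to S630 flow. The class and method names are assumptions; each stage simply delegates to whatever concrete component (vocabulary lookup, classifier, database lookup) is in use, here replaced by trivial stand-ins.

```python
from typing import Callable, List, Optional, Tuple

class DialogueInteractionSystem:
    """Toy end-to-end flow mirroring steps S610, S620, S630."""

    def __init__(self,
                 label_topic: Callable[[str], Optional[str]],
                 parse_intent: Callable[[str, Optional[str]], str],
                 generate_reply: Callable[[str, Optional[str]], str]):
        self.label_topic = label_topic        # S610 component
        self.parse_intent = parse_intent      # S620 component
        self.generate_reply = generate_reply  # S630 component
        self.history: List[Tuple[str, Optional[str]]] = []  # context topic memory

    def context_topic(self) -> Optional[str]:
        for _, topic in reversed(self.history):
            if topic:
                return topic
        return None

    def handle_turn(self, user_text: str) -> str:
        # S610: label the current turn's topic, inheriting from context if needed.
        topic = self.label_topic(user_text) or self.context_topic()
        self.history.append((user_text, topic))
        # S620: parse the user intention in combination with the topic label.
        intent = self.parse_intent(user_text, topic)
        # S630: decide and generate the dialogue interaction data.
        return self.generate_reply(intent, topic)

# Minimal wiring with stand-in components (illustrative only):
system = DialogueInteractionSystem(
    label_topic=lambda text: "MOVIE:战狼2" if "战狼2" in text else None,
    parse_intent=lambda text, topic: "NOT_SEEN" if "没看过" in text else "OTHER",
    generate_reply=lambda intent, topic: f"intent={intent}, topic={topic}",
)
print(system.handle_turn("最新的动作电影《战狼2》很好看，你看过吗？"))
print(system.handle_turn("我没看过。"))  # topic inherited: MOVIE:战狼2
```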
In the embodiment of the invention, the topic label generation model is trained through a deep learning method, so that the corresponding topic label can be determined for any turn of conversation, and after the voice information of the user is received, the output under the same topic can be generated by combining the current topic label, thereby ensuring the continuity of the conversation, further improving the conversation quality and improving the conversation experience of the user.
Supplementary notes
When the intelligent robot in the present embodiment is a story machine, the following features may be provided in addition to those described above:
(1) the story machine can serve as part of a home Internet of Things and interconnect with WeChat;
(2) it supports functions such as on-demand playback, favorites, voice-controlled interruption, and sound effects;
(3) it has an OCR (Optical Character Recognition) function, enabling picture books and other books to be read aloud;
(4) it can actively push content according to the child user's preferences.
The method of the present invention is described as being implemented in a computer system. The computer system may be provided in a control core processor, for example. For example, the methods described herein may be implemented as software executable with control logic that is executed by a CPU in an operating system. The functionality described herein may be implemented as a set of program instructions stored in a non-transitory tangible computer readable medium. When implemented in this manner, the computer program comprises a set of instructions which, when executed by a computer, cause the computer to perform a method capable of carrying out the functions described above. Programmable logic may be temporarily or permanently installed in a non-transitory tangible computer-readable medium, such as a read-only memory chip, computer memory, disk, or other storage medium. In addition to being implemented in software, the logic described herein may be embodied using discrete components, integrated circuits, programmable logic used in conjunction with a programmable logic device such as a Field Programmable Gate Array (FPGA) or microprocessor, or any other device including any combination thereof. All such embodiments are intended to fall within the scope of the present invention.
It is to be understood that the disclosed embodiments of the invention are not limited to the process steps disclosed herein, but extend to equivalents thereof as would be understood by those skilled in the relevant art. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A dialogue interaction processing method for an intelligent robot comprises the following steps:
analyzing context dialog interaction information in the process of dialog interaction between the intelligent robot and a user to generate a corresponding topic label, wherein the topic label is used for marking a topic to which each turn of dialog interaction belongs;
the method for analyzing the context dialog interaction information and generating the corresponding topic label comprises the following steps:
performing speech recognition and semantic parsing on the preprocessed user interaction information to obtain the text and semantic content of the user interaction information; further determining, through a semantic similarity calculation method, the data in a preset knowledge base that match the current user interaction information, as the parsing result of the dialogue interaction information; and judging, based on the parsing result, whether specific vocabulary related to a topic exists in the current user interaction information, and if so, determining the topic of the user interaction information;
if not, determining the topic label of each turn of dialogue by using a topic label determination model, wherein the topic label determination model is formed by deep-learning training on data of multiple turns of dialogue on the same topic;
acquiring dialogue data output by a user in the current turn, and analyzing by combining a topic label of context dialogue interaction information to obtain a user intention;
and generating dialogue interaction data according to the user intention decision.
2. The method of claim 1, wherein, in the step of generating dialogue interaction data based on the user intent decision,
and selecting the conversation interactive content matched with the topic label from a conversation database, combining the conversation intention of the user in the current turn, generating conversation interactive data and outputting the conversation interactive data to the user, wherein the data in the conversation database is labeled with different topic labels.
3. The method of claim 2,
setting corresponding reply modes under different topic labels for the same question in the conversation database;
and after the topic label of the current turn is determined, generating dialogue interaction data by combining a corresponding reply mode.
4. The method according to any one of claims 1 to 3, further comprising:
identifying the user identity, and judging whether the current user is a child user;
and if the user is a child user, carrying out dialogue interaction based on a dialogue database and topic labels built for the child user.
5. A dialogue interaction processing system for an intelligent robot, the system comprising the following modules:
a topic label determination module for analyzing the context dialogue interaction information during the dialogue interaction between the intelligent robot and the user and generating a corresponding topic label, wherein the topic label is used to mark the topic to which each turn of dialogue interaction belongs;
the topic label determining module analyzes the context dialog interaction information through the following operations to generate a corresponding topic label:
performing speech recognition and semantic parsing on the preprocessed user interaction information to obtain the text and semantic content of the user interaction information; further determining, through a semantic similarity calculation method, the data in a preset knowledge base that match the current user interaction information, as the parsing result of the dialogue interaction information; and judging, based on the parsing result, whether specific vocabulary related to a topic exists in the current user interaction information, and if so, determining the topic of the user interaction information;
if not, determining the topic label of each turn of dialogue by using a topic label determination model, wherein the topic label determination model is formed by deep-learning training on data of multiple turns of dialogue on the same topic;
the user intention analysis module is used for acquiring dialogue data output by a user in the current turn and analyzing the dialogue data in combination with the topic labels of the context dialogue interaction information to obtain the user intention;
a dialogue data generation module that generates dialogue interaction data according to the user intent decision.
6. The system of claim 5,
and the dialogue data generation module is used for selecting dialogue interaction content matched with the topic tags from a dialogue database, combining the dialogue intentions of the current turn of user, generating dialogue interaction data and outputting the dialogue interaction data to the user, wherein the data of the dialogue database is labeled with different topic tags.
7. The system of claim 6,
setting corresponding reply modes under different topic labels for the same question in the conversation database;
and the dialogue data generation module is used for generating dialogue interaction data by combining a corresponding reply mode after determining the topic label of the current turn.
8. The system of any one of claims 5 to 7, further comprising:
the user identity identification module is used for identifying the user identity and judging whether the current user is a child user;
and the dialogue data generation module is used for carrying out dialogue interaction based on a dialogue database and a topic label built for the child user when the user is the child user.
9. A conversational interaction system for an intelligent robot, the system comprising:
a cloud server provided with the dialogue interaction processing system according to any one of claims 5 to 8;
the intelligent robot acquires multi-mode interaction data interacted with a user, sends the multi-mode interaction data to the cloud server, and outputs a dialogue interaction statement from the cloud server to the user.
10. The dialog interaction system of claim 9,
the intelligent robot is a story machine or a chat robot.
CN201711215862.6A 2017-11-28 2017-11-28 Dialogue interaction method and system for intelligent robot Active CN108108340B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711215862.6A CN108108340B (en) 2017-11-28 2017-11-28 Dialogue interaction method and system for intelligent robot

Publications (2)

Publication Number Publication Date
CN108108340A (en) 2018-06-01
CN108108340B (en) 2021-07-23

Family

ID=62208505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711215862.6A Active CN108108340B (en) 2017-11-28 2017-11-28 Dialogue interaction method and system for intelligent robot

Country Status (1)

Country Link
CN (1) CN108108340B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108874967B (en) * 2018-06-07 2023-06-23 腾讯科技(深圳)有限公司 Dialogue state determining method and device, dialogue system, terminal and storage medium
CN108897723B (en) * 2018-06-29 2022-08-02 北京百度网讯科技有限公司 Scene conversation text recognition method and device and terminal
CN108900612A (en) * 2018-06-29 2018-11-27 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
CN110765312A (en) * 2018-07-10 2020-02-07 阿里巴巴集团控股有限公司 Man-machine interaction and content search method, device, equipment and storage medium
CN109086351B (en) * 2018-07-17 2022-03-22 北京光年无限科技有限公司 Method for acquiring user tag and user tag system
CN109005276A (en) * 2018-09-21 2018-12-14 马鑫 Intelligent dialogue robot, dialog control method, computer equipment and storage medium
CN109542389B (en) * 2018-11-19 2022-11-22 北京光年无限科技有限公司 Sound effect control method and system for multi-mode story content output
CN109739962A (en) * 2018-12-26 2019-05-10 广州灵聚信息科技有限公司 A kind of control method and device of Chat mode
CN109710941A (en) * 2018-12-29 2019-05-03 上海点融信息科技有限责任公司 User's intension recognizing method and device based on artificial intelligence
CN110059169B (en) * 2019-01-25 2023-12-01 邵勃 Intelligent robot chat context implementation method and system based on corpus labeling
CN109902834B (en) * 2019-01-28 2021-02-05 北京怡凯智能技术有限公司 Topic-driven robot for active conversation accompanying old people
CN110297617B (en) * 2019-06-28 2021-05-14 北京蓦然认知科技有限公司 Method and device for initiating active conversation
CN112735398B (en) * 2019-10-28 2022-09-06 思必驰科技股份有限公司 Man-machine conversation mode switching method and system
CN110995569B (en) * 2019-11-12 2023-04-07 腾讯科技(深圳)有限公司 Intelligent interaction method and device, computer equipment and storage medium
CN111241260A (en) * 2020-01-08 2020-06-05 平安科技(深圳)有限公司 Data processing method, device and equipment based on human-computer interaction and storage medium
CN112100368B (en) * 2020-07-21 2024-01-26 深思考人工智能科技(上海)有限公司 Method and device for identifying dialogue interaction intention
CN115905490B (en) * 2022-11-25 2024-03-22 北京百度网讯科技有限公司 Man-machine interaction dialogue method, device and equipment
CN116383365B (en) * 2023-06-01 2023-09-08 广州里工实业有限公司 Learning material generation method and system based on intelligent manufacturing and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101548907B1 (en) * 2009-01-06 2015-09-02 삼성전자 주식회사 multilingual dialogue system and method thereof

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068661A (en) * 2015-09-07 2015-11-18 百度在线网络技术(北京)有限公司 Man-machine interaction method and system based on artificial intelligence
CN105138671A (en) * 2015-09-07 2015-12-09 百度在线网络技术(北京)有限公司 Human-computer interaction guiding method and device based on artificial intelligence
CN106991123A (en) * 2017-02-27 2017-07-28 北京光年无限科技有限公司 A kind of man-machine interaction method and device towards intelligent robot
CN107193978A (en) * 2017-05-26 2017-09-22 武汉泰迪智慧科技有限公司 A kind of many wheel automatic chatting dialogue methods and system based on deep learning

Also Published As

Publication number Publication date
CN108108340A (en) 2018-06-01

Similar Documents

Publication Publication Date Title
CN108108340B (en) Dialogue interaction method and system for intelligent robot
CN108000526B (en) Dialogue interaction method and system for intelligent robot
CN107894833B (en) Multi-modal interaction processing method and system based on virtual human
US11551804B2 (en) Assisting psychological cure in automated chatting
Zhang et al. Intelligent facial emotion recognition and semantic-based topic detection for a humanoid robot
US8285552B2 (en) System and method for simulating expression of message
US9734730B2 (en) Multi-modal modeling of temporal interaction sequences
CN109710748B (en) Intelligent robot-oriented picture book reading interaction method and system
CN107992195A (en) A kind of processing method of the content of courses, device, server and storage medium
JP2017016566A (en) Information processing device, information processing method and program
CN106502382B (en) Active interaction method and system for intelligent robot
CN110598576A (en) Sign language interaction method and device and computer medium
US20160071302A1 (en) Systems and methods for cinematic direction and dynamic character control via natural language output
KR20180100001A (en) System, method and recording medium for machine-learning based korean language conversation using artificial intelligence
CN109101663A (en) A kind of robot conversational system Internet-based
CN108806360A (en) Reading partner method, apparatus, equipment and storage medium
CN110825164A (en) Interaction method and system based on wearable intelligent equipment special for children
CN112735418A (en) Voice interaction processing method and device, terminal and storage medium
CN112232066A (en) Teaching outline generation method and device, storage medium and electronic equipment
US11216497B2 (en) Method for processing language information and electronic device therefor
WO2019228140A1 (en) Instruction execution method and apparatus, storage medium, and electronic device
CN110489746A (en) A kind of information extracting method, information extracting device and intelligent terminal
WO2023102889A1 (en) Voice interaction method and device
CN115171673A (en) Role portrait based communication auxiliary method and device and storage medium
CN113763925B (en) Speech recognition method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant